August 05, 2009
By Peter Alfke, Xilinx Inc.
What is the purpose of this paper?
This paper gives potential users an easy-to-grasp idea of the device functions of Xilinx Spartan-6 FPGAs. It describes the functionality of these devices in far more detail than in the data sheet—but avoids the minute implementation details covered in the various Spartan-6 FPGA User Guides. In traditional product documentation, a data sheet provides concentrated information about the whole family, without describing the capabilities in great detail. On the other hand, user guides give all the details that the designer needs, but— at more than a thousand pages— they may require weeks of work to read and understand all the details.
This paper describes the capabilities (what you can do) in detail but leaves out the implementation details (how to utilize the capabilities). The idea is to give the designer enough information to evaluate the capabilities, without requiring weeks of study. This paper should create significant enthusiasm in many designers, who before did not have the patience or the motivation to study entire user guides.
General Description
Spartan-6 FPGA Data Sheet: DC and Switching Characteristics
The Spartan-6 FPGA architecture builds on the success of the Virtex-5 family, but emphasizes lower cost and lower power consumption, using a mature 45-nm process.
The eleven Spartan-6 devices cover a wide range, from 3,800 to 147,000 Logic Cells, with up to 184,000 flip-flops and up to 4.8 Mb of internal block RAM.
Compared to custom ASIC solutions, Spartan-6 devices offer a flexible, low-risk alternative for high-volume logic designs, consumer-oriented DSP applications, and cost-sensitive embedded solutions.
CLBs, Slices, and LUTs
Spartan-6 FPGA Configurable Logic Block User Guide
Each configurable logic block (CLB) in Spartan-6 FPGAs consists of two slices, arranged side-by-side as part of two vertical columns. There are three types of CLB slices in the Spartan-6 architecture: SLICEM, SLICEL, and SLICEX. Each slice contains four LUTs, eight flip-flops, and miscellaneous logic. The LUTs are for general-purpose combinatorial and sequential logic support. Synthesis tools take advantage of these highly efficient logic, arithmetic, and memory features. Expert designers can also instantiate them.
SLICEM
One quarter (25%) of Spartan-6 slices are SLICEMs. Each of the four SLICEM LUTs can be configured as either a 6 input LUT with one output, or as dual 5-input LUTs with identical 5-bit addresses and two independent outputs. These LUTs can also be used as distributed 64-bit RAM with 64 bits or two times 32 bits per LUT, as a single 32-bit shift register (SRL32), or as two 16-bit shift registers (SRL16s) with addressable length. Each LUT output can be registered in a flip-flop within the CLB. For arithmetic operations, a high-speed carry chain propagates carry signals upwards in a column of slices.
SLICEL
One quarter (25%) of Spartan-6 slices are SLICELs, which contain all the features of the SLICEM except the memory/shift register function.
SLICEX
One half (50%) of Spartan-6 slices are SLICEXs. The SLICEXs have the same structure as SLICELs except the arithmetic carry option and the wide multiplexers.
Clock Management
Spartan-6 FPGA Clocking Resources User Guide
Each Spartan-6 FPGA has up to six CMTs, each consisting of two DCMs and one PLL, which can be used individually or concatenated.
DCM
The DCM provides four phases of the input frequency (CLKIN): shifted 0°(176), 90°(176), 180°(176), and 270°(176) (CLK0, CLK90, CLK180, and CLK270). It also provides a doubled frequency CLK2X and its complement CLK2X180. The CLKDV output provides a fractional clock frequency that can be phase-aligned to CLK0. The fraction is programmable as every integer from 2 to 16, as well as 1.5, 2.5, 3.5 . . . 7.5. CLKIN can optionally be divided by 2. The DCM can be a zero-delay clock buffer when a clock signal drives CLKIN, while the CLK0 output is fed back to the CLKFB input.
Frequency Synthesis
Independent of the basic DCM functionality, the frequency synthesis outputs CLKFX and CLKFX180 can be programmed to generate any output frequency that is the DCM input frequency (FIN) multiplied by M and simultaneously divided by D, where M can be any integer from 2 to 32 and D can be any integer from 1 to 32.
Phase Shifting
With CLK0 connected to CLKFB, all nine CLK outputs (CLK0, CLK90, CLK180, CLK270, CLK2X, CLK2X180, CLKDV, CLKFX, and CLKFX180) can be shifted by a common amount, defined as any integer multiple of a fixed delay. A fixed DCM delay value (fraction of the input period) can be established by configuration and can also be incremented or decremented dynamically.
Spread-Spectrum Input
The DCM can accept and track typical spread-spectrum clock inputs, provided they abide by the input clock specifications listed in the Spartan-6 data sheet.
PLLs
The PLL can serve as a frequency synthesizer for a wider range of frequencies and as a jitter filter for incoming clocks in conjunction with the DCMs. The heart of the PLL is a voltage-controlled oscillator (VCO) with a frequency range of 400 MHz to 1000 MHz, thus spanning more than one octave. Three sets of programmable frequency dividers (D, M, and O) adapt the VCO to the required application.
The pre-divider D (programmable by configuration) reduces the input frequency and feeds one input of the traditional PLL phase comparator. The feedback divider (programmable by configuration) acts as a multiplier because it divides the VCO output frequency before feeding the other input of the phase comparator. D and M must be chosen appropriately to keep the VCO within its controllable frequency range.
The VCO has eight equally spaced outputs (0°(176), 45°(176), 90°(176), 135°(176), 180°(176), 225°(176), 270°(176), and 315°(176)). Each can be selected to drive one of the six output dividers, O0 to O5 (each programmable by configuration to divide by any integer from 1 to 128).
Clock Distribution
Each Spartan-6 FPGA provides abundant clock lines to address the different clocking requirements of high fanout, short propagation delay, and extremely low skew.
Global Clock Lines
In each Spartan-6 FPGA, 16 global-clock lines have the highest fanout and can reach every flip-flop clock. Global clock lines must be driven by global clock buffers, which can also perform glitchless clock multiplexing and the clock enable function. Global clocks are often driven from the CMTs, which can completely eliminate the basic clock distribution delay.
I/O Clocks
I/O clocks are especially fast and serve only the localized input and output delay circuits and the I/O serializer/deserializer (SERDES) circuits, as described in the I/O Logic section.
Block RAM
Spartan-6 FPGA Block RAM Resources User Guide
Every Spartan-6 FPGA has between 12 and 268 dual-port block RAMs, each storing 18 Kb. Each block RAM has two completely independent ports that share only the stored data.
Synchronous Operation
Each memory access, whether read or write, is controlled by the clock. All inputs, data, address, clock enables, and write enables are registered. The data output is always latched, retaining data until the next operation. An optional output data pipeline register allows higher clock rates at the cost of an extra cycle of latency.
During a write operation in dual-port mode, the data output can reflect either the previously stored data, the newly written data, or remain unchanged.
Programmable Data Width
- Each port can be configured as 16K - 1, 8K - 2, 4K - 4, 2K - 9 (or 8), 1K - 18 (or 16), or 512 x 36 (or 32).
- The x9, x18, and x36 configurations include parity bits. The two ports can have different aspect ratios.
- Each block RAM can be divided into two completely independent 9 Kb block RAMs that can each be configured to any aspect ratio from 8K x 1 to 256 x 36.
- In 9 Kb block RAMs, only simple dual-port mode can provide data widths of >18 bits. In this mode, one port is dedicated to read operation and the other port is dedicated to write operation. The full freedom to mix port data width values is retained, but there is no read output during write. The 18 Kb RAM has no dual-port data width limitation.
Memory Controller Block
Spartan-6 FPGA Memory Controller User Guide
Most Spartan-6 devices include dedicated memory controller blocks (MCBs), each targeting a single-chip DRAM (either DDR, DDR2, DDR3, or LPDDR), and supporting access rates of up to 800 Mb/s.
The MCB has dedicated routing to predefined FPGA I/Os. If the MCB is not used, these I/Os are available as general purpose FPGA I/Os. The memory controller offers a complete multi-port arbitrated interface to the logic inside the Spartan-6 FPGA. Commands can be pushed, and data can be pushed to and pulled from independent built-in FIFOs, using conventional FIFO control signals. The multi-port memory controller can be configured in many ways. An internal 32-, 64-, or 128-bit data interface provides a simple and reliable interface to the MCB.
The MCB can be connected to 4-, 8-, or 16-bit external DRAM. The MCB, in many applications, provides a faster DRAM interface compared to traditional internal data buses, which are wider and are clocked at a lower frequency. The FPGA logic interface can be flexibly configured irrespective of the physical memory device.
Digital Signal Processing—DSP48A1 Slice
Spartan-6 FPGA DSP48A1 Slice User Guide
DSP applications use many binary multipliers and accumulators, best implemented in dedicated DSP slices. All Spartan-6 FPGAs have many dedicated, full-custom, low-power DSP slices, combining high speed with small size, while retaining system design flexibility.
Each DSP48A1 slice consists of a dedicated 18 - 18 bit two's complement multiplier and a 48-bit accumulator, both capable of operating at 250 MHz. The DSP48A1 slice provides extensive pipelining and extension capabilities that enhance speed and efficiency of many applications, even beyond digital signal processing, such as wide dynamic bus shifters, memory address generators, wide bus multiplexers, and memory-mapped I/O register files. The accumulator can also be used as a synchronous up/down counter. The multiplier can perform barrel shifting.
Input/Output
Spartan-6 FPGA SelectIO Resources User Guide
The number of I/O pins varies from 102 to 576, depending on device and package size. Each I/O pin is configurable and can comply with a large number of standards, using up to 3.3V. The Spartan-6 FPGA SelectIO Resources User Guide describes the I/O compatibilities of the various I/O options. With the exception of supply pins and a few dedicated configuration pins, all other package pins have the same I/O capabilities, constrained only by certain banking rules. All user I/O is bidirectional; there are no input only pins.
All I/O pins are organized in banks, with four banks on the smaller devices and six banks on the larger devices. Each bank has several common VCCO output supply-voltage pins, which also powers certain input buffers. Some single-ended input buffers require an externally applied reference voltage (VREF). There are several dual-purpose VREF-I/O pins in each bank. In a given bank, when I/O standard calls for a VREF voltage, each VREF pin in that bank must be connected to the same voltage rail and can not be used as an I/O pin.
I/O Electrical Characteristics
Single-ended outputs use a conventional CMOS push/pull output structure, driving High towards VCCO or Low towards ground, and can be put into high-Z state. Many I/O features available to the system designer to optionally invoke each I/O in their design, such as weak internal pull-up and pull-down resistors, strong internal split-termination input resistors, adjustable output drive-strengths and slew-rates, and differential termination resistors. See the Spartan-6 FPGA SelectIO Resources User Guide for more details available options for each I/O standard.
I/O Logic
Spartan-6 FPGA SelectIO Resources User Guide
Input and Output Delay
This section describes the available logic resources connected to the I/O interfaces. All inputs and outputs can be configured as either combinatorial or registered. Double data rate (DDR) is supported by all inputs and outputs. Any input or output can be individually delayed by up to 256 increments of ≈100 ps each. This is implemented as IODELAY2. The identical delay value is available either for data input or output. For a bidirectional data line, the transfer from input to output delay is automatic. The number of delay steps can be set by configuration and can also be incremented or decremented while in use.
Because these tap delays vary with supply voltage, process, and temperature, an optional calibration mechanism is built into each IODELAY2:
- In the simple system synchronous case, a data input delay value that guarantees zero data hold time is inserted automatically, without user intervention.
- For source synchronous designs where more accuracy is required, the calibration mechanism can (optionally) determine dynamically how many taps are needed to delay data by one full I/O clock cycle, and then programs the IODELAY2 with 50% of that value, thus centering the I/O clock in the middle of the data eye.
- A special mode is available only for differential inputs, which uses a phase detector mechanism to determine whether the incoming data signal is being accurately sampled in the middle of the eye. The results from the phase detector logic can be used to either increment or decrement the input delay, one tap at a time, to ensure error free operation at very high bit rates.
ISERDES and OSERDES
Many applications combine high-speed bit-serial I/O with slower parallel operation inside the device. This requires a serializer and deserializer (SerDes) inside the I/O structure. Each input has access to its own deserializer (serial-to-parallel converter) with programmable parallel width of 2, 3, or 4 bits. Where differential inputs are used, the two serializers can be cascaded to provide parallel widths of 5, 6, 7, or 8 bits. Each output has access to its own serializer (parallel-to-serial converter) with programmable parallel width of 2, 3, or 4 bits. Two serializers can be cascaded when a differential driver is used to give access to bus widths of 5, 6, 7, or 8 bits.
When distributing a double data rate clock, all SerDes data is actually clocked in/out at single data rate to eliminate the possibility of bit errors due to duty cycle distortion. This faster single data rate clock is either derived via frequency multiplication in a PLL, or doubled locally in each IOB by differentiating both clock edges when the incoming clock uses double data rate.
Low-Power Gigabit Transceiver
Spartan-6 FPGA GTP Transceivers User Guide
Ultra-fast data transmission between ICs, over the backplane, or over longer distances is becoming increasingly popular and important. It requires specialized dedicated on-chip circuitry and differential I/O capable of coping with the signal integrity issues at these high data rates.
All Spartan-6 LXT devices have 2"8 gigabit transceiver circuits. Each GTP transceiver is a combined transmitter and receiver capable of operating at a data rate between 622 Mb/s and 3.125 Gb/s. The transmitter and receiver are independent circuits that use separate PLLs to multiply the reference frequency input by certain programmable numbers between 2 and 25, to become the bit-serial data clock. Each GTP transceiver has a large number of user-definable features and parameters. All of these can be defined during device configuration, and many can also be modified during operation.
Transmitter
The transmitter is fundamentally a parallel-to-serial converter with a conversion ratio of 8, 10, 16, or 20. The transmitter output drives the PC board with a single-channel differential current-mode logic (CML) output signal.
TXOUTCLK is the appropriately divided serial data clock and can be used directly to register the parallel data coming from the internal logic. The incoming parallel data is fed through a small FIFO and can optionally be modified with the 8B/10B algorithm to guarantee a sufficient number of transitions. The bit-serial output signal drives two package pins with complementary CML signals. This output signal pair has programmable signal swing as well as programmable pre-emphasis to compensate for PC board losses and other interconnect characteristics.
Receiver
The receiver is fundamentally a serial-to-parallel converter, changing the incoming bit serial differential signal into a parallel stream of words, each 8, 10, 16, or 20 bits wide. The receiver takes the incoming differential data stream, feeds it through a programmable equalizer (to compensate for the PC board and other interconnect characteristics), and uses the FREF input to initiate clock recognition. There is no need for a separate clock line. The data pattern uses non-return-to-zero (NRZ) encoding and optionally guarantees sufficient data transitions by using the 8B/10B encoding scheme. Parallel data is then transferred into the FPGA logic using the RXUSRCLK clock. The serial-to-parallel conversion ratio can be 8, 10, 16, or 20.
Integrated Endpoint Blocks for PCI Express Designs
The PCI Express standard is a packet-based, point-to-point serial interface standard. The differential signal transmission uses an embedded clock, which eliminates the clock-to-data skew problems of traditional wide parallel buses.
The PCI Express Base Specification 1.1 defines bit rate of 2.5 Gb/s per lane, per direction (transmit and receive). When using 8B/10B encoding, this supports a data rate of 2.0 Gb/s per lane.
The Spartan-6 LXT devices include one integrated Endpoint block for PCI Express technology that is compliant with the PCI Express Base Specification Revision 1.1. This block is highly configurable to system design requirements and operates as a compliant single lane Endpoint. The integrated Endpoint block interfaces to the GTP transceivers for serialization/de-serialization, and to block RAMs for data buffering. Combined, these elements implement the physical layer, data link layer, and transaction layer of the protocol.
Xilinx provides a light-weight (<100 LUT), configurable, ease-of-use LogiCORE wrapper that ties the various building blocks (the integrated Endpoint block for PCI Express technology, the GTP transceivers, block RAM, and clocking resources) into a compliant Endpoint solution. The system designer has control over many configurable parameters: maximum payload size, reference clock frequency, and base address register decoding and filtering.
Configuration
Spartan-6 FPGA Configuration User Guide
Spartan-6 FPGAs store the customized configuration data in SRAM-type internal latches. The number of configuration bits is between 2.6 Mb and 33 Mb depending on device size but independent of the specific user-design implementation, unless compression mode is used. The configuration storage is volatile and must be reloaded whenever the FPGA is powered up. This storage can also be reloaded at any time by pulling the PROGRAM_B pin Low. Several methods and data formats for loading configuration are available.
Bit-serial configurations can be either master serial mode, where the FPGA generates the configuration clock (CCLK) signal, or slave serial mode, where the external configuration data source also clocks the FPGA. For byte-wide configurations, master SelectMAP mode generates the CCLK signal while slave SelectMAP mode receives the CCLK signal for the 8 and 16-bit-wide transfer. In master serial mode, the beginning of the bitstream can optionally switch the clocking source to an external clock, which can be faster or more precise than the internal clock. The available JTAG pins use boundary-scan protocols to load bit-serial configuration data.
The bitstream configuration information is generated by the ISE software using a program called BitGen. The configuration process typically executes the following sequence:
- Detects power-up (power-on reset) or PROGRAM_B when Low.
- Clears the whole configuration memory.
- Samples the mode pins to determine the configuration mode: master or slave, bit-serial or parallel.
- Loads the configuration data starting with the bus-width detection pattern followed by a synchronization word, checks for the proper device code, and ends with a cyclic redundancy check (CRC) of the complete bitstream.
- Starts a user-defined sequence of events: releasing the internal reset (or preset) of flip-flops, optionally waiting for the DCMs and/or PLLs to lock, activating the output drivers, and transitioning the DONE pin to High.
Spartan-6 FPGAs support MultiBoot configuration, where two or more FPGA configuration bitstreams can be stored by a single configuration source. The FPGA application controls which configuration to load next and when to load it.
Spartan-6 FPGAs also include a unique, factory-programmed Device DNA identifier useful for tracking purposes, anti-cloning designs, or IP protection. In the largest devices, bitstreams can be copy protected using AES encryption.
About the author
Peter Alfke joined Xilinx in 1988 as director of applications engineering. He currently serves as Distinguished Engineer in the Advanced Products Group.
Peter graduated in electronic engineering from the Technical University in Hannover, Germany in 1957. He went on to work in telecom and computer design with LM Ericsson and Litton Industries before moving to California in 1968. He has spent forty years in Applications Engineering with Fairchild, Zilog, AMD, and now Xilinx. Peter holds more than 30 patents, has authored many application notes, and given worldwide seminars on digital integrated circuits. He is active in the newsgroup comp.arch.fpga.
==========
댓글 없음:
댓글 쓰기