Editor’s Note: As advanced algorithms continue to emerge for smart product designs, developers often find themselves struggling to implement embedded systems able to meet the associated processing demands of these algorithms. FPGAs can deliver the required performance, but designing with FPGAs has long been considered limited to the purview of FPGA programming experts. Today, however, the availability of more powerful FPGAs and more effective development environments has made FPGA development broadly accessible. In this excerpt, Chapter 4 from the book Architecting High-Performance Embedded Systems, the author offers a comprehensive review of FPGA devices, implementation languages, and the FPGA development process as well as a detailed walkthrough of how to get started implementing FPGAs in your own design. The complete excerpt is presented in the following series of installments:
1: Hardware resources (this article)
2: Implementation languages
3: Development process
4: Building a project
Adapted from Architecting High-Performance Embedded Systems, by Jim Ledin.
Developing Your First FPGA Program
This chapter begins with a discussion on the effective use of FPGA devices in real-time embedded systems and continues with a description of the functional elements contained within standard FPGAs. The range of FPGA design languages, including Hardware Description Languages (HDLs), block diagram methods, and popular software programming languages such as C and C++, is introduced. The chapter continues with an overview of the FPGA development process and concludes with a complete example of an FPGA development cycle, starting with a statement of system requirements and ending with a functional system implemented on a low-cost FPGA development board.
After completing this chapter, you will know how FPGAs can be applied in real-time embedded system architectures and will understand the components that make up an FPGA integrated circuit. You will have learned about the programming languages used in the design of FPGA algorithms and will understand the sequence of steps to develop an FPGA-based application. You will also have worked through a complete FPGA development example on a low-cost development board using free FPGA software tools.
We will cover the following topics in this chapter:
- Using FPGAs in real-time embedded system designs
- FPGA implementation languages
- The FPGA development process
- Developing your first FPGA project
The files for this chapter are available at https://github.com/PacktPublishing/Architecting-High-Performance-Embedded-Systems.
Using FPGAs in real-time embedded system designs
As we saw in the Elements of FPGAs section of Chapter 1, Architecting High-Performance Embedded Systems, a typical FPGA device contains a large number of lookup tables, flip-flops, block RAM elements, DSP slices, and other components. While it can be instructive to understand the detailed capabilities of each of these components, such concerns are not necessarily informative during the FPGA development process. The most important constraint to keep in mind is that a specific FPGA part number contains a finite number of each of these elements, and a design cannot exceed those limits when targeted at that particular FPGA model.
Instead, it is more productive to view the FPGA development process from the perspective of the embedded system’s statement of requirements. You can begin to develop the FPGA design targeted at a somewhat arbitrarily chosen FPGA model. As development proceeds, you may reach a resource limit or identify an FPGA feature the design requires that is not present in the currently targeted FPGA. At that point, you can select a different, more capable, target and continue development.
Alternatively, as development of the design nears completion, you may realize the target FPGA you originally selected contains excessive resources and the design could be improved by selecting a smaller FPGA, with potential benefits in terms of lower cost, fewer pins, smaller package size, and reduced power consumption.
In either of these situations, it is generally straightforward to switch the targeted FPGA to a different model within the same family. The development tools and design artifacts you have created to this point should be fully reusable with the newly targeted FPGA model. If it becomes necessary to switch to a different family of FPGAs from the same vendor, or to a model from a different vendor, the switchover will likely involve more work.
The point of this discussion is that it is not critical to identify a specific FPGA model at the outset of a high-performance embedded system development effort. Instead, early considerations should focus on validating the decision to use an FPGA as part of the design. If an FPGA turns out to be the best design approach, you can then proceed with the selection of a suitable FPGA vendor and device family.
Example projects in this book will be based on the Xilinx Vivado family of FPGA development tools. Although a Vivado license must be purchased to develop for some Xilinx FPGA families, the Artix-7 family devices we will be working with are supported by Vivado free of charge. The Artix-7 FPGA family combines the attributes of high performance, low power consumption, and reduced total system cost. Similar FPGA device families and development tool suites are available from other FPGA vendors.
FPGA development is a fairly involved process, with a variety of types of analysis and design data input required. To avoid discussing these topics at too abstract a level, and to present concrete results in terms of working example projects, we will be using Vivado throughout the book. Once you are familiar with the tools and techniques discussed here, you should be able to apply them using similar tools from other vendors.
The following sections will discuss some key differentiating features of the families of FPGAs and individual models within those families, including the quantity of block RAM, the quantity and types of I/O signals available, specialized on-chip hardware resources, and the inclusion of one or more hardware processor cores in the FPGA package.
Block RAM and distributed RAM
Block RAM is used to implement regions of memory within an FPGA. A particular memory region is specified in terms of the width in bits (typically 8 or 16 bits) and the depth, which defines the number of storage locations in the memory region.
The total quantity of block RAM in an FPGA is usually specified in terms of kilobits (Kb). The amount of block RAM available varies across FPGA families and among the models within a particular family. As you would expect, larger, more expensive parts generally have a greater quantity of resources that can be used as block RAM.
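The relationship between width, depth, and total capacity is simple multiplication. The sketch below also computes a capacity-only lower bound on the number of block RAM tiles a memory region needs. It assumes 36 Kb tiles, as in Xilinx 7-series devices; the `block_kbits` parameter is an assumption to adjust for other parts, and real tools map memories by the width/depth aspect ratios a tile supports, which can require more tiles than this bound.

```python
import math


def memory_bits(width_bits: int, depth: int) -> int:
    """Total storage of a memory region: width in bits times number of locations."""
    return width_bits * depth


def min_bram_blocks(width_bits: int, depth: int, block_kbits: int = 36) -> int:
    """Capacity-only lower bound on block RAM tiles (assumed tile size in Kb)."""
    block_bits = block_kbits * 1024
    return math.ceil(memory_bits(width_bits, depth) / block_bits)


# A 16-bit wide, 8192-deep buffer holds 131,072 bits (128 Kb).
print(memory_bits(16, 8192) // 1024)  # 128
print(min_bram_blocks(16, 8192))      # 4
```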
In Xilinx FPGAs, and to varying degrees in FPGAs from other vendors, a distinct category of memory called distributed RAM is available in addition to block RAM. Distributed RAM is constructed from the logic elements used in lookup tables and repurposes the circuitry of those devices to form tiny segments of RAM, each containing 16 bits. These segments can be aggregated to form larger memory blocks when necessary.
Block RAM tends to be used for purposes traditionally associated with RAM, such as implementing processor cache memory or as a storage buffer for I/O data. Distributed RAM might be used for purposes such as the temporary storage of intermediate computation results. Because distributed RAM is based on lookup table circuitry, the use of distributed RAM in a design reduces the resources available for implementing logic operations.
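Aggregating small segments into a larger memory, and the resulting cost in logic resources, can be sketched as follows. This assumes each 16-bit segment is organized as 16 locations by 1 bit and occupies exactly one lookup table; both are simplifications for illustration, and the hypothetical `segment_depth` parameter should be adjusted if your device's LUTs hold more bits.

```python
import math


def distributed_ram_segments(width_bits: int, depth: int, segment_depth: int = 16) -> int:
    """Number of LUT-based RAM segments needed for a width x depth memory.

    Assumes each segment is organized as `segment_depth` locations x 1 bit.
    Every bit column needs its own stack of segments covering the full depth.
    """
    return width_bits * math.ceil(depth / segment_depth)


def luts_left_for_logic(total_luts: int, width_bits: int, depth: int) -> int:
    """LUTs still available for logic, assuming one LUT per RAM segment."""
    return total_luts - distributed_ram_segments(width_bits, depth)


print(distributed_ram_segments(8, 64))   # 32 segments for an 8-bit x 64-deep buffer
print(luts_left_for_logic(1000, 8, 64))  # 968
```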
Block RAM can have a single port or dual ports. Single-port block RAM represents the common usage pattern of a processor that reads and writes RAM during operation.
Dual-port block RAM provides two read/write ports, both of which can be actively reading or writing the same memory region simultaneously.
Dual-port block RAM is ideal for situations where data is being transferred between portions of an FPGA running at differing clock speeds. For example, an I/O subsystem might have a clock speed in the hundreds of MHz as it receives an incoming data stream. The I/O subsystem writes incoming data to the block RAM as it arrives through one of the FPGA's high-speed I/O channels. A separate subsystem within the FPGA, running at a different clock speed, can read data from the block RAM's second port without interfering with the operation of the I/O subsystem.
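A behavioral Python model can illustrate the key property of dual-port RAM: one storage array reachable through two independent ports. This is a hypothetical sketch that ignores clocks and timing entirely; in hardware each port would run on its own clock, and the design must avoid having both ports write the same address at the same time.

```python
class DualPortRam:
    """Behavioral sketch: one storage array shared by two independent ports."""

    class Port:
        def __init__(self, ram: "DualPortRam"):
            self._ram = ram

        def write(self, addr: int, value: int) -> None:
            self._ram._check(addr)
            # Stored values are truncated to the RAM's configured width.
            self._ram._mem[addr] = value & self._ram._mask

        def read(self, addr: int) -> int:
            self._ram._check(addr)
            return self._ram._mem[addr]

    def __init__(self, depth: int, width_bits: int):
        self._mem = [0] * depth
        self._mask = (1 << width_bits) - 1
        self.a = DualPortRam.Port(self)  # e.g. the I/O subsystem's port
        self.b = DualPortRam.Port(self)  # e.g. the processing subsystem's port

    def _check(self, addr: int) -> None:
        if not 0 <= addr < len(self._mem):
            raise IndexError(f"address {addr} is outside the RAM")


ram = DualPortRam(depth=256, width_bits=16)
for addr, sample in enumerate([0x1234, 0xABCD, 0x1FFFF]):
    ram.a.write(addr, sample)  # I/O side stores incoming words on port A
print([hex(ram.b.read(addr)) for addr in range(3)])  # ['0x1234', '0xabcd', '0xffff']
```

The third sample is wider than 16 bits, so only its low 16 bits are stored, mirroring a fixed-width memory.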
Block RAM can also operate in first-in-first-out (FIFO) mode. In the example of the incoming serial data stream, the I/O subsystem can insert data words into the FIFO as they arrive and the processing subsystem can read them out in the same order. Block RAM in FIFO mode provides signals indicating whether the FIFO is full, empty, almost full, or almost empty. The definitions of almost full and almost empty are up to the system designer. If you define almost empty to mean that fewer than 16 items are left in the FIFO, then any time the FIFO does not indicate it is almost empty, you can read 16 items without further checks of data availability.
When using block RAM in FIFO mode, it is vital that the logic inserting items into the FIFO never attempts to write when the FIFO is full, and the logic reading from the FIFO never attempts to read when the FIFO is empty. If either of these events occurs, the system will either lose data or will attempt to process undefined data.
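The FIFO flags and the two access rules above can be captured in a small behavioral model. This sketch assumes "almost empty" means fewer than `almost_empty` words are stored and "almost full" means fewer than `almost_full` free slots remain; those threshold semantics, like the class itself, are illustrative choices rather than a vendor's FIFO interface.

```python
from collections import deque


class Fifo:
    """Behavioral sketch of a FIFO with full/empty and programmable almost flags."""

    def __init__(self, depth: int, almost_empty: int, almost_full: int):
        self.depth = depth
        self._ae = almost_empty  # almost empty: fewer than this many words stored
        self._af = almost_full   # almost full: fewer than this many free slots
        self._items = deque()

    @property
    def count(self) -> int:
        return len(self._items)

    @property
    def empty(self) -> bool:
        return self.count == 0

    @property
    def full(self) -> bool:
        return self.count == self.depth

    @property
    def almost_empty(self) -> bool:
        return self.count < self._ae

    @property
    def almost_full(self) -> bool:
        return self.depth - self.count < self._af

    def push(self, word: int) -> None:
        # Writing while full would lose data in hardware.
        if self.full:
            raise RuntimeError("write attempted while FIFO is full")
        self._items.append(word)

    def pop(self) -> int:
        # Reading while empty would hand undefined data to the consumer.
        if self.empty:
            raise RuntimeError("read attempted while FIFO is empty")
        return self._items.popleft()


fifo = Fifo(depth=64, almost_empty=16, almost_full=8)
for word in range(20):
    if not fifo.full:  # the writer checks before every write
        fifo.push(word)

burst = []
if not fifo.almost_empty:  # at least 16 words are guaranteed present
    burst = [fifo.pop() for _ in range(16)]

print(len(burst), fifo.count, fifo.almost_empty)  # 16 4 True
```

Raising an exception here stands in for the data loss or undefined reads that would occur silently in hardware.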
FPGA I/O pins and associated features
Because FPGAs are intended for use in high-performance applications, their I/O pins are generally capable of implementing a variety of high-speed I/O standards. During the implementation of a design with an FPGA development tool suite, the system developer must perform tasks that include assigning functions to particular pins on the FPGA package and configuring each of those pins to operate with the appropriate interface standard. Additional steps must be performed to associate input and output signals within the FPGA model code with the correct package pins.
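In the Xilinx Vivado flow, pin assignments and I/O standards are typically expressed as XDC constraints using `set_property PACKAGE_PIN` and `set_property IOSTANDARD` on ports selected with `get_ports`. The Python helper below is a hypothetical convenience, not part of Vivado, that generates those lines from a table; the port names and pins (`A1`, `B2`) are placeholders, and real values come from the board's documentation.

```python
def xdc_pin_constraints(assignments: dict[str, tuple[str, str]]) -> list[str]:
    """Emit XDC lines binding each top-level port to a package pin and I/O standard.

    `assignments` maps a port name in the design to (package_pin, io_standard).
    Braces around the port name keep Tcl from treating "[0]" as a command.
    """
    lines = []
    for port, (pin, standard) in assignments.items():
        lines.append(f"set_property PACKAGE_PIN {pin} [get_ports {{{port}}}]")
        lines.append(f"set_property IOSTANDARD {standard} [get_ports {{{port}}}]")
    return lines


# Placeholder ports and pins; take real pin names from the board documentation.
constraints = xdc_pin_constraints({
    "led[0]": ("A1", "LVCMOS33"),
    "btn[0]": ("B2", "LVCMOS33"),
})
print("\n".join(constraints))
```

The first generated line is `set_property PACKAGE_PIN A1 [get_ports {led[0]}]`, which ties the HDL signal `led[0]` to a physical package pin.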
At the pin level, individual I/O signals are either single-ended or differential.
A single-ended signal is referenced to ground. Traditional Transistor-Transistor Logic (TTL) and Complementary Metal Oxide Semiconductor (CMOS) digital signals operate over a range of 0-5 VDC relative to ground.
Modern FPGAs typically do not support the legacy 5 VDC signal range, but instead support TTL and CMOS signals operating over a reduced voltage range, thereby reducing power consumption and improving speed. Low Voltage TTL (LVTTL) signals operate over a range of 0-3.3 VDC. Low Voltage CMOS (LVCMOS) signals are selectable with signaling voltages of 1.2, 1.5, 1.8, 2.5, and 3.3 V. These signal types are named LVCMOS12, LVCMOS15, LVCMOS18, LVCMOS25, and LVCMOS33. Other high-performance single-ended signal types are available, including High-Speed Transceiver Logic (HSTL) and Stub-Series Terminated Logic (SSTL).
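The LVCMOS names follow directly from the signaling voltage with the decimal point dropped. A small sketch of that naming rule, limited to the voltages listed above (the function name is hypothetical):

```python
LVCMOS_VOLTAGES = (1.2, 1.5, 1.8, 2.5, 3.3)  # signaling voltages listed in the text


def lvcmos_standard(voltage: float) -> str:
    """Return the LVCMOS standard name for a signaling voltage, e.g. 1.8 -> 'LVCMOS18'."""
    for v in LVCMOS_VOLTAGES:
        if abs(v - voltage) < 1e-6:
            # Drop the decimal point: 1.8 V becomes "18".
            return f"LVCMOS{round(v * 10)}"
    raise ValueError(f"{voltage} V is not one of the LVCMOS levels {LVCMOS_VOLTAGES}")


print([lvcmos_standard(v) for v in LVCMOS_VOLTAGES])
# ['LVCMOS12', 'LVCMOS15', 'LVCMOS18', 'LVCMOS25', 'LVCMOS33']
```

Asking for the legacy 5 V level raises `ValueError`, matching the point that modern FPGAs do not support it.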
Single-ended signals are widely used for low-frequency purposes, such as reading pushbutton inputs and lighting LEDs. Single-ended signals are also used in many lower-speed communication protocols such as I2C and SPI. An important drawback of single-ended signals is that any noise coupled into the wires and printed circuit board traces carrying the signal has the potential to corrupt the input to the receiver. This problem can be substantially reduced through the use of differential signaling.
For the highest data transfer rates, differential signaling is the preferred approach.
Differential signals use a pair of I/O pins and drive opposing signals onto the two pins. In other words, one pin is driven to a higher voltage and the other pin to a lower voltage to represent a 0 data bit, and the pin voltages are reversed to represent a 1 bit. The differential receiver subtracts the two signals to determine whether the data bit is 0 or 1. Because the two wires or traces carrying the differential signal are physically located very close together, any noise that couples into one of the signals will couple into the other in a very similar manner. The subtraction operation removes the vast majority of the noise, enabling reliable operation at much higher data transfer rates than single-ended signals.
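The noise-cancelling effect of subtraction can be demonstrated with a toy simulation. The voltage levels and noise samples are made up for illustration, and the model assumes the noise couples identically into both wires of the pair, which is the idealized case described above.

```python
V_HIGH, V_LOW = 1.4, 1.0           # hypothetical pin voltages, volts
THRESHOLD = (V_HIGH + V_LOW) / 2   # single-ended receiver decision point


def single_ended_rx(bits, noise):
    """One wire per bit stream; coupled noise adds directly to the received voltage."""
    return [1 if (V_HIGH if b else V_LOW) + n > THRESHOLD else 0
            for b, n in zip(bits, noise)]


def differential_rx(bits, noise):
    """Two wires driven with opposing levels; the same noise couples into both."""
    received = []
    for b, n in zip(bits, noise):
        p = (V_HIGH if b else V_LOW) + n   # one pin of the pair
        q = (V_LOW if b else V_HIGH) + n   # the complementary pin
        received.append(1 if p - q > 0 else 0)  # subtraction cancels common noise
    return received


bits = [0, 1, 1, 0, 1, 0]
noise = [0.3, -0.1, -0.4, 0.05, 0.0, 0.25]  # illustrative coupled noise, volts
print(single_ended_rx(bits, noise))  # [1, 1, 0, 0, 1, 1]: three bits corrupted
print(differential_rx(bits, noise))  # [0, 1, 1, 0, 1, 0]: all bits recovered
```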
A number of differential signal standards are supported by standard FPGAs. Several differential versions of HSTL and SSTL are defined, with a variety of signaling voltage levels for each.
Low-Voltage Differential Signaling (LVDS) was introduced as a standard in 1994 and continues to be used in a variety of applications. An LVDS signaling transmitter produces a constant current of 3.5 mA and switches the direction of the current flowing through the resistor at the receiver to produce state changes representing 0 and 1 data values as shown in Figure 4.1:
Figure 4.1 – LVDS interface
In LVDS communication, as in the other differential and single-ended signaling standards, it is important for the impedance of the communication path between the transmitter and receiver to closely match the termination impedance, which is 100 Ω in the case of LVDS. If the impedance of the communication channel does not match the termination impedance, reflections can occur on the line, preventing reliable data reception.
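Two pieces of arithmetic sit behind this description. By Ohm's law, the 3.5 mA LVDS current across the 100 Ω termination produces a differential voltage of about 350 mV, whose polarity flips when the current direction reverses. The size of a reflection caused by a mismatch is commonly expressed with the standard transmission-line reflection coefficient (ZL - Z0) / (ZL + Z0), which is not taken from the text; the 120 Ω trace below is a hypothetical mismatch.

```python
def lvds_differential_voltage(current_a: float = 3.5e-3,
                              termination_ohms: float = 100.0) -> float:
    """Ohm's law: loop current times termination resistance, in volts."""
    return current_a * termination_ohms


def reflection_coefficient(load_ohms: float, line_ohms: float) -> float:
    """Fraction of an incident wave reflected where a line meets its termination."""
    return (load_ohms - line_ohms) / (load_ohms + line_ohms)


print(round(lvds_differential_voltage() * 1000))      # 350 (mV)
print(reflection_coefficient(100.0, 100.0))           # 0.0: matched, no reflection
print(round(reflection_coefficient(100.0, 120.0), 3))  # -0.091: mismatched trace
```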
The impedance of differential signal trace pairs is a function of the geometry of the pair traces and their relationship to the ground plane. As we will see in Chapter 6, Designing Circuits with KiCad, it is straightforward to design circuit boards that satisfy the requirements of high-speed differential signaling standards.
Specialized hardware resources
FPGAs generally include a selection of dedicated hardware resources for commonly required functions that either run more efficiently in dedicated hardware than as synthesized FPGA logic or cannot be implemented with FPGA components at all. Some examples of these resources are as follows:
- Interfaces to external dynamic RAM (DRAM) for storing large quantities of data. These interfaces generally support a common DRAM standard such as DDR3.
- Analog-to-digital converters.
- Phase-locked loops, used for generating multiple clock frequencies.
- Digital signal processing multiply-accumulate (MAC) hardware.
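A phase-locked loop typically derives its outputs by multiplying a reference clock up to an internal VCO frequency and then dividing it down. The sketch below assumes a simple multiply/divide structure, f_out = f_in × M / (D × O), similar to the one used by Xilinx 7-series clock managers. The divider values are hypothetical, and real devices restrict the allowed M, D, and O values and the VCO range, which tools such as Vivado's Clocking Wizard take into account.

```python
def pll_output_mhz(f_in_mhz: float, mult: int, div: int, out_div: int) -> float:
    """Output frequency of a multiply/divide PLL: f_in * mult / (div * out_div)."""
    vco_mhz = f_in_mhz * mult / div  # internal VCO frequency
    return vco_mhz / out_div


# One 100 MHz reference, VCO at 1000 MHz, three derived clocks (hypothetical).
clocks = {name: pll_output_mhz(100.0, mult=10, div=1, out_div=o)
          for name, o in [("core", 5), ("io", 8), ("slow", 40)]}
print(clocks)  # {'core': 200.0, 'io': 125.0, 'slow': 25.0}
```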
These hardware resources enable the development of complex systems with wide-ranging capabilities. Dedicated hardware is provided for functions like the MAC operation because the hardware performance is significantly better than the synthesized equivalent functionality using FPGA logic resources.
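The multiply-accumulate operation is the core of many DSP algorithms. As an illustration not taken from the text, a direct-form FIR filter computes each output as a sum of products, and every product-and-add step is one MAC. This Python sketch runs the MACs sequentially; an FPGA design would typically map them onto DSP slices and run many of them in parallel.

```python
def fir_filter(samples: list[int], coeffs: list[int]) -> list[int]:
    """Direct-form FIR filter built from repeated multiply-accumulate (MAC) steps.

    Each output y[n] = sum(coeffs[k] * samples[n - k]); every term is one MAC.
    Integer inputs stand in for the fixed-point values DSP slices operate on.
    """
    out = []
    for n in range(len(samples)):
        acc = 0  # accumulator register
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * samples[n - k]  # multiply, then accumulate
        out.append(acc)
    return out


print(fir_filter([1, 2, 3, 4], [1, 1]))  # [1, 3, 5, 7]
```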
Some FPGA families include hardware processor cores for the purpose of combining peak software execution speed with the performance advantages of FPGA-implemented algorithms. For example, the Xilinx Zynq-7000 family integrates a hardware ARM Cortex-A9 processor together with a traditional FPGA fabric.
FPGA designs that do not require a hardware processor can implement a processor using the FPGA resources, referred to as a soft processor. Soft processors are highly configurable, though they are generally not capable of matching the performance of a processor implemented in hardware.
The next section will introduce the primary programming languages and data entry methods used to develop FPGA algorithms.
Reprinted with permission from Packt Publishing. Copyright © 2021 Packt Publishing
Jim Ledin is the CEO of Ledin Engineering, Inc. Jim is an expert in embedded software and hardware design, development, and testing. He is also accomplished in embedded system cybersecurity assessment and penetration testing. He has a B.S. degree in aerospace engineering from Iowa State University and an M.S. degree in electrical and computer engineering from Georgia Institute of Technology. Jim is a registered professional electrical engineer in California, a Certified Information Systems Security Professional (CISSP), a Certified Ethical Hacker (CEH), and a Certified Penetration Tester (CPT).
The post Embedded design with FPGAs: Hardware resources appeared first on Embedded.com.