Guide to FPGA Implementation of Algorithms

Guide to FPGA Implementation of Arithmetic Functions

Table of Contents and overview

Overview

This book is divided into sixteen chapters. In the first chapter the basic building blocks of digital systems are briefly reviewed, and their VHDL descriptions are presented. It constitutes a bridge with previous courses, or books, on Hardware Description Languages and Logic Circuits.

Chapters 2 to 4 constitute a first part whose aim is the description of the basic principles and methods of algorithm implementation. Chapter 2 describes the breaking up of a circuit into Data Path and Control Unit, and tackles the scheduling and resource assignment problems. In Chapters 3 and 4 some special topics of Data Path and Control Unit synthesis are presented.

Chapter 5 recalls important electronic concepts that must be taken into account to obtain reliable circuits, and Chapter 6 gives information about the main Electronic Design Automation (EDA) tools available for developing systems on FPGAs.

Chapters 7 to 13 are dedicated to the main arithmetic operations, namely addition (Chapter 7), multiplication (Chapter 8), division (Chapter 9), other operations such as square root, logarithm, exponentiation, trigonometric functions, and base conversion (Chapter 10), decimal arithmetic (Chapter 11), floating-point arithmetic (Chapter 12), and finite-field arithmetic (Chapter 13). For every operation, several configurations are considered (combinational, sequential, pipelined, bit-serial or parallel), and several generic models are available, thus constituting a library of virtual components.

The development of Systems on Chip (SoC) is the topic of Chapters 14 to 16. The main concepts are presented in Chapter 14: embedded processors, memories, buses, IP components, prototyping boards, and so on. Chapter 15 presents two case studies, both based on commercial EDA tools and prototyping boards. Chapter 16 is an introduction to dynamic reconfiguration, a technique that reduces area by modifying the device configuration at run time.

Table of Contents

Preface

Chapter 1 Basic building blocks
1.1 Combinational components
1.1.1 Boolean equations
1.1.2 Tables
1.1.3 Controllable connections
1.1.4 Arithmetic circuits
1.2 Sequential components
1.2.1 Flip-flops
1.2.2 Registers
1.2.3 Counters
1.2.4 Finite state machines
1.3 Memory blocks
1.4 IO-port components
1.5 VHDL models
1.6 Exercises
1.7 Bibliography

Chapter 2 Architecture of digital circuits
2.1 Introductory example
2.2 Data path and control unit
2.3 Operation scheduling
2.3.1 Introductory example
2.3.2 Precedence graph
2.3.3 Optimization problems
2.4 Resource assignment
2.5 Final example
2.6 Exercises
2.7 Bibliography

Chapter 3 Special topics of data path synthesis
3.1 Pipeline
3.1.1 Introductory example
3.1.2 Segmentation
3.1.3 Combinational to pipelined transformation
3.1.4 Interconnection of pipelined components
3.2 Loop unrolling and digit-serial processing
3.3 Data path connectivity
3.4 Exercises
3.5 Bibliography

Chapter 4 Control unit synthesis
4.1 Command encoding
4.2 Hierarchical control unit
4.3 Variable-latency operations
4.4 Exercises
4.5 Bibliography

Chapter 5 Electronic aspects of digital design
5.1 Basic electronic aspects of digital design
5.1.1 Basic concepts
5.1.1.1 CMOS circuits
5.1.1.2 Fan-in and fan-out
5.1.1.3 Drive strength or drive capabilities
5.1.1.4 Pull-up and pull-down resistors
5.1.1.5 Tri-state buffers and bus-keeper
5.1.2 Propagation delay - transition time
5.1.2.1 Rise time (transition time low-to-high)
5.1.2.2 Fall time (transition time high-to-low)
5.1.2.3 Slew rate
5.1.2.4 Propagation delay, intrinsic and extrinsic delays
5.1.2.5 Timing derating factors
5.1.3 Glitches in digital circuits
5.1.3.1 Runt pulse and spikes
5.2 Synchronous design issues
5.2.1 Edge sensitive and level sensitive registers
5.2.2 Temporal parameters of flip-flops
5.2.3 Metastability
5.2.3.1 Main causes of metastability
5.2.3.2 Mean time between failures (MTBF) in metastability
5.2.3.3 How to avoid or mitigate metastability
5.3 Clock distribution network
5.3.1 Clock skew
5.3.1.1 Setup violation due to clock skew
5.3.1.2 Hold violation due to clock skew
5.3.2 Clock jitter
5.3.3 Clock gating
5.3.4 Clock managers
5.3.4.1 Delay-Locked Loop (DLL)
5.3.5 Interfacing different clock domains
5.3.5.1 Using synchronizer
5.3.5.2 Handshake signaling
5.3.5.3 Asynchronous FIFO
5.3.5.4 Open loop communication
5.4 Power consumption
5.4.1 Sources of power consumption
5.4.1.1 Static power consumption
5.4.1.2 Dynamic power consumption
5.4.1.3 Power and energy
5.4.2 Power reduction techniques
5.4.3 Power measurement and estimation
5.5 Exercises
5.6 Bibliography

Chapter 6 EDA tools
6.1 Design flow in FPGA EDA tools
6.1.1 Design Entry
6.1.1.1 HDL design entry
6.1.1.2 Schematic design entry
6.1.1.3 Intellectual Property (IP) blocks
6.1.1.4 Electronic System Level (ESL) languages
6.1.2 Synthesis
6.1.2.1 Synthesis optimizations
6.1.2.2 Synthesis constraints
6.1.2.3 Synthesis reports
6.1.3 Implementation (mapping, placement and routing)
6.1.3.1 Implementation reports
6.1.4 Programming File Generation and Programming
6.2 Implementation constraints
6.2.1 Timing constraints
6.2.2 Placement and other constraints
6.3 System verification
6.3.1 Simulation
6.3.2 Formal verification
6.3.3 In-circuit co-simulation
6.3.4 In-circuit testing and debugging
6.3.5 Design for Test
6.4 Timing Analysis
6.5 Power consumption estimation
6.5.1 Reducing the power consumption
6.6 Example of EDA tool usage
6.6.1 Simple example using Xilinx ISE
6.6.1.1 Design entry and behavioral simulation
6.6.1.2 Synthesis and synthesis report
6.6.1.3 Implementation: constraints and reports
6.6.1.4 Post place and route simulation
6.6.1.5 Running a static timing analysis
6.6.1.6 Generating programming file and programming the FPGA
6.6.1.7 Using command line implementation
6.6.1.8 Estimating the power consumption
6.6.2 Simple example using Altera Quartus II
6.7 Exercises
6.8 Bibliography

Chapter 7 Adders
7.1 Addition of natural numbers
7.2 Binary adder
7.3 Radix-2^k adder
7.4 Carry select adders
7.5 Logarithmic adders
7.6 Long-operand adder
7.7 Multioperand adders
7.7.1 Sequential multioperand adders
7.7.2 Combinational multioperand adders
7.7.3 Parallel counters
7.8 Subtractors and adder-subtractors
7.9 FPGA implementations
7.9.1 Binary adder
7.9.2 Radix-2^k adders
7.9.3 Carry select adder
7.9.4 Logarithmic adders
7.9.5 Long operand adder
7.9.6 Sequential multioperand adders
7.9.7 Combinational multioperand adders
7.9.8 Comparison
7.10 Exercises
7.11 Bibliography

Chapter 8 Multipliers
8.1 Basic algorithm
8.2 Combinational multipliers
8.2.1 Ripple-carry parallel multiplier
8.2.2 Carry-save parallel multiplier
8.2.3 Multipliers based on multioperand adders
8.2.4 Radix-2^k and mixed-radix parallel multipliers
8.3 Sequential multipliers
8.3.1 Shift and add multiplier
8.3.2 Shift and add multiplier with CSA
8.4 Integers
8.4.1 Mod 2B^(n+m) multiplication
8.4.2 Modified shift and add algorithm
8.4.3 Post correction multiplication
8.4.4 Booth multiplier
8.5 Constant multipliers
8.6 FPGA implementations
8.6.1 Combinational multipliers
8.6.2 Radix-2^k parallel multipliers
8.6.3 Sequential multipliers
8.6.4 Combinational multipliers for integers
8.6.5 Sequential multipliers for integers
8.7 Exercises
8.8 Bibliography

Chapter 9 Dividers
9.1 Basic digit-recurrence algorithm
9.2 Radix-2 division
9.2.1 Non-restoring divider
9.2.2 Restoring divider
9.2.3 Binary SRT divider
9.2.4 Binary SRT divider with carry-save adder
9.2.5 Radix-2^k SRT dividers
9.3 Radix-B dividers
9.4 Convergence algorithms
9.5 FPGA implementations
9.5.1 Digit-recurrence algorithms
9.5.2 Convergence algorithms
9.6 Exercises
9.7 Bibliography

Chapter 10 Other operations
10.1 Binary to radix-B conversion (B even)
10.2 Radix-B to binary conversion (B even)
10.3 Square rooters
10.3.1 Restoring algorithm
10.3.2 Non-restoring algorithm
10.3.3 Fractional numbers
10.3.4 Convergence methods (Newton-Raphson)
10.4 Logarithm
10.5 Exponential
10.6 Trigonometric functions
10.7 FPGA implementations
10.7.1 Converters
10.7.2 Square rooters
10.7.3 Logarithm and exponential
10.7.4 Trigonometric functions
10.8 Exercises
10.9 Bibliography

Chapter 11 Decimal Operations
11.1 Addition
11.1.1 Decimal ripple-carry adders
11.1.2 Base-B carry-chain adders
11.1.3 Base-10 carry-chain adders
11.1.4 FPGA implementation of the base-10 carry-chain adders
11.2 Base-10 complement and addition: Subtraction
11.2.1 Ten’s complement numeration system
11.2.2 Ten’s complement sign change
11.2.3 10’s complement BCD carry-chain adder-subtractor
11.2.4 FPGA implementations of adder subtractors
11.3 Decimal Multiplication
11.3.1 One-digit by one-digit BCD multiplication
11.3.1.1 Binary arithmetic with correction
11.3.1.2 Using ROM
11.3.2 N by one BCD digit multiplier
11.3.3 N by M digits multiplier
11.4 Decimal Division
11.4.1 Non-restoring division algorithm
11.4.2 An SRT-like division algorithm
11.4.3 Other methods for decimal division
11.5 FPGA implementation Results
11.5.1 Adder-subtractor implementations
11.5.2 Multiplier implementations
11.5.2.1 Decimal N×1 digits implementation results
11.5.2.2 Sequential implementations
11.5.2.3 Combinational implementations of N by M multipliers
11.5.3 Decimal Division implementations
11.6 Exercises
11.7 Bibliography

Chapter 12 Floating Point Arithmetic
12.1 IEEE 754-2008 Standard
12.1.1 Formats
12.1.2 Arithmetic and Interchange Formats
12.2 Arithmetic operations
12.2.1 Addition of positive numbers
12.2.2 Difference of positive numbers
12.2.3 Addition and subtraction
12.2.4 Multiplication
12.2.5 Division
12.2.6 Square root
12.3 Rounding schemes
12.3.1 Rounding schemes in IEEE 754
12.4 Guard digits
12.5 Arithmetic Circuits
12.5.1 Adder-subtractor
12.5.1.1 Unpacking
12.5.1.2 Alignment
12.5.1.3 Addition and subtraction
12.5.1.4 Normalization and Rounding
12.5.1.5 Packing
12.5.2 Multiplier
12.5.3 Divider
12.5.4 Square root
12.5.5 Implementation Results
12.6 Exercises
12.7 Bibliography

Chapter 13 Finite-field arithmetic
13.1 Operations modulo m
13.1.1 Addition and subtraction mod m
13.1.2 Multiplication mod m
13.1.2.1 Multiply and reduce
13.1.2.2 Interleaved multiplier
13.1.2.3 Montgomery multiplication
13.2 Division modulo p
13.3 Operations over Z_2[x]/f(x)
13.3.1 Addition and subtraction of polynomials
13.3.2 Multiplication modulo f(x)
13.3.2.1 Multiply and reduce
13.3.2.2 Interleaved multiplier
13.3.2.3 Squaring
13.4 Division over GF(2^m)
13.5 FPGA implementations
13.6 Exercises
13.7 Bibliography

Chapter 14 Systems on Chip
14.1 System on Chip (SoC)
14.2 Intellectual Property (IP) cores
14.3 Embedded systems
14.3.1 Embedded microprocessors
14.3.2 Peripherals
14.3.3 Coprocessors
14.3.4 Memory
14.3.5 Busses
14.4 Bibliography

Chapter 15 Embedded systems development: case studies
15.1 Introduction to Xilinx EDK
15.2 Case Study 1-A: A Basic Embedded System
15.2.1 Hardware
15.2.1.1 Specification
15.2.1.2 Synthesis
15.2.1.3 Implementation
15.2.2 Software
15.2.2.1 Board Support Package (BSP)
15.2.2.2 Executable
15.2.3 Programming and Debugging
15.3 Case Study 1-B: Creating a custom peripheral
15.3.1 Design of a custom peripheral
15.3.1.1 Hardware design
15.3.1.2 Driver design
15.3.2 System’s hardware/software modification
15.3.2.1 Hardware modification
15.3.2.2 Software modification
15.3.3 Functional simulation
15.4 Case Study 2: Implementation of a custom coprocessor
15.4.1 A brief introduction to the AES-128 cipher
15.4.2 Software implementation of the AES-128 cipher
15.4.3 Profiling
15.4.4 Coprocessor design
15.4.4.1 Hardware design
15.4.4.2 Driver design
15.4.5 Modification of the embedded system
15.4.5.1 Hardware modification
15.4.5.2 Software modification
15.4.6 Simulation
15.4.7 Experimental results
15.5 Bibliography

Chapter 16 Partial reconfiguration on Xilinx FPGAs
16.1 Partial Reconfiguration on Xilinx FPGAs
16.2 Design flow for Partial Reconfiguration
16.3 Case study
16.3.1 Hardware design
16.3.1.1 Design of the embedded system on XPS
16.3.1.2 Synthesis of the RMs
16.3.2 Partial reconfiguration flow on PlanAhead
16.3.2.1 Set the PR flow and the design files
16.3.2.2 Set the reconfigurable partition and the reconfigurable modules
16.3.2.3 First layout implementation
16.3.2.4 Layout implementation for the next RMs
16.3.2.5 Bitstreams generation
16.3.3 Software development
16.3.3.1 Flashing the partial bitstreams
16.3.3.2 BSP generation
16.3.3.3 Executable
16.3.3.4 Testing
16.4 Bibliography

This site was last updated 02/22/12