v2.0.1
There are various ways to define "embedded"
I think of it as a HW and SW system whose sole purpose is to be part of some larger device. Usually the user is not "aware" this system is there, only of the features of the "main" device.
Often real-time, power, physical and cost constraints are present.
Examples:
Microcontrollers are often referred to as MCUs (or 𝜇C*), and microprocessors as MPUs (or 𝜇P).
Well, it's complicated...
* "𝜇" is the Greek letter "mu"
A microcontroller is a kind of processor with at least RAM, and most often some sort of nonvolatile storage and peripherals, all on the same IC (Integrated Circuit) die.
By comparison, microprocessors generally depend on RAM and other components (peripherals) that aren't on the chip, except in the case of SoC (System on Chip*) devices.
* more about SoC later.
MCUs are generally:
RTOS: Real-Time Operating System
The "Harvard Architecture" uses separate memory and busses for instructions vs data.
The "Von-Neumann Architecture" uses the same memory and bus for instructions and data.
There's numerous considerations about architecture choice, complexity and even security issues.
From a practical standpoint it matters to embedded devs as allocation of code and data space.
Generally, MCUs have both Program and Data memory.
Some MCUs offer protected address space/MMUs/MPUs*, privileged execution, and other MPU-like features.
* MMU is Memory Management Unit, and MPU here is Memory Protection Unit.
Program memory is nonvolatile, nowadays Flash/EEPROM. Some MCUs will copy the program from there into RAM for faster execution.
16KB to 128KB is most common, with a range of about <1KB to 128MB or even more.
Data memory is volatile, usually SRAM.
8KB to 256KB is most common, with a range of about 32B to 2MB.
Some MCUs have a limited amount of nonvolatile data memory (often EEPROM) for long-term storage, or for use in case of power outages, etc.
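For example, here's a minimal sketch of keeping a boot counter across power loss, assuming an AVR-style part with avr-libc's <avr/eeprom.h>; the EEPROM address used is arbitrary.

#include <avr/eeprom.h>
#include <stdint.h>

#define BOOT_COUNT_ADDR ((uint16_t *)0x00)   // arbitrary EEPROM address for this example

void count_boot(void) {
    uint16_t boots = eeprom_read_word(BOOT_COUNT_ADDR);   // value survives power loss
    eeprom_update_word(BOOT_COUNT_ADDR, boots + 1);       // "update" skips the write if unchanged
}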
Clock rates (frequencies) typically run from about 40MHz to 1GHz, with a full range of about 8MHz to 2.5GHz.
A clock "cycle" is equal to 1/clock_frequency
For a 16MHz clock, a cycle is 0.0625 µs (microseconds). Each (machine) instruction takes 1 or more cycles.
Instruction pipelining is a technique for implementing instruction-level parallelism within a single processor.
Some MCUs are even superscalar: they can execute multiple instructions per cycle.
Pipelining and superscalar execution improve performance, but they also increase complexity (and thus cost) and make instruction timing much less deterministic, which for certain applications is difficult or impossible to deal with.
A common relative measure of performance is DMIPS/MHz, where DMIPS is a "Dhrystone MIPS" benchmark.
Common values range from a few tenths to as high as ten.
Many internal clocks aren't very accurate; on many MCUs the accuracy can be improved by using an external crystal.
You may also be able to set higher or alternative clock frequencies.
Generally MCUs have limited or less-complex instruction sets, but some families include parts with extended sets, SIMD (Single Instruction, Multiple Data), DSP (Digital Signal Processing) instructions, etc.
MCUs are commonly 8- or 32-bit, less commonly 16-bit, and rarely 64-bit (the latter usually being MPU-class chips used w/o an OS).
Most MCUs are single-core, but some have two, more, or even 8 cores!
Other less common variations include "multi-context" designs, "real-time units", and other esoteric architectures.
SoCs (and SiPs) may include entirely separate processors.
While essentially all MCUs have "native" instructions for `add` and `subtract`, `multiplication` is not always present, and `division` is less common.
Multiplication and especially division may be "expensive", taking 17 cycles or more!
Techniques such as look-up tables, approximation, and others can be used to fill-in for missing operations.
Note that the common `modulo` operation requires division, so it's also "expensive".
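A couple of illustrative C sketches of those work-arounds (the names and table are just examples): a power-of-two modulo done with a bit mask, and a small lookup table standing in for a computation.

#include <stdint.h>

// power-of-two modulo via a bit mask: same result as counter % 16, but a single AND
uint8_t wrap16(uint8_t counter) {
    return counter & 0x0F;
}

// a small lookup table standing in for a "missing" multiply: squares of 0..15
static const uint8_t square_lut[16] = {
    0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225
};

uint8_t square4(uint8_t nibble) {
    return square_lut[nibble & 0x0F];
}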
Floating-point math is even less common, although some MCUs may have support, or include an FPU (Floating Point Unit). An FPU may not always be worth it: besides the actual monetary "cost", there's CPU overhead for moving data in and out of it, context switching, etc.
Otherwise, floating point has to be emulated in software, which is slow, or replaced with "fixed-point" integer math; one common fixed-point technique is called "binary scaling".
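A minimal sketch of binary scaling in C, assuming a Q8.8 format (values scaled by 256); the type and macro names are just for illustration.

#include <stdint.h>

// Q8.8 fixed point: values are stored scaled by 256 (the "binary scaling" factor),
// so 1.5 is stored as 384 and 2.25 as 576.
typedef int16_t q8_8;
#define TO_Q8_8(x) ((q8_8)((x) * 256))

static inline q8_8 q_mul(q8_8 a, q8_8 b) {
    return (q8_8)(((int32_t)a * b) >> 8);   // widen, multiply, then shift the scale back out
}

// Example: q_mul(TO_Q8_8(1.5), TO_Q8_8(2.25)) == 864, which is 3.375 * 256.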
Endianness refers to the byte order of multi-byte data.
This matters when doing certain types of bit manipulation, calculations and more.
Note that some systems have selectable endianness, are "mixed"-endian, or have other variations.
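For illustration, a small C sketch that detects the byte order at runtime and swaps a 32-bit value by hand.

#include <stdint.h>
#include <string.h>

// returns 1 on a little-endian system, 0 on big-endian
static int is_little_endian(void) {
    uint32_t x = 1;
    uint8_t first;
    memcpy(&first, &x, 1);   // inspect the lowest-addressed byte
    return first == 1;
}

// swap the byte order of a 32-bit value (e.g. 0x11223344 -> 0x44332211)
static uint32_t swap32(uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}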
There are dozens of package types and variations. Definitions and terms can be confusing...
The Integrated Circuit "die" is a piece of silicon wafer that actually contains the transistors & circuits.
The Atmel ATtiny13a:
The package holds the die. From the tiny and simple (PIC 10F in a SOT-23 package):
To the common DIP - Dual Inline Package (Atmel ATtiny85):
To the large and latest LGA - Land Grid Array (TI TMS570 ARM Cortex):
How do we get more supporting components - from RAM to nonvolatile memory to oscillators and even power management into a "chip"?
* here this means capacitors, inductors, antennas, etc.
These more complicated chips are usually CPUs, but some, like this TI OSD335x SiP, contain not only the main ARM Cortex MPU but also an ARM Cortex-M3 MCU and 2x "PRUs" (Programmable Real-time Units*), all in a BGA (Ball Grid Array) package:
* PRU is basically a marketing term for an MCU designed for real-time ops.
Chiplets (reusable functional circuit blocks) are the modular dies used in MCMs (Multi-Chip Modules), SiPs (Systems in Package), and SoPs (Systems on Package).
The idea of chiplets is that they are commodity, tried & tested, reusable, cost-efficient circuit "blocks".
Downsides include manufacturability, compatibility, standards, connection speed between chiplets, etc.
So far they're mostly manufacturer-specific, but there's a growing movement to make them open and compatible.
Open Compute Project (OCP)'s Open Domain Specific Architecture (ODSA), started 2018. Another goal is a "Chiplet Marketplace".
Linux Foundation's CHIPS Alliance, started 2019:
an organization which develops and hosts high-quality, open source hardware code (IP cores), interconnect IP (physical and logical protocols), and open source software development tools for design, verification, and more.
- chipsalliance.org
Intel contributed the Advanced Interface Bus (AIB) to CHIPS as an open-source, royalty-free PHY-level standard for connecting multiple semiconductor dies.
A company called zGlue offers custom MCM ("ZiP") chips in small batches, easily and quickly, using a drag-and-drop visual design tool for composing chiplets.
Can you imagine small-batch "custom" modular open-source chips?!
"Peripherals" is a general name for the interfaces and components on the die itself, beyond the CPU core.
They operate largely independently of your code, meaning they actually work simultaneously with it.
For example, a PWM (Pulse Width Modulation) pin set to a 50% duty cycle is setting that pin High and Low as needed, while your code continues unaffected.
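As a hedged sketch of that PWM example, here's roughly-50%-duty-cycle hardware PWM setup assuming an ATmega328P-style AVR (the register names come from that family); once configured, the timer runs with no further CPU involvement.

#include <avr/io.h>

// Timer0 fast PWM on the OC0A pin (PD6 on an ATmega328P), non-inverting mode
void pwm_init_50pct(void) {
    DDRD   |= (1 << DDD6);                                  // OC0A pin as output
    TCCR0A  = (1 << COM0A1) | (1 << WGM01) | (1 << WGM00);  // fast PWM, clear OC0A on compare match
    TCCR0B  = (1 << CS01);                                  // clock/8 prescaler
    OCR0A   = 128;                                          // ~50% duty cycle (128/256)
}
// After this runs, the pin toggles on its own; the CPU is free to do other work.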
Examples:
Many other less common ones exist - like memory controllers, crypto, DMA, EBI, graphics, video, infrared, motor control, LCD display, etc. - especially for purpose-specific MCUs.
Generally, as MCUs get more expensive/larger (and usually faster) they include not only more types of peripherals but also multiples of each.
Some MCUs have as few as 6 pins, or as many as 500 pins.
Access to a given peripheral may be limited to only one pin (or one set of pins), usually shared with other peripherals.
Generally, only one peripheral can use any given pin(s) at the same time.
Some MCUs offer "cross-bar" or "matrix" allocation of peripherals to pins to increase flexibility.
If a necessary peripheral doesn't exist, a technique called "bit banging" may be used to create the desired interface through timing-sensitive direct bit manipulation of GPIOs.
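A rough C sketch of bit banging a byte out on two GPIOs (clock and data); the pin macros below are stand-ins, not from any particular vendor HAL.

#include <stdint.h>

// Stubs standing in for platform-specific pin operations; replace with real GPIO
// register writes (or HAL calls) for your part. Shown only to illustrate the technique.
#define DATA_PIN 0
#define CLK_PIN  1
#define PIN_SET(p)        ((void)(p))   /* drive pin p high */
#define PIN_CLR(p)        ((void)(p))   /* drive pin p low */
#define DELAY_HALF_BIT()  ((void)0)     /* wait half a bit period */

// clock a byte out MSB-first on DATA_PIN/CLK_PIN ("SPI-like" framing)
static void bitbang_send_byte(uint8_t b) {
    for (int8_t i = 7; i >= 0; i--) {
        if (b & (1u << i)) PIN_SET(DATA_PIN);
        else               PIN_CLR(DATA_PIN);
        PIN_SET(CLK_PIN);        // receiver samples on the rising edge
        DELAY_HALF_BIT();        // getting these delays right is the "timing-sensitive" part
        PIN_CLR(CLK_PIN);
        DELAY_HALF_BIT();
    }
}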
Example of one of my fav tiny MCUs:
ATtiny 828:
TI LM4F232H5QC:
MCUs usually have one or more power states, each with its own power draw, transition latencies, and peripheral support.
Common terms for these states include sleep, deep sleep, doze, wait, low-power, slow, etc. Some MCUs allow changing the clock speed.
Some of the lowest power states may only draw a few tens of nA (nano-Amps).
Using some peripherals or subsystems will affect power draw, like pin pull-ups*.
Be careful: putting the MCU into some "sleep" states before defining a way to wake it up may result in bricking it!
* A GPIO (or other) pin in an input/read mode is considered to be in a "floating" (indeterminate) logic state, unless it or something else is lightly "pulling" it either high or low.
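Putting those pieces together, a hedged AVR-style sketch (ATmega328P register names, avr-libc calls) that enables a pull-up and a pin-change wake-up source *before* entering power-down sleep.

#include <avr/io.h>
#include <avr/interrupt.h>
#include <avr/sleep.h>

ISR(PCINT0_vect) { /* waking up is the whole point; nothing else to do here */ }

void sleep_until_pin_change(void) {
    PORTB  |= (1 << PB0);      // enable the internal pull-up so the input isn't floating
    PCMSK0 |= (1 << PCINT0);   // watch PB0 for pin changes
    PCICR  |= (1 << PCIE0);    // enable pin-change interrupt group 0 (the wake source)
    sei();                     // global interrupt enable, *before* sleeping
    set_sleep_mode(SLEEP_MODE_PWR_DOWN);
    sleep_mode();              // execution stops here until the pin-change wakes the part
}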
How do you get your code into the device?
In the past, programming often required custom/expensive HW and SW. At one point, erasing required a UV light source shining through a quartz "window" on the chip (EPROM); then came Electrically-Erasable memory (EEPROM).
... but now mostly all you need is a USB to TTL (serial) adapter.
They're cheap, from under $2 for generic Chinese models to around $30 for name brands.
Development boards (like the Arduino or any number of vendor-specific ones) usually only need a USB cable, as they have the USB chip and logic onboard.
In-Circuit Serial Programming (ICSP) and In-System Programming (ISP) are basically the same thing: they allow programming in situ, i.e. without having to isolate the MCU from other circuitry.
They connect to a port or pins, sometimes a JTAG (Joint Test Action Group) connector (4/5 pins), or sometimes vendor-specific connectors (10/14/16/20 pins).
Vendor-specific programmers are often combined with debuggers; they cost more and usually require vendor SW, but are often more reliable and easier to use:
Some MCUs on dev boards (like many from Adafruit) have a convenient programming workflow: plug in the USB and the flash mounts as a drive on your computer, and you can then edit your program file (usually MicroPython, Arduino, MS MakeCode, etc.) directly.
Others mount their USB connection as a virtual COM port, which a number of IDEs can interface with, sometimes including browser-based ones (Espruino, etc.).
Debugging hardware, when used with compatible IDEs/etc., allows most or all of the common debugging features like breakpoints, step, step-over, watches, variable inspection, etc.
Without debugging HW, options include some sort of in-code debugging plus simple output HW: writing to the serial connection, toggling an LED, driving a 7-segment display, or even a full matrix display.
In-code debugging has drawbacks such as slowing processing, complicating timing, or even "Heisenbugs"*.
* A Heisenbug is one that is either created, or resolved, by debugging it (often when outputting serially).
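One common mitigation is a debug print macro that compiles away entirely in release builds, avoiding the code-size and timing impact; debug_uart_printf() here is a hypothetical project-provided function, and the variadic form is a GCC-style extension.

void debug_uart_printf(const char *fmt, ...);   // hypothetical: prints over the serial link

#ifdef DEBUG
  #define DBG(fmt, ...) debug_uart_printf("[%s:%d] " fmt "\n", __FILE__, __LINE__, ##__VA_ARGS__)
#else
  #define DBG(fmt, ...) ((void)0)   // compiles to nothing: no code-size or timing impact
#endif

// usage: DBG("adc=%u", adc_value);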
Briefly, interrupts are either HW- or SW-generated events that call Interrupt Service Routine (ISR) / interrupt handler code.
Some MCUs support Interrupt Priority, interrupts inside interrupts, and other complexity.
SW Interrupts can be generated by user code, or during certain conditions like overflow or division by zero errors.
HW Interrupts are generated by peripherals, such as GPIO Edge/Level Trigger, timers/counters, communications events, and more.
Interrupt-driven coding is standard and is more efficient (both during regular execution, and for power management) than polling, although it adds complexity and takes more effort.
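As a concrete sketch, here's an interrupt-driven button read, assuming an ATmega328P-style AVR; the ISR just sets a flag that the main loop (or a sleep/wake cycle) can act on.

#include <avr/io.h>
#include <avr/interrupt.h>
#include <stdint.h>

volatile uint8_t button_pressed = 0;   // volatile: shared between the ISR and main code

ISR(INT0_vect) {                       // runs on the external INT0 interrupt
    button_pressed = 1;                // keep ISRs short; just record that it happened
}

void int0_init(void) {
    EICRA |= (1 << ISC01);             // ISC01=1, ISC00=0: trigger on a falling edge
    EIMSK |= (1 << INT0);              // unmask INT0
    sei();                             // enable interrupts globally
}
// The main loop can now sleep or do other work and simply check button_pressed,
// instead of polling the pin continuously.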
RTOS (Real-Time Operating System) is sometimes used on MCUs, especially in complex, timing-sensitive usages.
An RTOS is similar to the scheduling portion of an OS kernel, and generally tries to be deterministic in its timing.
While RTOSes provide consistent timing, they may not achieve the latency desired, and they consume memory and processing cycles. They're also "one more thing" to learn.
There are a number of free/open-source (FreeRTOS, ChibiOS, Apache Mynewt, Linux Foundation Zephyr), commercial (VxWorks, etc.), and vendor-specific (TI-RTOS, etc.) RTOSes available.
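A minimal FreeRTOS-flavored sketch of what task-based code looks like, assuming a working FreeRTOS port for your MCU; the LED toggle is a hypothetical board-specific placeholder.

#include "FreeRTOS.h"
#include "task.h"

extern void toggle_led(void);   // hypothetical board-specific call, shown as a placeholder

static void blink_task(void *params) {
    (void)params;
    for (;;) {
        toggle_led();
        vTaskDelay(pdMS_TO_TICKS(500));   // block ~500 ms; the scheduler runs other tasks meanwhile
    }
}

int main(void) {
    xTaskCreate(blink_task, "blink", configMINIMAL_STACK_SIZE, NULL, 1, NULL);
    vTaskStartScheduler();                // does not return if the scheduler starts successfully
    for (;;) {}                           // only reached if there was insufficient heap
}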
AKA "fuses" or "fuse bits". They are generally out-of-band non-volatile (or even permanent) configuration changes that substantially effect the operation of the MCU.
Some uses of config bits include:
There are almost endless variations. Digikey lists 76,307 different MCU SKUs!
Many of these are variations on a basic chip, like package type, speed, and included memory.
ARM and MIPS are somewhat unique because they are not manufacturers (they're "fabless", as they don't fabricate chips); rather, they license chip designs that other manufacturers fabricate and sell.
The benefit to developers is that instruction sets are largely similar, but peripherals and HALs* vary across families.
*Hardware Abstraction Layer
The RISC-V Foundation is a non-profit group that controls and promotes the RISC-V ISA* which has been successfully commercialized. This is in contrast to (for profit) ARM, MIPS, Intel, Atmel, Microchip, etc.
* RISC-V is pronounced "risk five"; RISC is Reduced Instruction Set Computing, and ISA is Instruction Set Architecture.
The RISC-V Foundation was founded in 2015, and while RISC-V is far from the first open-source ISA, it has become remarkably successful.
Much of the appeal of RISC-V is that it's free*, cost-saving, performant, flexible, and extensible, and it enables scalable domain-specific processors (as compared to GPPs**).
* RISC-V Foundation membership required
** General Purpose Processors
SiFive has developed and produced the Freedom E310 (an Arduino-compatible RISC-V MCU), the Freedom U540 (a quad-core 64-bit RISC-V SoC), and more.
Western Digital has announced plans to use RISC-V "processors in their flash controllers and SSDs". Nvidia, SEGGER, Qualcomm, Alibaba, and others are developing chips.
A Field-Programmable Gate Array (FPGA) is a unique type of IC containing a matrix of programmable logic blocks, which can be dynamically arranged to form basically whatever computational device is needed.
This is in comparison to GPPs (General Purpose Processors) or ASICs (Application-Specific IC).
Commercially, FPGAs are often used for development leading to an ASIC, or directly in products when an ASIC would be too expensive at the needed volume, or when iteration is needed.
For hobbyists, FPGAs can be a great alternative for certain applications where no readily available MCU/CPU is appropriate or affordable.
Like most things, history is complex, but here's a summary:
Originally, in the 1970s, there were Programmable Array Logic (PAL), Programmable Logic Array (PLA), and Generic/Gate Array Logic (GAL), etc., which really only replaced multiple discrete logic chips.
Those "logic arrays" evolved into the Complex Programmable Logic Device (CPLD, 1980s) which had many more "logic blocks"* and pins. They also have non-volatile memory which means they don't have to be re-configured on power-up.
* also known as LUTs, LABs, CLBs, logic array blocks, etc. Note that comparing "logic blocks" across manufacturers, or even product lines, is difficult.
FPGAs (mid-1980s), as compared to CPLDs, again have many more logic blocks* and pins, and sometimes "hard blocks"**, but no non-volatile memory, so they need to be re-configured from an external memory device/controller on each power-up.
* Some may have as many as hundreds of thousands!
** Hard Blocks are "permanent" - generally high-speed/performant peripherals.
Instead of "programming" an FPGA in a coding language, a Hardware Description Language (HDL) is used to define the "use" and connection of the logic blocks.
While an HDL may "look" like code, it doesn't execute (or get compiled/interpreted in the usual sense); it describes the logic blocks and their connections, and that resulting logic is what "executes".
The most common HDLs are Verilog / SystemVerilog, and VHDL (VHSIC Hardware Description Language). There's lots of pros/cons and communities that use each, but if you're a hobbyist you're more likely to use Verilog.
There are a number of EDA tools (some open source) which can be used to simulate an FPGA, to test the "code" before it gets onto HW.
Formal verification uses mathematical analysis to verify a design, such that the expected output is guaranteed given stated assertions and constraints.
module top(
    input CLK,             // board clock
    input BTN_N,           // active-low push button
    input [7:0] sw,        // 8 slide switches
    output [4:0] led,      // discrete LEDs
    output [6:0] seg,      // 7-segment display segments
    output ca              // 7-segment digit/common select
);
    wire [7:0] disp0;          // decoded segment pattern
    wire displayClock;         // slow clock for the display
    reg  [7:0] storedValue;    // value latched from the switches

    assign led[4:0] = sw[4:0];         // mirror the low switches on the LEDs

    // latch the switch value on the falling edge of the (active-low) button
    always @(negedge BTN_N)
        storedValue <= sw;

    // decode the low nibble of the stored value to 7-segment form
    nibble_to_seven_seg nibble0(
        .nibblein(storedValue[3:0]),
        .segout(disp0)
    );

    // divide the input clock down for the display
    clkdiv displayClockGen(
        .clk(CLK),
        .clkout(displayClock)
    );

    // (the seg and ca outputs are driven in a portion of the module omitted here)
endmodule
Historically, and still largely today, "official" FPGA toolchains are vendor-specific, expensive, and largely undocumented.
But now there's a fully open-source Verilog toolchain solution*.
SymbiFlow: "the GCC of FPGAs" (Xilinx 7-Series, Lattice iCE40 & ECP5), into which Project IceStorm was integrated.
It utilizes sub-systems like Yosys, Arachne-pnr, nextpnr, and Migen.
* lots of effort went into reverse engineering the proprietary bitstreams and developing exceedingly complex synthesis/routing/etc. tools (most of which are functional but still under heavy development)
This has resulted in a number of open source development boards like:
Some processor "cores" (such as RISC-V, ARM, etc.) are available in HDL, so an FPGA can *contain* an MCU/CPU as a "soft core".
This means you'd first "program" the FPGA with the MCU core and whatever peripherals or logic you need using an HDL, then program the core itself (probably in C, etc.).
Pros: inherently multiprocessing, massively parallel, any peripheral needed, flexible, and reconfigurable.
Cons: slower, more energy consuming, more expensive
While FPGAs were once very expensive, simple ones can now be had for just a couple of dollars each (although they can cost into the hundreds of dollars).
As mentioned, some FPGAs come with "hard block" MCU cores; these parts are often called cSoCs.
Some MCUs come with CCL (Configurable Custom Logic) or similarly named functionality that's somewhat like a small CPLD. The new Arduino Uno WiFi includes this.