Microcontroller Architecture for Embedded Programming

v2.0.1

Embedded?

There are various ways to define "embedded".

I think of it as the HW and SW system that's built into some larger device and exists solely to serve that device. Usually the user is not "aware" this system is there, only of the features of the "main" device.

Often real-time, power, physical and cost constraints are present.

Examples:

[Images: the Lexus LF-A concept car at the 2006 Greater Los Angeles Auto Show; a Fagor washing machine (front view); a "naked" Tickle Me Elmo; an Olympus OM-D E-M1 Mark II camera]

What is a microcontroller?

Microcontrollers are often referred to as MCUs (or 𝜇C*), and microprocessors as MPUs (or 𝜇P).

Well, it's complicated...

* "𝜇" is the Greek letter "mu"

Microcontrollers, cont.

A microcontroller is a kind of processor with at least RAM, and most often some sort of nonvolatile storage and peripherals on the IC (Integrated Circuit) die.

Microprocessors, by comparison, generally depend on RAM and other components (peripherals) that aren't on the chip, except for SoC (System on Chip*) devices.

* more about SoC later.

Microcontrollers, cont.

MCUs are generally:

  • Slower: 1 MHz (or lower) all the way up to ... 1 GHz+, and even "simple" instructions may take multiple clock cycles
  • Simpler: limited instruction sets, little RAM, few/no pipelines, no privileged exec., etc.
  • Less expensive: from $0.03 to $20 each (in quantities of 100 or more, generally)
  • OS-less: most do not run an OS, although some run an RTOS*

* RTOS: Real-Time Operating System

Memory and Code Architecture

The "Harvard Architecture" uses separate memory and buses for instructions vs data.

The "von Neumann Architecture" uses the same memory and bus for instructions and data.

Memory and Code Architecture, cont.

There are numerous considerations around architecture choice, including complexity and even security issues.

From a practical standpoint, it matters to embedded devs because it determines how code and data space are allocated.

Generally, MCUs have both Program and Data memory.

Memory and Code Architecture, cont.

Some MCUs offer protected address space/MMUs/MPUs*, privileged execution, and other microprocessor-like features.

* MMU is Memory Management Unit, and MPU here is Memory Protection Unit.

Program Memory

Program memory is nonvolatile, nowadays Flash/EEPROM. Some MCUs will copy the program from there into RAM for faster execution.

16KB to 128KB is most common, with a range of about <1KB to 128MB or even more.

Data Memory (RAM)

Data memory is volatile, usually SRAM.

8KB to 256KB is most common, with a range of about 32B to 2MB.

Some MCUs have some limited amount of nonvolatile data memory (often EEPROM) for long-term storage, or in case of power outages, etc.

Instruction Clock

Clock rates (frequencies) commonly run from about 40 MHz to 1 GHz, with a range of about 8 MHz to 2.5 GHz.

The duration of a clock "cycle" is equal to 1/clock_frequency.

For a 16 MHz clock, a cycle is 0.0625 µs (microseconds), i.e. 62.5 ns. Each (machine) instruction takes 1 or more cycles.
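The cycle arithmetic above can be sketched in a few lines of C. This is only an illustration: the function names `cycle_time_ns` and `instruction_time_ns` are made up for this example, and it uses floating point for readability (on a small MCU you would more likely precompute these values).

```c
#include <stdint.h>

/* Duration of one clock cycle in nanoseconds: 1 / frequency, scaled to ns. */
double cycle_time_ns(double clock_hz)
{
    return 1e9 / clock_hz;
}

/* Time taken by an instruction that needs `cycles` clock cycles, in ns. */
double instruction_time_ns(double clock_hz, uint32_t cycles)
{
    return cycle_time_ns(clock_hz) * (double)cycles;
}
```

For example, `cycle_time_ns(16e6)` gives 62.5 ns, so a 4-cycle instruction at 16 MHz takes about 250 ns.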

Instruction Clock, cont.

Instruction pipelining is a technique for implementing instruction-level parallelism within a single processor.

Some MCUs are even superscalar: they can execute multiple instructions per cycle.

Pipelining and superscalar execution improve performance, but they also add complexity (thus cost) and make instruction timing much less deterministic, which for certain applications is difficult or impossible to deal with.

Instruction Clock, cont.

A common relative measure of performance is DMIPS/MHz, where DMIPS is a "Dhrystone MIPS" benchmark.

Common values range from a few tenths to as high as ten.

source https://www.embedded.com/print/4008863
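As a rough sketch of how a DMIPS/MHz figure is derived: a Dhrystone score (Dhrystones per second) is divided by 1757, the score of the VAX 11/780 reference machine that defines 1 DMIPS, and then by the clock rate in MHz. The function names below are invented for this example.

```c
/* Dhrystones per second of the VAX 11/780 reference machine (= 1 DMIPS). */
#define VAX_DHRYSTONES_PER_SEC 1757.0

/* Convert a raw Dhrystone score into DMIPS. */
double dmips(double dhrystones_per_sec)
{
    return dhrystones_per_sec / VAX_DHRYSTONES_PER_SEC;
}

/* Normalize by clock rate so chips at different speeds can be compared. */
double dmips_per_mhz(double dhrystones_per_sec, double clock_hz)
{
    return dmips(dhrystones_per_sec) / (clock_hz / 1e6);
}
```

Because the result is normalized by frequency, it says more about how much work the core does per cycle than about raw speed.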

Instruction Clock, cont.

Many internal clocks aren't very accurate, but on some MCUs accuracy can be improved by using an external crystal.

You may also be able to set higher or alternative clock frequencies.

Instruction Sets

Generally MCUs have limited or less-complex instruction sets, but some families include some with extended sets, SIMD (Single Instruction, Multiple Data), DSP (Digital Signal Processing), etc.

Data Width

MCUs are commonly 8- or 32-bit, less commonly 16-bit, and rarely 64-bit (the latter usually being MPU-class chips used w/o an OS).

Cores

Most MCUs are single core, but some are dual core, or have more, even 8!

Other less common variations include "multi-context" designs, or "real-time units", or other esoteric designs.

SoCs (and SiPs) may include entirely separate processors.

Math

While essentially all MCUs have "native" instructions for `add` and `subtract`, `multiply` is not always present, and `divide` is less common still.

Multiplication and especially division may be "expensive", taking 17 cycles or more!

Math, cont.

Techniques such as look-up tables, approximation, and others can be used to fill in for missing operations.

Note that the common `modulo` operation requires division and so it's "expensive".
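Two common workarounds, sketched in C under the assumption that the core has neither a multiply nor a divide instruction: a modulo by a power of two can be replaced with a bit mask, and a multiply can be built from shifts and adds. The function names are made up for this example.

```c
#include <stdint.h>

/* x % n without dividing. Only valid when n is a nonzero power of two:
   the remainder is then simply the low bits of x. */
uint32_t mod_pow2(uint32_t x, uint32_t n)
{
    return x & (n - 1u);
}

/* Shift-and-add multiply: for every set bit in b, add the matching
   shifted copy of a. Uses only shift, add, and test instructions. */
uint32_t mul_shift_add(uint16_t a, uint16_t b)
{
    uint32_t result = 0;
    uint32_t addend = a;

    while (b != 0) {
        if (b & 1u) {
            result += addend;
        }
        addend <<= 1;
        b >>= 1;
    }
    return result;
}
```

This is one reason buffer sizes on MCUs are so often powers of two: wrapping an index becomes a single AND instead of a division.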

Floating Point Math

Floating-point math is even less common, although some MCUs may have support, or perhaps include an FPU (Floating Point Unit). Some FPUs may not be worth it, as there are usually not only actual monetary "costs" but also the CPU processing costs of shifting data, context switching, etc.

Otherwise, floating-point has to be emulated in software, which is slow and complex, or replaced with "fixed-point" math. One fixed-point technique is called "binary scaling".
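A minimal sketch of binary scaling, assuming a Q16.16 format (16 integer bits, 16 fractional bits, stored in a 32-bit integer). The type and function names are invented for this example, and real projects often pick a different split of integer and fractional bits.

```c
#include <stdint.h>

typedef int32_t q16_16;        /* value = raw / 65536 */
#define Q16_ONE (1 << 16)      /* the number 1.0 in Q16.16 */

/* Convert a whole number to Q16.16 (caller must keep it within +/-32767). */
q16_16 q16_from_int(int32_t i)
{
    return (q16_16)(i * Q16_ONE);
}

/* Multiply: the raw product carries 32 fractional bits, so shift 16 back out.
   Note: right-shifting a negative value is implementation-defined in C,
   though it is an arithmetic shift on practically every compiler. */
q16_16 q16_mul(q16_16 a, q16_16 b)
{
    return (q16_16)(((int64_t)a * b) >> 16);
}

/* Divide: pre-scale the dividend so the quotient keeps 16 fractional bits. */
q16_16 q16_div(q16_16 a, q16_16 b)
{
    return (q16_16)(((int64_t)a * Q16_ONE) / b);
}
```

Addition and subtraction need no special handling, since both operands share the same scale. The 64-bit intermediates are themselves costly on 8-bit parts, which is part of why this is a trade-off rather than a free win.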

Endianness

This refers to the order of multi-byte data.

  • Little Endian has the least-significant bytes first.
  • Big Endian has the most-significant bytes first.

This matters when doing certain types of bit manipulation, calculations and more.

Note that some systems have selectable endianness, are "mixed", or are other variations.
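A small C sketch of where this shows up in practice. The helper names are made up for this example; the point is that building values from individual bytes with shifts gives the same result on any host, while casting a byte pointer to a wider type does not.

```c
#include <stdint.h>

/* Returns 1 on a little-endian host, 0 otherwise, by checking which
   byte of a known 16-bit value is stored first in memory. */
int is_little_endian(void)
{
    const uint16_t probe = 0x0102;
    const uint8_t *bytes = (const uint8_t *)&probe;
    return bytes[0] == 0x02;
}

/* Read a 32-bit big-endian value (e.g. from a network packet). */
uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Read a 32-bit little-endian value. */
uint32_t load_le32(const uint8_t *p)
{
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
           ((uint32_t)p[1] << 8)  |  (uint32_t)p[0];
}

/* Reverse the byte order of a 32-bit value. */
uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}
```

The same four bytes 0x12 0x34 0x56 0x78 read as 0x12345678 big-endian and 0x78563412 little-endian.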

Physical Aspects

There are dozens of package types and variations. Definitions and terms can be difficult...

IC Die

The Integrated Circuit "die" is a piece of silicon wafer that actually contains the transistors & circuits.

The Atmel ATtiny13a:

Package

The package holds the die. From the tiny and simple (PIC 10F in a SOT-23 package):

Package, cont.

To the common DIP - Dual Inline Package (Atmel ATtiny85):

Package, cont.

To the large and latest LGA - Land Grid Array (TI TMS570 ARM Cortex):

Variations

How do we get more supporting components - from RAM to nonvolatile memory to oscillators and even power management into a "chip"?

Variations

  • System-On-Chip (SoC) - on one die: CPU, RAM, etc.
  • Multi-Chip Modules (MCM) - modules (dies) often known as "chiplets"
  • Systems-In-Package (SiP) - stacks dies vertically
  • Systems-On-Package (SoP) - includes discrete components*
  • Customizable System-On-Chip (cSoC) - includes an FPGA

* here this means capacitors, inductors, antennas, etc.

Variations, cont.

Comparison among SOC, MCM, SIP, and SOP.

Variations, cont.

These more complicated chips are usually CPUs, but some, like this TI OSD335x SiP, contain not only the main ARM Cortex MPU but also an ARM Cortex-M3 MCU and 2x "PRU" (Programmable Real-time Units*) in a BGA (Ball Grid Array) package:

* PRU is basically a marketing term for an MCU designed for real-time ops.

Variations, cont.

Chiplets (reusable functional circuit blocks) are the modular dies used in MCM, SiP, SoP.

The idea of chiplets is that they are commodity, tried & tested, reusable, cost-efficiently sized circuit "blocks".

Downsides include manufacturability, compatibility, standards, connection speed between chiplets, etc.

So far they're mostly manufacturer specific, but there is a growing movement to make them open and compatible.

Variations, Open Source

Open Compute Project (OCP)'s Open Domain Specific Architecture (ODSA), started 2018. Another goal is a "Chiplet Marketplace".

ODSA open interface for SiP Designs

Variations, Open Source, cont.

Linux Foundation's CHIPS Alliance, started 2019:

an organization which develops and hosts high-quality, open source hardware code (IP cores), interconnect IP (physical and logical protocols), and open source software development tools for design, verification, and more. - chipsalliance.org

Variations, Open Source, cont.

Intel contributed to CHIPS the Advanced Interface Bus (AIB) as an open-source, royalty-free PHY-level standard for connecting multiple semiconductor dies.

Intel EMIB Rocket Lake S CPUs

Variations, cont.

A company called zGlue is offering custom MCM ("ZiP") chips for small batches, easily, and quickly, using a drag-and-drop visual design tool for composing chiplets.

Can you imagine small-batch "custom" modular open-source chips?!

Peripherals

Peripherals is a general name for interfaces or components on the "die" itself.

They operate largely independently of your code's execution, meaning they really are working simultaneously with your code.

For example, a PWM (Pulse Width Modulation) pin set to a 50% duty cycle is setting that pin High and Low as needed, while your code continues unaffected.
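To make the PWM example concrete, here is a sketch of the arithmetic behind a typical timer-based PWM: a counter runs from 0 to some top value, and the pin is high while the counter is below a compare value. The names `pwm_compare_value` and `pwm_output_level` are hypothetical, and the exact compare semantics vary between MCU families, so treat this as one possible model rather than any specific chip's behavior.

```c
#include <stdint.h>

/* Compare value for a given duty cycle on a counter that counts 0..top.
   One period is (top + 1) ticks, so 50% of a 0..255 counter is 128. */
uint32_t pwm_compare_value(uint16_t top, uint8_t duty_percent)
{
    if (duty_percent > 100) {
        duty_percent = 100;
    }
    return ((uint32_t)top + 1u) * duty_percent / 100u;
}

/* Model of what the peripheral does in hardware on every timer tick:
   drive the pin high while the counter is below the compare value. */
int pwm_output_level(uint16_t counter, uint32_t compare)
{
    return counter < compare;
}
```

Your code would only write the compare value once; the timer hardware then evaluates the comparison on every tick without any CPU involvement.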

Peripherals, cont.

Examples:

  • ADC (Analog to Digital Converter) or the reverse, DAC
  • SPI (Serial Peripheral Interface), pronounced "spy"; and extensions Dual SPI, (SQI) Quad SPI, etc.
  • I²C/IIC (Inter-Integrated Circuit), also known as TWI (two-wire interface)
  • I²S/IIS (Inter-IC Sound)
  • USB (Universal Serial Bus)
  • Ethernet/WiFi

Peripherals, cont.

  • n-bit timers and/or counters
  • CAN (Controller Area Network) - intended for vehicles
  • RTC (Real-Time Clock)
  • GPIO (General Purpose I/O)
  • Various external memory interfaces
  • FPU
  • Comparators

Peripherals, cont.

  • UART - Universal Asynchronous Receiver/Transmitter
  • WDT - Watchdog timer
  • 1-Wire®

Many other less common ones exist - like memory controllers, crypto, DMA, EBI, graphics, video, infrared, motor control, LCD display, etc. - especially for purpose-specific MCUs.

Peripherals, continued

Generally, as MCUs get more expensive/larger (and usually faster) they include not only more types of peripherals but also multiples of each.

Some MCUs have as few as 6 pins, or as many as 500 pins.

Peripherals, continued

Access to any peripheral may be limited to only 1 pin (or 1 set of pins), and those pins are usually shared with other peripherals.

Generally, only one peripheral on any pin(s) can be used at the same time.

Some MCUs offer "cross-bar" or "matrix" allocation of peripherals to pins to increase flexibility.

Peripherals, continued

If a necessary peripheral doesn't exist, a technique called "bit banging" may be used to create the desired interface through timing-sensitive direct bit manipulation of GPIOs.
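As an illustration of bit banging, here is a sketch of one SPI byte transfer (mode 0, most-significant bit first) done purely with GPIO writes and reads. The pin-access functions are hypothetical hooks, since real GPIO access is vendor specific, and a small loopback simulation (MOSI wired straight to MISO) is included only so the sketch can run on a desktop.

```c
#include <stdint.h>

/* Hypothetical hooks for the three SPI pins plus a timing delay.
   On real hardware these would wrap the vendor's GPIO registers or HAL. */
typedef struct {
    void (*write_sck)(int level);
    void (*write_mosi)(int level);
    int  (*read_miso)(void);
    void (*half_period_delay)(void);
} bitbang_spi_pins;

/* Shift one byte out on MOSI while shifting one byte in from MISO. */
uint8_t bitbang_spi_transfer(const bitbang_spi_pins *pins, uint8_t out)
{
    uint8_t in = 0;

    for (int bit = 7; bit >= 0; --bit) {
        pins->write_mosi((out >> bit) & 1);   /* set up data while SCK is low */
        pins->half_period_delay();
        pins->write_sck(1);                   /* rising edge: data is sampled */
        in = (uint8_t)((in << 1) | (pins->read_miso() & 1));
        pins->half_period_delay();
        pins->write_sck(0);                   /* falling edge: ready for next bit */
    }
    return in;
}

/* --- Desktop-only loopback simulation: MOSI is wired to MISO --- */
static int sim_mosi_level;
static void sim_write_sck(int level)  { (void)level; }
static void sim_write_mosi(int level) { sim_mosi_level = level; }
static int  sim_read_miso(void)       { return sim_mosi_level; }
static void sim_delay(void)           { }

const bitbang_spi_pins loopback_pins = {
    sim_write_sck, sim_write_mosi, sim_read_miso, sim_delay
};
```

With the loopback wiring, `bitbang_spi_transfer(&loopback_pins, 0xA5)` returns the byte that was sent. On real hardware the delay hook sets the clock rate, and anything that disturbs the loop (such as an interrupt) stretches the timing, which is the main weakness of bit banging.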

Pinouts

Example of one of my fav tiny MCUs:

Pinouts

ATtiny 828:

Pinouts

TI LM4F232H5QC:

Power Management

MCUs usually have one or more power states, each with its own power draw, transition latencies, and peripheral support.

Common terms for these states include sleep, deep sleep, doze, wait, low-power, slow, etc. Some MCUs allow changing the clock speed.

Power Management

Some of the lowest power states may only draw a few tens of nA (nanoamps).

Using some peripherals or subsystems will affect power draw, like pin pull-ups*.

Be careful: putting the MCU into some "sleep" states before defining a way to wake it up may result in bricking it!

* A GPIO (or other) pin in an input/read mode is considered to be in a "floating" (indeterminate) logic state, unless it or something else is lightly "pulling" it either high or low.

Programming

How do you get your code into the device?

In the past, programming often required custom/expensive HW and SW. At one point, erasing a chip (EPROM) required shining a UV light source through a quartz "window" on the package; then came Electrically-Erasable (EEPROM) parts.

Programming, cont.

... but now mostly all you need is a USB to TTL (serial) adapter.

They're cheap, from < $2 for generic Chinese models, to name brand ~$30.

Programming, cont.

Development boards (like the Arduino or any number of vendor-specific ones) usually only need a USB cable, as they have the USB chip and logic onboard.

Programming, cont.

In-Circuit Serial Programming (ICSP) and In-System Programming (ISP) are basically the same thing: they allow programming in situ, i.e. without having to isolate the MCU from other circuitry.

They connect to a port or pins, sometimes a JTAG (Joint Test Action Group) connector (4/5 pin), or sometimes vendor-specific connectors (10/14/16/20 pin).

Programming, cont.

Vendor-specific programmers are often combined with debuggers. They cost more and usually require vendor SW, but are often more reliable and easier to use:

  • PICkit4 $50
  • (PIC) ICD3 $200
  • Atmel ICE $90
  • 1 Bit Squared's Black Magic Probe $60 (for ARM)

Programming, cont.

pickit4

Programming, cont.

Some MCUs on dev boards (like many from Adafruit) have a convenient programming method: plug in the USB cable, the flash mounts as a drive on your computer, and then you can edit your program file (usually MicroPython, Arduino, MS MakeCode, etc.) directly.

Others present their USB connection as a virtual comm port, with which a number of IDEs can interface, sometimes including a browser-based one (Espruino, etc.).

Programming, debugging

Debugging hardware, when used with compatible IDEs/etc., allows for most or all of the common debugging concepts like breakpoints, step, step-over, watches, variable inspection, etc.

Programming, debugging cont.

Without debugging HW, options include some sort of in-code debugging combined with simple output HW: writing to the serial connection, toggling an LED, driving a 7-segment display, or even a full matrix display.

In-code debugging has drawbacks such as slowing processing, complicating timing, or even "Heisenbugs"*.

* A Heisenbug is one that is either created, or resolved, by debugging it (often when outputting serially).

Interrupts

Briefly, interrupts are HW- or SW-generated events that cause Interrupt Service Routine (ISR) / Interrupt Handler code to be called.

Some MCUs support interrupt priority, nested interrupts (interrupts inside interrupts), and other complexity.

Interrupts, cont.

SW Interrupts can be generated by user code, or during certain conditions like overflow or division by zero errors.

HW Interrupts are generated by peripherals, such as GPIO Edge/Level Trigger, timers/counters, communications events, and more.

Interrupts, cont.

Interrupt-driven coding is standard and is more efficient (both during regular execution, and for power management) than polling, although it adds complexity and takes more effort.
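A common interrupt-driven pattern, sketched in C: the ISR does the minimum (store the received byte in a ring buffer) and the main loop drains the buffer whenever it gets around to it. Everything here is illustrative. `uart_rx_isr` is a hypothetical handler that takes the byte as a parameter so it can be exercised on a desktop; on a real MCU it would be registered with the vendor's interrupt mechanism and would read the byte from a UART data register.

```c
#include <stdint.h>

#define RX_BUF_SIZE 16u   /* power of two, so index wrap is a cheap mask */

/* volatile: these are shared between the ISR and the main loop. */
static volatile uint8_t rx_buf[RX_BUF_SIZE];
static volatile uint8_t rx_head;   /* written only by the ISR */
static volatile uint8_t rx_tail;   /* written only by the main loop */

/* Hypothetical UART receive ISR: keep it short, just store the byte. */
void uart_rx_isr(uint8_t received_byte)
{
    uint8_t next = (uint8_t)((rx_head + 1u) & (RX_BUF_SIZE - 1u));

    if (next != rx_tail) {          /* buffer full: drop the byte */
        rx_buf[rx_head] = received_byte;
        rx_head = next;
    }
}

/* Called from the main loop. Returns the next byte, or -1 if none waiting. */
int uart_read_byte(void)
{
    if (rx_tail == rx_head) {
        return -1;
    }
    uint8_t value = rx_buf[rx_tail];
    rx_tail = (uint8_t)((rx_tail + 1u) & (RX_BUF_SIZE - 1u));
    return value;
}
```

Compared with polling the UART in a loop, the CPU is free (or asleep) between bytes. The cost is the extra care needed around shared state: each index has a single writer and is one byte wide, which keeps updates atomic on most MCUs.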

RTOS

RTOS (Real-Time Operating System) is sometimes used on MCUs, especially in complex, timing-sensitive usages.

An RTOS is similar to the scheduling portion of an OS Kernel, and generally they try to be deterministic in their timing.

RTOS, cont.

While RTOSes provide consistent timing, they may not have the latency desired, and they use memory and processing cycles. They're also "one more thing" to learn.

There are a number of free/open source (FreeRTOS, ChibiOS, Apache Mynewt, Linux Foundation Zephyr), commercial (VxWorks, etc.), and vendor-specific (TI RTOS, etc.) RTOSes available.

Configuration Bits

AKA "fuses" or "fuse bits". They are generally out-of-band non-volatile (or even permanent) configuration changes that substantially affect the operation of the MCU.

Configuration Bits, cont.

Some uses of config bits include:

  • Clock speed/divider
  • Clock source and type
  • WatchDog Timer enable
  • Brown-out Reset
  • Various memory protection
  • Debugger enable
  • JTAG enable
  • and more...
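To give a feel for how such settings are packed, here is a sketch that decodes a single configuration byte. The bit layout is entirely hypothetical and invented for this example; real layouts (and whether a set bit means "enabled" or "disabled") differ per MCU and must be taken from the datasheet.

```c
#include <stdint.h>

/* Hypothetical configuration-byte layout, for illustration only. */
#define CFG_CLKDIV_MASK   0x07u        /* bits 0-2: clock divider = 2^n */
#define CFG_WDT_ENABLE    (1u << 3)    /* bit 3: watchdog timer enable  */
#define CFG_BROWNOUT_EN   (1u << 4)    /* bit 4: brown-out reset enable */
#define CFG_DEBUG_ENABLE  (1u << 5)    /* bit 5: debugger enable        */

typedef struct {
    uint16_t clock_divider;
    int      wdt_enabled;
    int      brownout_enabled;
    int      debug_enabled;
} mcu_config;

/* Unpack the hypothetical configuration byte into named fields. */
mcu_config decode_config_byte(uint8_t cfg)
{
    mcu_config c;

    c.clock_divider    = (uint16_t)(1u << (cfg & CFG_CLKDIV_MASK));
    c.wdt_enabled      = (cfg & CFG_WDT_ENABLE)   != 0;
    c.brownout_enabled = (cfg & CFG_BROWNOUT_EN)  != 0;
    c.debug_enabled    = (cfg & CFG_DEBUG_ENABLE) != 0;
    return c;
}
```

In this made-up layout, the byte 0x0B means "divide the clock by 8, watchdog on, brown-out reset off, debugger off".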

Notable MCUs

There are almost endless variations. Digikey lists 76,307 different MCU SKUs!

Many of these are variations on a basic chip, like package type, speed, and included memory.

Notable MCUs, cont.

ARM and MIPS are somewhat unique because they are not manufacturers (they are called "fabless" as they don't fabricate chips); rather, they license chip technology that other manufacturers build and sell.

The benefit to developers is that instruction sets are largely similar, but peripherals and HALs* vary across families.

* HAL: Hardware Abstraction Layer

Common MCUs, cont.

  • Microchip (was Atmel) ATtiny, ATmega, and ARM variants - many of which power the now extensive line of Arduinos
  • Microchip PIC, PIC24, PIC32, and others
  • TI MSP line, and AM335x (ARM, powering the Beagle Boards)

Common MCUs, cont.

  • Espressif Systems ESP8266 & ESP32 - notable for included WiFi and BT
  • Parallax Propeller
  • Cypress PSoC (programmable system-on-chip) - ARM
  • NXP LPC - ARM
  • STMicroelectronics STM8 (proprietary 8-bit core) and STM32 (ARM)

Open Source MPUs/MCUs

The RISC-V Foundation is a non-profit group that controls and promotes the RISC-V ISA* which has been successfully commercialized. This is in contrast to (for profit) ARM, MIPS, Intel, Atmel, Microchip, etc.

* RISC-V is pronounced “risk five”, where RISC is Reduced Instruction Set Computing, and ISA is Instruction Set Architecture.

Open Source MPUs/MCUs, cont.

The RISC-V Foundation was founded in 2015, and while RISC-V is far from the first open source ISA, it has become remarkably successful.

Much of the appeal of RISC-V is it's free*, cost-saving, performant, flexible, extensible and enables scalable domain-specific processors (as compared to GPPs**).

* RISC-V Foundation membership required
** General Purpose Processors

Open Source MPUs/MCUs, cont.

SiFive has developed and produced the Freedom E310 (Arduino compatible RISC-V MCU), the Freedom U540 (quad-core 64-bit RISC-V SoC), etc.

Western Digital has announced plans to use RISC-V "processors in their flash controllers and SSDs". Nvidia, SEGGER, Qualcomm, Alibaba, and others are developing chips.

FPGAs

A Field-Programmable Gate Array (FPGA) is a unique type of IC which contains a matrix of programmable logic blocks, which can be dynamically arranged to form basically whatever computation device is needed.

This is in comparison to GPPs (General Purpose Processors) or ASICs (Application-Specific IC).

FPGAs, use

Commercially, FPGAs are often used for development leading to ASICs, or directly in products when an ASIC would be too expensive for the needed volume, or when iteration is needed.

For hobbyists, in certain applications where no readily available MCU/CPU is appropriate or affordable, an FPGA can be a great alternative.

FPGAs, history

Like most things, history is complex, but here's a summary:

Originally, in the 1970s, there were Programmable Array Logic (PAL), Programmable Logic Array (PLA), and later Generic Array Logic (GAL), etc., which really only replaced multiple discrete logic chips.

FPGAs, history cont.

Those "logic arrays" evolved into the Complex Programmable Logic Device (CPLD, 1980s) which had many more "logic blocks"* and pins. They also have non-volatile memory which means they don't have to be re-configured on power-up.

* also known as LUTs, LABs, CLBs, logic array blocks, etc. Note that comparison of "logic blocks" across manufacturers, or even product lines, is difficult.

FPGAs, history cont.

FPGAs (mid 1980s) have, compared to CPLDs, many more logic blocks* and pins, and sometimes "hard blocks"**, but generally no non-volatile memory, so they need to be re-configured from an external memory device/controller on each power-up.

* Some may have as many as hundreds of thousands!
** Hard Blocks are "permanent" - generally high-speed/performant peripherals.

FPGAs, "programming"

Instead of "programming" an FPGA in a coding language, a Hardware Description Language (HDL) is used to define the "use" and connection of the logic blocks.

FPGAs, "programming", cont.

While some HDLs "look" like code, an HDL design doesn't execute the way compiled or interpreted code does; it defines the logic blocks and their connections, and that logic then "executes".

The most common HDLs are Verilog / SystemVerilog, and VHDL (VHSIC Hardware Description Language). There's lots of pros/cons and communities that use each, but if you're a hobbyist you're more likely to use Verilog.

FPGAs, "programming", cont.

There are a number of EDA (Electronic Design Automation) tools (some open source) which can be used to simulate an FPGA so as to test "code" before it gets on HW.

Formal verification uses mathematical analysis to verify the output of a design, such that the expected output is guaranteed given assertions and constraints.

FPGAs, sample Verilog


module top(
  input CLK,
  input BTN_N,
  input [7:0] sw,
  output [4:0] led,
  output [6:0] seg,
  output ca
  );

  wire [7:0] disp0;
  wire displayClock;
  reg [7:0] storedValue;

  assign led[4:0]=sw[4:0];

FPGAs, sample Verilog


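// module continued
  // Latch the switch state on the falling edge of BTN_N
  // (active-low, going by the _N naming convention).
  always @(negedge BTN_N)
    storedValue<=sw;

  // Convert the low nibble of the latched value to a seven-segment pattern.
  // nibble_to_seven_seg is defined elsewhere (not shown on these slides).
  nibble_to_seven_seg nibble0(
    .nibblein(storedValue[3:0]),
    .segout(disp0)
  );

  // Derive a slower display clock from CLK; clkdiv is also defined elsewhere.
  clkdiv displayClockGen(
    .clk(CLK),
    .clkout(displayClock)
  );
endmodule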

FPGAs, toolchains

Historically, and still today, "official" FPGA toolchains are vendor specific, expensive, and largely undocumented.

FPGAs, toolchains, cont.

But now there's a fully open-source Verilog toolchain solution*.

SymbiFlow: "the GCC of FPGAs" (Xilinx 7-Series, Lattice iCE40 & ECP5) - which IceStorm was integrated into.

It utilizes sub-systems like Yosys, Arachne-pnr, nextpnr, Migen, etc.

* lots of effort went into reverse engineering the proprietary bitstreams and developing exceedingly complex synthesis/routing/etc. tools (most of which are functional but still under strong development)

FPGAs, OS Boards

This has resulted in a number of open source development boards like:

FPGAs, cont.

Some processor "cores" (such as RISC-V, ARM, etc.) are available in HDL so that an FPGA could *contain* an MCU/CPU as a "soft core".

This means you'd first "program" the FPGA for the MCU core and whatever peripherals or logic you needed using an HDL, then program the core itself (probably with C, etc.).

FPGAs, cont.

Pros: inherently multiprocessing, massively parallel, any peripheral needed, flexible, and reconfigurable.

Cons: slower, more energy consuming, more expensive.

While FPGAs were once very expensive, simple ones can be had for just a couple of dollars each (although they can cost into the hundreds of dollars).

FPGA hybrids

As mentioned, some FPGAs come with "hard block" MCU cores, often called cSoCs.

Some MCUs come with CCL (Configurable Custom Logic) or similarly named functionality that's somewhat like what CPLDs were. The new Arduino Uno WiFi includes this.

About Todd