C as Implemented in Assembly Language

There are no MCUs which execute C, only machine code.

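So we compile the C to assembly code, a human-readable representation of machine code.

When we compile a C program, the compiler and assembler need to know the target (microprocessor/architecture) so that they can generate instructions and machine code suited to that target.

Size of a memory region: last address - first address + 1 (the addresses marked beside a region are byte addresses).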
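A minimal sketch of the region-size rule in C. The addresses in the demo are hypothetical (a 32 KiB RAM block chosen only for illustration), and `region_size` is a name introduced here, not something from the course material.

```c
#include <assert.h>
#include <stdint.h>

/* Number of bytes in a region given its first and last byte addresses.
   Both addresses are inclusive, hence the +1. */
uint32_t region_size(uint32_t first, uint32_t last)
{
    return last - first + 1u;
}

/* Hypothetical region 0x20000000..0x20007FFF, used only as an illustration. */
void region_size_demo(void)
{
    assert(region_size(0x20000000u, 0x20007FFFu) == 0x8000u); /* 32 KiB */
}
```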

Addressing Modes in ARM

Addressing modes

LDR R0, [R1]  ; register indirect addressing: read the word at the address held in R1

Immediate Offset (pre-index)

The memory address of the data transfer is the sum of a register value and an immediate constant value (offset).

LDRB R0, [R1, #0x3] ; read the byte at address R1 + 3

Read a byte value from address R1+0x3, and store the read data in R0.

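A trailing ! enables write-back: the base register is updated with the computed address.

With write-back, the address R1 + 0x3 is computed first; R1 is updated to that address and R0 receives the byte read from it.

Example:

After the load, R0 holds 0x000000AB (the byte is zero-extended to 32 bits).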

The offset value can be positive or negative.
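The behaviour of the immediate-offset form, with and without write-back, can be sketched as a small C model. This is an illustration under assumptions: memory is a tiny byte array, the "register" holds an index into it rather than a real address, and `ldrb_pre` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Model of LDRB Rt, [Rn, #offset] and LDRB Rt, [Rn, #offset]!
   mem stands in for memory; *rn holds an index into it. */
uint32_t ldrb_pre(const uint8_t *mem, uint32_t *rn, int32_t offset, int writeback)
{
    uint32_t addr = *rn + (uint32_t)offset;  /* address = base + offset */
    if (writeback) {
        *rn = addr;                          /* '!' form: base register is updated */
    }
    return mem[addr];                        /* byte is zero-extended to 32 bits */
}

void ldrb_pre_demo(void)
{
    const uint8_t mem[8] = {0x10, 0x11, 0x12, 0xAB, 0x14, 0x15, 0x16, 0x17};
    uint32_t r1 = 0;
    assert(ldrb_pre(mem, &r1, 3, 0) == 0x000000ABu && r1 == 0); /* no write-back */
    assert(ldrb_pre(mem, &r1, 3, 1) == 0x000000ABu && r1 == 3); /* with '!' */
    assert(ldrb_pre(mem, &r1, -2, 0) == 0x11u);                 /* negative offset */
}
```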

Register Offset (pre-index)

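Another useful addressing mode is register offset, used when the address of an element in a data array is the base address plus an offset computed from an index value. To make the address calculation more efficient, the index value can be shifted left by 0 to 3 bits before it is added to the base register. For example:

LDR R3, [R0, R2, LSL #2] ; read memory [R0 + (R2 << 2)] into R3
STR R5, [R0, R7] ; write R5 to memory [R0 + R7]

Quick way to evaluate the shift: 0b0010 (2) << 2 = 0b1000 (8), i.e. $2 \times 2^2 = 8$.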

Example:

R3 holds 0xABCD1234.

This address mode can be used with different sizes (B, H) and signs (S)
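In C terms, the scaled register offset is what indexing a word array looks like: the element address is the base plus the index shifted left by 2. The sketch below models that address calculation by hand; `ldr_reg_offset` is a hypothetical name, and whether a compiler actually emits this addressing mode depends on the compiler and its settings.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of LDR Rt, [Rn, Rm, LSL #2]: address = base + (index << 2). */
uint32_t ldr_reg_offset(const uint8_t *base, uint32_t index)
{
    uint32_t value;
    memcpy(&value, base + (index << 2), sizeof value); /* read one 32-bit word */
    return value;
}

void ldr_reg_offset_demo(void)
{
    const uint32_t table[4] = {0x11111111u, 0x22222222u, 0xABCD1234u, 0x44444444u};
    /* The scaled access matches ordinary C array indexing. */
    assert(ldr_reg_offset((const uint8_t *)table, 2) == table[2]);
    assert(ldr_reg_offset((const uint8_t *)table, 2) == 0xABCD1234u);
}
```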

Register Offset (Post-index)

When the offset is provided as post-index (i.e. outside [ ]), it is not used for the memory access itself; it is used to update the address register after the data transfer completes.

LDR R0, [R1], #4 ; read memory [R1], then update R1 to R1 + offset

Read memory [R1] into R0, then update R1 to R1 + 4.

Example: R0 holds 0xABCD1234, then the address stored in R1 increases by 4.
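The C idiom closest to a post-indexed load is `*p++` on a word pointer: read through the pointer, then advance it by one word (4 bytes). A small sketch, with `sum_words` as a hypothetical example function; a compiler may or may not choose the post-indexed instruction for it.

```c
#include <assert.h>
#include <stdint.h>

/* Sum n words. `sum += *p++;` reads the word at p and then advances p by
   4 bytes, which is the pattern LDR Rt, [Rn], #4 expresses. */
uint32_t sum_words(const uint32_t *p, uint32_t n)
{
    uint32_t sum = 0;
    while (n-- > 0) {
        sum += *p++;   /* load from [p], then p = p + 4 bytes */
    }
    return sum;
}

void sum_words_demo(void)
{
    const uint32_t data[3] = {1u, 2u, 3u};
    assert(sum_words(data, 3) == 6u);
}
```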

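A memory access can also form its address from the current PC value plus an offset (PC-relative addressing).

It is commonly used to load an immediate value into a register, and is also known as a literal pool access.

LDR R0, =0x12345678 ; Set R0 to 0x12345678

This is a pseudo-instruction, which is why the constant is written with = rather than #.

An instruction is at most 32 bits long, so it cannot hold a full 32-bit data value alongside its opcode; this form therefore has to be a pseudo-instruction.

It is not a single real instruction; the assembler converts it into the following: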

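LDR R0, [PC, #offset] ; PC-relative load (immediate offset from PC)
...
DCD 0x12345678

DCD (DCDU) allocates a run of consecutive word-sized storage locations and initializes them with the specified data.

The LDR instruction reads the memory at [PC + offset] and stores the value into R0. The assembler calculates the offset for you, so you don't have to worry about it.

The literal is stored in the code region, because it is addressed relative to PC and PC points into the code region.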

LDR R3, =MY_NUMBER ; Get the memory location of MY_NUMBER (MY_NUMBER is an address; the word stored there is 0x2000ABCC)
LDR R4, [R3] ; Read the value 0x2000ABCC into R4
LDR R0, =HELLO_TEXT ; Get the starting address of HELLO_TEXT (the address of 'H')
LDRB R2, [R0] ; R2 = 'H'
…
ALIGN 4 ; align the data at word (4 bytes) boundary
MY_NUMBER DCD 0x2000ABCC
HELLO_TEXT DCB "Hello\n", 0 ; Null terminated string

• Insert data inside programs

• DCD: insert a word-size data

• DCB: insert a byte-size data

• ALIGN (directive):

Used before inserting word-size data

Takes a number that determines the alignment size

Branching

ARM Thumb-2 instruction: two bytes (a halfword)

Most cases: 2 bytes (half word); otherwise: 4 bytes (a word)

Instructions that carry a constant are usually 4 bytes.

The first halfword (hw1) determines the instruction length and functionality. If the processor decodes the instruction as 32-bit long, then the processor fetches the second halfword (hw2).

In the example, the first instruction (MOV) requires 4 bytes because it encodes the immediate data 10 (0x0A) within the instruction.

Program Counter and Fetch

The value of PC is incremented by 2 whenever a halfword has been fetched during execution.

Offset and Branch Instruction

By changing the value of PC, we branch (jump) to another instruction.

The difference (in bytes) between the new address in PC and the current address is called the offset.

Branch Instructions

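Branch instructions are similar to goto in C.

Programmer can choose the appropriate suffix for conditional branches.

B: unconditional branch

B + a condition code: conditional branch

Reference: "Assembly language: jump/branch instructions in ARM assembly [ARM Assembly Series, Instructions 01]", Lytain2022's blog, CSDN

When a function calls a sub-function it must use BL (which updates LR); returning from the sub-function to the caller uses BX (with the return address in a register).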

Condition Code Suffixes

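When the processor operates in ARM state, almost all instructions execute conditionally, based on the condition-code flags in the CPSR and the condition field of the instruction. When the condition is met the instruction is executed; otherwise it is ignored.

CMP R0, R1 ; compare R0 and R1, update APSR
CMP R2, #100 ; compare R2 and 100, update APSR

CMP performs a subtraction: it computes the difference, discards the result and only updates the flags.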
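To make "CMP is a subtraction" concrete, here is a rough C model of how the N, Z, C and V flags come out of a - b, and how three common condition suffixes read them. The type and function names (`flags_t`, `cmp_flags`, `cond_eq`, ...) are made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int n, z, c, v; } flags_t;

/* Model of CMP a, b: compute a - b and keep only the flags. */
flags_t cmp_flags(uint32_t a, uint32_t b)
{
    uint32_t r = a - b;
    flags_t f;
    f.n = (int)(r >> 31);                     /* result negative when read as signed */
    f.z = (r == 0);                           /* result zero: a == b */
    f.c = (a >= b);                           /* no borrow: unsigned a >= b */
    f.v = (int)(((a ^ b) & (a ^ r)) >> 31);   /* signed overflow */
    return f;
}

/* Condition tests as used by BEQ, BLT and BGE. */
int cond_eq(flags_t f) { return f.z; }
int cond_lt(flags_t f) { return f.n != f.v; }
int cond_ge(flags_t f) { return f.n == f.v; }

void cmp_flags_demo(void)
{
    assert(cond_eq(cmp_flags(100u, 100u)));
    assert(cond_lt(cmp_flags(5u, 100u)));
    assert(cond_ge(cmp_flags(100u, 5u)));
    assert(cond_lt(cmp_flags((uint32_t)-1, 0u)));  /* -1 < 0 as signed */
    assert(cmp_flags((uint32_t)-1, 0u).c == 1);    /* 0xFFFFFFFF >= 0 as unsigned */
}
```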

Control Flow

Reference: "Introduction to ARM Assembly Language (4)", Zhihu (zhihu.com)

While Loop

	; ARM asm
	{COND_SETUP …}
COND
	{COND_EVAL …}
	B<cond> EXIT
	{INSTS1 …}
	{INSTS2 …}
	B COND
EXIT

Example

x = n - 1; // x@0x20000000, n@0x20000004
while (n % x != 0) {
x--;
}

LDR R2, =0x20000004
LDR R0, [R2] ; R0 = n
SUBS R1, R0, #1 ; R1 = x
WHILE_BEGIN 
UDIV R2, R0, R1 ; R2 = n / x
MUL R3, R2, R1 ; R3 = R2 * x
CMP R0, R3 ; n == (n / x) * x
BEQ WHILE_END
SUBS R1, R1, #1 ; x--
B WHILE_BEGIN ; loop back
WHILE_END
LDR R2, =0x20000000 ; write back to mem
STR R1, [R2]
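The assembly has no remainder instruction, so it replaces n % x != 0 with n != (n / x) * x. The C sketch below follows the listing instruction by instruction; it assumes n >= 2 (so x never reaches 0), and `largest_proper_divisor` is a name chosen here for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Largest x < n that divides n, written the way the assembly computes it.
   Assumes n >= 2. */
uint32_t largest_proper_divisor(uint32_t n)
{
    uint32_t x = n - 1;             /* SUBS R1, R0, #1 */
    for (;;) {
        uint32_t q = n / x;         /* UDIV R2, R0, R1 */
        uint32_t p = q * x;         /* MUL  R3, R2, R1 */
        if (n == p) {               /* CMP R0, R3 ; BEQ WHILE_END */
            break;
        }
        x--;                        /* SUBS R1, R1, #1 */
    }
    return x;
}

void while_loop_demo(void)
{
    assert(largest_proper_divisor(15u) == 5u);
    assert(largest_proper_divisor(7u) == 1u);   /* prime: only 1 divides it */
}
```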

DoWhile Loop

	{COND_SETUP …}
COND
	{INSTS1 …}
	{INSTS2 …}
	{COND_EVAL …}
	B<cond> COND

For

for(A;B;C)
{
stat 1;
stat 2;
}

	{COND_SETUP …} ;A
COND
	{COND_EVAL …}  ;B
	B<cond> Exit
	{INSTS1 …}
	{INSTS2 …}
	{COND_UPDATE …}  ;C, the one line a while loop does not have
	B COND
Exit

If-else

IFBEGIN
	{COND_SETUP …}
	{COND_EVAL …}
IFPART
	B<cond> ELSEPART
	{INSTS1 …}
	B IFEND
ELSEPART 
	{INSTS2 …}
IFEND

example

// x, y, z @ 0x20000000, 04, 08 
if (x >= y) {
z = x;
} else {
z = y;
}

	LDR R3, =0x20000000
	LDR R0, [R3], #4 ;R3=0x20000004 R0=X
	LDR R1, [R3], #4 ;R3=0x20000008 R1=Y
IFBEGIN
	CMP R0, R1
	BLT ELSEPART
IFPART
	STR R0, [R3]
	B IFEND
ELSEPART
	STR R1, [R3]
IFEND

Case

; ARM asm
SWITCH_BEGIN 
	{COND_SETUP …}
	{COND_EVAL …}
CHOICE1
	B<cond> CHOICE2
	{INST1 …}
	B SWITCH_END
CHOICE2
	B<cond> CHOICE3
	{INST2 …}
	B SWITCH_END
…
SWITCH_END 
…

A case structure selects one of many statements; it is effectively a chain of if-else structures.

Exercises

for (i = 0; i < 10; i++){ x += i; }

	MOV R0,#0   ;x
	MOV R1,#0	;i
FOR_BEGIN
	CMP R1,#10
	BGE EXIT
	add R0,R0,R1
	add R1,R1,#1;   update condition
	B FOR_BEGIN
EXIT

while (x < 10) {x = x + 1;}

	MOV R0,#0 ;x
While_Begin
	cmp R0,#10
	BGE EXIT
	add R0,#1
	B While_Begin
EXIT

do {x += 2;} while (x < 20);

	MOV R0,#0 ;x
DOWhile_Begin
	add R0,R0,#2
	CMP R0,#20
	BLT DOWhile_Begin

Stack and Functions

Stack

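Cortex-M processors use a stack memory model called a full descending stack:

Descending stack: as data is pushed, SP moves from high addresses towards low addresses (the high address is the bottom of the stack).

Full stack: the stack pointer always points to the last item pushed onto the stack.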

When started, SP is set to the end of stack memory space.

PUSH operation: SP = SP – 4, then store the value @SP

POP operation: read the value @SP then SP = SP + 4

PUSH {R0, R4-R7} ; Push r0, r4, r5, r6, r7
POP {R2-R3, R5} ; Pop to r2, r3, r5
; PUSH only stores the values of r0, r4, r5, r6, r7 on the stack; POP loads previously stored values into the registers listed inside {}

The list of registers (reglist) is specified with braces ({ }) in UAL.

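R3 is pushed first and R0 last; each push decrements SP by 4.

R0 is popped first and R3 last; each pop increments SP by 4.

The lower the register number, the lower the memory address it occupies in the stack.

Higher-numbered registers are pushed first, lower-numbered registers last.

Values are popped into lower-numbered registers first, higher-numbered registers last.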
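The push and pop rules can be checked with a toy model of a full descending stack. Everything here is an assumption made for illustration: the stack is a 16-word array, SP is a word index instead of a byte address, and a register list is passed as a bit mask (bit i set means Ri is in the list).

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 16u

typedef struct {
    uint32_t mem[STACK_WORDS];  /* stack memory, one entry per word */
    uint32_t sp;                /* word index; starts one past the top */
    uint32_t r[16];             /* R0..R15 */
} cpu_t;

/* PUSH {reglist}: the highest-numbered register is stored first,
   so it ends up at the highest address. */
void push_regs(cpu_t *cpu, uint16_t reglist)
{
    for (int i = 15; i >= 0; i--) {
        if (reglist & (1u << i)) {
            cpu->sp -= 1;                    /* SP = SP - 4 (one word) */
            cpu->mem[cpu->sp] = cpu->r[i];   /* then store at SP */
        }
    }
}

/* POP {reglist}: the lowest-numbered register is loaded first. */
void pop_regs(cpu_t *cpu, uint16_t reglist)
{
    for (int i = 0; i <= 15; i++) {
        if (reglist & (1u << i)) {
            cpu->r[i] = cpu->mem[cpu->sp];   /* read the value at SP */
            cpu->sp += 1;                    /* then SP = SP + 4 (one word) */
        }
    }
}

void stack_demo(void)
{
    cpu_t cpu = {0};
    cpu.sp = STACK_WORDS;
    cpu.r[0] = 10; cpu.r[4] = 14; cpu.r[5] = 15; cpu.r[6] = 16; cpu.r[7] = 17;
    push_regs(&cpu, 0x00F1);                 /* PUSH {R0, R4-R7} */
    assert(cpu.sp == STACK_WORDS - 5);
    assert(cpu.mem[cpu.sp] == 10);           /* R0 sits at the lowest address */
    assert(cpu.mem[STACK_WORDS - 1] == 17);  /* R7 sits at the highest address */
    pop_regs(&cpu, 0x002C);                  /* POP {R2-R3, R5} */
    assert(cpu.r[2] == 10 && cpu.r[3] == 14 && cpu.r[5] == 15);
    assert(cpu.sp == STACK_WORDS - 2);
}
```

Note how POP {R2-R3, R5} receives the values that came from R0, R4 and R5: the stack holds values, not register names.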

Functions

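Functions save code memory by allowing code to be reused.

LR: Storing the Return Address

On a call (BL label)

BL label instruction: stores the address to return to once the function has finished, i.e. the address of the next instruction to be executed after the call.

Mechanism:

Copy the address in PC into LR and set bit[0] = 1 (to indicate Thumb state).

Example:

After the instruction at 190 executes, the address 194 held in PC has bit[0] set to 1 and is stored in LR (as 195). PC is then set to 100, and execution continues from the start of func1.

On return (BX LR)

BX LR instruction: after the function has finished, puts the address held in LR (the address of the instruction to continue with) back into PC.

Mechanism:

Copy the address in LR into PC and set bit[0] = 0.

Example: BX LR

When the function has finished, BX LR at 12C jumps back to the instruction after the call in the caller. While 12C is executing, 12E has already been fetched (the instruction at 12E will not be executed). Executing BX LR at 12C clears bit[0] of the 195 held in LR to give 194 and stores that in PC, so execution continues in the caller right after the call.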
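The bit[0] bookkeeping can be sketched in a few lines of C. This is only a model: `core_t`, `bl` and `bx_lr` are hypothetical names, the addresses from the example are read as hexadecimal, and the BL instruction is taken to be 4 bytes long.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t pc, lr; } core_t;

/* BL label located at address `at` (a 4-byte instruction): LR gets the
   address of the next instruction with bit[0] set, then PC = target. */
void bl(core_t *c, uint32_t at, uint32_t target)
{
    c->lr = (at + 4u) | 1u;
    c->pc = target;
}

/* BX LR: bit[0] selects the state and is not part of the branch address. */
void bx_lr(core_t *c)
{
    c->pc = c->lr & ~1u;
}

void call_return_demo(void)
{
    core_t c = {0, 0};
    bl(&c, 0x190u, 0x100u);     /* addresses from the example, read as hex */
    assert(c.lr == 0x195u && c.pc == 0x100u);
    bx_lr(&c);
    assert(c.pc == 0x194u);     /* back in the caller, right after the BL */
}
```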

AAPCS （ARM Architecture Procedure Call Standard）

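Basic rules of the procedure call standard:

R0 - R3: scratch registers are not expected to be preserved upon returning from a called subroutine.

R0-R3, R12, LR and PSR are called "caller-saved registers": if their values are still needed after a function call, the calling code must save them to memory (e.g. the stack) before making the call. Register values that are not needed after the call do not have to be saved.

R4 – R8, R10-R11: Preserved ("variable") registers are expected to have their original values upon returning from a called subroutine.

If a function needs to use R4-R11, it should save them on the stack and restore them before it ends. R4-R11 are "callee-saved registers": the called subroutine or function must ensure that these registers are unchanged when it ends (holding the same values as on entry). Their values may change during the function, but they must be restored to their initial values before the function exits.

Return Values

**In general, a function call uses R0-R3 for input arguments and R0 for the return value.** If the return value is 64 bits, R1 is also used for the result.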

Function Arguments

How to call a function: with branch link (BL) or branch link and exchange instruction (BLX)

Passing arguments: registers (R0-R3) are fastest but limited in number; any further arguments go on the stack.

• Process arguments in order they appear in source code

A later argument must use a higher-numbered register than an earlier one.

• Round size up to be a multiple of 4 bytes

• Copy arguments into core registers (r0-r3), aligning doubles to even registers (an 8-byte value occupies two registers)

• Copy remaining arguments onto stack, aligning doubles to even addresses

Example:

func(A(4B),B(8B),C(4B),D(4B))
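One way to work through this example is to code the four rules as a small allocator. The sketch is deliberately simplified and rests on assumptions: it handles only 4-byte and 8-byte integer-like arguments, it never splits an argument between registers and stack, and `assign_args` / `ON_STACK` are names invented here. Under those assumptions A lands in r0, B in r2:r3 (r1 is skipped to keep the 8-byte value on an even register), and C and D go on the stack.

```c
#include <assert.h>
#include <stddef.h>

#define ON_STACK (-1)

/* sizes[i] is the size in bytes of argument i; first_reg[i] receives the
   first core register used (0..3) or ON_STACK.
   Returns the number of stack bytes used by the arguments. */
size_t assign_args(const size_t *sizes, size_t count, int *first_reg)
{
    int next_reg = 0;        /* next free core register */
    size_t stack = 0;        /* bytes of stack used so far */
    for (size_t i = 0; i < count; i++) {
        size_t size = (sizes[i] + 3u) & ~(size_t)3u;  /* round up to a multiple of 4 */
        int regs = (int)(size / 4u);
        if (size == 8u && next_reg < 4) {
            next_reg = (next_reg + 1) & ~1;           /* doubles start at an even register */
        }
        if (next_reg + regs <= 4) {
            first_reg[i] = next_reg;                  /* fits in r0-r3 */
            next_reg += regs;
        } else {
            next_reg = 4;                             /* once on the stack, stay on the stack */
            if (size == 8u) {
                stack = (stack + 7u) & ~(size_t)7u;   /* doubles at 8-byte aligned offsets */
            }
            first_reg[i] = ON_STACK;
            stack += size;
        }
    }
    return stack;
}

void assign_args_demo(void)
{
    const size_t sizes[4] = {4, 8, 4, 4};             /* func(A, B, C, D) */
    int reg[4];
    size_t stack_bytes = assign_args(sizes, 4, reg);
    assert(reg[0] == 0);                              /* A in r0 */
    assert(reg[1] == 2);                              /* B in r2:r3 */
    assert(reg[2] == ON_STACK && reg[3] == ON_STACK); /* C and D on the stack */
    assert(stack_bytes == 8);
}
```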

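arg1 is in r0 and arg2 is in r1.

Because func2 is about to be called and r1 has to hold 4, arg2 must be moved out of r1 and kept in r4.

This code has two problems: LR is changed by the call to func2(), so func1() cannot return to main() when it ends.

R4 is a preserved register, so it must be restored to its original value when the function ends.

The revised func1():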

bubble_asm:                    @ r0 = n, r1 = a (array of words)
    push {r4-r6, lr}           @ save callee-saved registers and lr
    sub sp, sp, #8             @ allocate space for local variables
    str r0, [sp, #4]           @ store argument n in memory
    str r1, [sp]               @ store argument a in memory

    ldr r2, [sp, #4]           @ load n
    subs r2, r2, #1            @ i = n - 1
    ble end_outer_loop         @ nothing to sort when n <= 1
outer_loop:
    mov r3, #0                 @ j = 0
    ldr r4, [sp]               @ r4 = address of a[0]

inner_loop:
    ldr r5, [r4]               @ load a[j]
    ldr r6, [r4, #4]           @ load a[j+1]
    cmp r5, r6
    ble next_iteration         @ already in order

    str r6, [r4]               @ swap a[j] and a[j+1]
    str r5, [r4, #4]
next_iteration:
    add r4, r4, #4             @ advance to the address of a[j+1]
    add r3, r3, #1             @ j = j + 1
    cmp r3, r2
    blt inner_loop             @ continue the inner loop while j < i

    subs r2, r2, #1            @ i = i - 1
    bgt outer_loop             @ continue the outer loop while i > 0

end_outer_loop:
    add sp, sp, #8             @ release the local variable space
    pop {r4-r6, pc}            @ restore registers and return

Start every function with PUSH {LR} and end it with POP {PC} (any other saved registers can go in the same two lists).
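For comparison, a plain C bubble sort that bubble_asm is presumably meant to implement. The argument order (n first, then a) follows the register comments in the assembly; the name `bubble_c` and the ascending order are assumptions.

```c
#include <assert.h>

/* Reference bubble sort: sorts the n elements of a into ascending order. */
void bubble_c(int n, int *a)
{
    for (int i = n - 1; i > 0; i--) {        /* a[i+1..n-1] is already in place */
        for (int j = 0; j < i; j++) {
            if (a[j] > a[j + 1]) {           /* swap out-of-order neighbours */
                int tmp = a[j];
                a[j] = a[j + 1];
                a[j + 1] = tmp;
            }
        }
    }
}

void bubble_c_demo(void)
{
    int a[4] = {5, 2, 9, 1};
    bubble_c(4, a);
    assert(a[0] == 1 && a[1] == 2 && a[2] == 5 && a[3] == 9);
}
```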

Memory requirements and Accessing data in memories

What Memory Does a Program Need

Five possible types

• Code

• Read-only static data

• Writable static data

    • Initialized

    • Zero-initialized

    • Uninitialized

• Heap

• Stack

分类标准

Can the information change?

No? Put it in read-only, non-volatile memory

• Instructions

• Constant strings

• Constant operands

• Initialisation values

Yes? Put it in read/write memory

• Variables

• Intermediate computations

• Return address

How long does the data need to exist?

Statically allocated

• Exists from program start to end

• Each variable has its own fixed location

• Space is not reused

Automatically allocated

• Exists from function start to end

• Space can be reused

Dynamically allocated

• Exists from explicit allocation to explicit deallocation

• Space can be reused

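const: cannot be rewritten by the program, so it can be placed in ROM

volatile: can be changed outside the normal program flow, e.g. by an Interrupt Service Routine or a hardware-controlled register, so every access must really read or write memory

static: statically allocated; a static local variable keeps its value between calls to its function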
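A short C sketch of the three qualifiers. All names (`lookup`, `tick_count`, `timer_isr`, `next_id`) are hypothetical, and the "interrupt handler" is just an ordinary function called by hand, since there is no real interrupt controller in a desktop test.

```c
#include <assert.h>
#include <stdint.h>

const uint32_t lookup[4] = {1u, 2u, 4u, 8u}; /* const: read-only, a candidate for ROM */

volatile uint32_t tick_count;                 /* volatile: may change outside normal flow */

/* Stand-in for an interrupt service routine that updates tick_count. */
void timer_isr(void)
{
    tick_count++;
}

/* static local: statically allocated, keeps its value between calls. */
uint32_t next_id(void)
{
    static uint32_t id = 0;
    return ++id;
}

void qualifiers_demo(void)
{
    uint32_t first = next_id();
    assert(next_id() == first + 1u);   /* the value persisted across calls */
    uint32_t before = tick_count;
    timer_isr();
    assert(tick_count == before + 1u); /* tick_count is re-read from memory */
    assert(lookup[3] == 8u);
}
```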

C Run-Time Start-Up Module

RAM contents are set up at run time; ROM contents are fixed (programmed in advance).

• Initialize hardware

• Peripherals, etc.

• Set up stack pointer

• Initialize C or C++ run-time environment

• Set up heap memory

• Initialize variables

Activation Record

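The stack is used to hold the data of each active function call.

An activation record contains:

return address
arguments
automatic variables (local variables)

Calling a function creates a new activation record on the stack.

Returning from a function removes an activation record from the stack.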

Automatic Variables Stored on Stack

Local variables in C are implicitly automatic.

Automatic variables are stored in a function's activation record (its stack frame), unless optimised and promoted to a register.

Activation records are located on the stack.

Calling a function creates an activation record, allocating space on stack.
Returning from a function deletes the activation record, freeing up space on stack.

Program must allocate space on stack for variables

Stack addressing uses an offset from the stack pointer:

LDR Rm, [SP, #offset]

Data on the stack are always word aligned (for example, storing 0x01 still occupies four bytes: 01 00 00 00 in memory).

• In the instruction encoding, one byte used for offset, which is multiplied by four

• Possible offsets: 0, 4, 8, …, 1020 (255 x 4)

• Maximum range addressable this way is 1024 bytes
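The offset arithmetic from the three bullets above, as a sketch: the 8-bit field stores offset / 4, so only word-aligned offsets from 0 to 1020 fit. `encode_sp_offset` and `decode_sp_offset` are hypothetical helper names; they model the arithmetic only, not the full instruction encoding.

```c
#include <assert.h>
#include <stdint.h>

/* Turn a byte offset into the 8-bit field value (offset / 4).
   Returns -1 if the offset cannot be encoded. */
int encode_sp_offset(uint32_t offset)
{
    if ((offset & 3u) != 0 || offset > 1020u) {
        return -1;               /* not word aligned, or out of range */
    }
    return (int)(offset >> 2);   /* field = offset / 4 */
}

/* Recover the byte offset from the 8-bit field. */
uint32_t decode_sp_offset(uint8_t imm8)
{
    return (uint32_t)imm8 << 2;  /* offset = field * 4 */
}

void sp_offset_demo(void)
{
    assert(encode_sp_offset(0u) == 0);
    assert(encode_sp_offset(1020u) == 255);   /* largest encodable offset */
    assert(encode_sp_offset(1024u) == -1);    /* out of range */
    assert(encode_sp_offset(6u) == -1);       /* not a multiple of 4 */
    assert(decode_sp_offset(255u) == 1020u);
}
```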

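How to access an automatic variable:

When a function is called, the stack pointer (SP) is usually adjusted to allocate space for the function's automatic variables.
The offset from the stack pointer to a variable c is determined by the sizes and order of the variables in the stack frame.
To access the value of c, the assembly code uses the appropriate offset from SP and loads the value into a register for further processing.
Any modification of, or comparison with, c can then be done using the register that holds its value.

Array Access

Not covered in the final exam.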