互斥锁与自旋锁

负有成见

2023-10-09

Golang, Pkg

在现实生活中，锁的作用无非是将物品关起来，防止被他人进入等。而在程序中锁的概念也是如此，当同一个内存地址被同时访问时，会造成数据竞争，意思是不知道当前变量被谁所操作，所以引入了锁的概念，当有人来访问时，对此变量加上一把锁来防止其他对象对其访问，这样才会更安全的使用变量做操作。现实中锁的种类有多种，比如物理锁、指纹锁等。在程序中亦是如此，这里大类氛围两种，互斥锁和自旋锁。

1. 数据竞争(Data Race)^3

Data Race 又称 Race Condition。它旨在描述一个系统或者进程的输出依赖于不受控制的事件出现顺序或者出现时机。此词源自于两个信号试着彼此竞争，来影响谁先输出。

计算机为了提高性能，出现了多线程和多进程，增加了执行并行能力，但是在保持数据一致上就需要外力来维持。而造成这一原因的就是数据竞争，多个执行者对一个事物的操作。最为常见的例子是在银行存取款，如果付款存在多个交易时较为容易出现这类问题。

事例

如果你银行卡中有余额 1000 元，当你同时购买两个售价分别是 100 和 200 元的商品时，在银行账户中的操作可能是这样的

1000 -100 = 900 // 购买 100 元商品时扣款
900 - 200 = 700 // 购买成功第一个商品后，扣款 200 元商品
1000 - 200 = 800 // 购买 200 元善品时扣款

余额可能是这三种，因为在同时付款的时候，你不知道哪个扣款是先处理操作。所以会出现上述情况，按理说付款完成后余额剩余 700 才是正确的，但是为什么会出现这种情况，是因为出现了数据竞争，但进行扣款操作时候，两者都拿到了总额为 1000 的进行扣除，其中有一方肯定错误的扣除处理。

# 使用Golang实现
package main

import (
	"fmt"
	"sync"
)

// 余额
var Balance uint = 1000

// wg
var wg sync.WaitGroup

// 消费行为
func consume(money uint) uint {
	defer wg.Done()
	Balance -= money
	return Balance
}

func do() {
	defer wg.Wait()
	goods := [2]uint{100, 200}
	// 同时有两款交易进行中
	for i := range goods {
		wg.Add(1)
		go func(i int) {
			fmt.Printf("Current balance is: %v\n", consume(goods[i]))
		}(i)
	}
}

运行函数do(),得到输出为：

1 2	Current balance is: 800 Current balance is: 900

由此可得出这个结果是错误的，那么如何保证结果的正确性呢，那就对余额加上一把锁。

2. 互斥锁(Mutex)

**互斥锁^1**（英语：Mutual exclusion，缩写 Mutex）是一种用于多线程编程中，防止两条线程同时对同一公共资源（比如全局变量）进行读写的机制。该目的通过将代码切片成一个一个的临界区域（critical section）达成。临界区域指的是一块对公共资源进行访问的代码，并非一种机制或是算法。一个程序、进程、线程可以拥有多个临界区域，但是并不一定会应用互斥锁。

互斥锁是在程序中最为常见的锁机制，当对变量加锁时，便会禁止其他对当前变量的访问。当程序还是单线程，是没有任何锁的身影出现。主要原因是用不到，毕竟每次只有一个对其使用，自然不会存在竞争问题。但在 CPU 性能越发强大的现在，只是单一线程是无法满足我们的性能要求，所以出现了多线程这一技术，也就是常说的并行操作。当涉及到并行时，无法避免的就是多个线程对某一个变量同时读写的问题。如果出现了多个线程对同一变量的访问，那么会出现竞态（race）问题。

对上文表述的代码中余额加上互斥锁，而在Go语言中是sync.Mutex。

源码中对Mutex定义如下

// A Mutex is a mutual exclusion lock.
// The zero value for a Mutex is an unlocked mutex.
//
// A Mutex must not be copied after first use.
//
// In the terminology of the Go memory model,
// the n'th call to Unlock “synchronizes before” the m'th call to Lock
// for any n < m.
// A successful call to TryLock is equivalent to a call to Lock.
// A failed call to TryLock does not establish any “synchronizes before”
// relation at all.
type Mutex struct {
	state int32
	sema  uint32
}

其结构体方法为：

1
2
3

func (m *Mutex) Lock() // 加锁
func (m *Mutex) TryLock() bool // 尝试加锁，并返回是否状态
func (*Mutex) Unlock // 解锁

将这一锁加在上述银行例子中：

// 定于全局mutex
var mu = sync.Mutex

// 在consume方法中添加
mu.Lock() // 对Balance加锁
defer mu.Unlock() // defer下Balance操作解锁

互斥锁对于我们还是比较容易理解的，虽然方便使用，但是有一些弊端存在

如果加锁后执行时间长，性能会极具下降
如果没有正确处理锁的获取与释放，会出现死锁问题，影响程度如上

2.1 读写锁

读写锁(RWMutex)，在多数资源争夺操作中，多数分为两种，写操作和写操作。但在实际开发中会有读多写少或者读少写多的情况出像，所以在 sync 包中将两种粒度分离开来。

源码定义如下:

// A RWMutex is a reader/writer mutual exclusion lock.
// The lock can be held by an arbitrary number of readers or a single writer.
// The zero value for a RWMutex is an unlocked mutex.
//
// A RWMutex must not be copied after first use.
//
// If any goroutine calls Lock while the lock is already held by
// one or more readers, concurrent calls to RLock will block until
// the writer has acquired (and released) the lock, to ensure that
// the lock eventually becomes available to the writer.
// Note that this prohibits recursive read-locking.
//
// In the terminology of the Go memory model,
// the n'th call to Unlock “synchronizes before” the m'th call to Lock
// for any n < m, just as for Mutex.
// For any call to RLock, there exists an n such that
// the n'th call to Unlock “synchronizes before” that call to RLock,
// and the corresponding call to RUnlock “synchronizes before”
// the n+1'th call to Lock.
type RWMutex struct {
	w           Mutex        // held if there are pending writers
	writerSem   uint32       // semaphore for writers to wait for completing readers
	readerSem   uint32       // semaphore for readers to wait for completing writers
	readerCount atomic.Int32 // number of pending readers
	readerWait  atomic.Int32 // number of departing readers
}

结构体方法：

func (rw *RWMutex) Lock() // 添加写锁
func (rw *RWMutex) Unlock() // 释放写锁
func (rw *RWMutex) RLock() // 添加读锁
func (rw *RWMutex) RUnlock() // 释放读锁
func (rw *RWMutex) RLocker() Locker  // 获取实现了Locker接口方法
func (rw *RWMutex) TryLock() bool // 加写锁，并返回添加结果
func (rw *RWMutex) TryRLock() bool // 加读锁，同上

在使用上也同 Mutex 相同，只不过将读和写分离开来，在Rlock方法源码中有这样一句注释：

A writer is pending, wait for it.

也就是说读写锁分离中最关键一点是读锁在进行加锁时候，会等待写锁执行后再进行添加读锁。实际例子如下所示：

package main

import (
	"log"
	"sync"
	"time"
)

var rxmu sync.RWMutex
var global = 10 // 定义测试变量

// 读操作
func read() {
	// 添加读锁
	rxmu.RLock()
	// 释放读锁
	defer rxmu.RUnlock()
	log.Printf("read global is: %d", global)
}

// 写操作
func write() {
	defer log.Println("write done")
	// 添加写锁
	rxmu.Lock()
	// 释放写锁
	defer rxmu.Unlock()
	// 写延时操作，两秒后
	time.Sleep(time.Second * 2)
	global -= 5 // global自减5
}

在执行过程中，优先执行 write 操作，让函数保持写锁，那么在 read 时候，会保持等待状态，当 rxmu 的写锁释放，则 read 的读锁获取并进行相应后续操作。测试代码如下所示

package main

import (
	"sync"
	"testing"
	"time"
)

func TestRxMutex(t *testing.T) {
	t.Log("start process...")
	var wg sync.WaitGroup
	defer wg.Wait()
	go func() {
		wg.Add(1)
		defer wg.Done()
		write()
	}()
	// 暂停程序，先执行写操作
	time.Sleep(time.Second * 1)
	go func() {
		wg.Add(1)
		defer wg.Done()
		read()
	}()
}

输出如下：

1 2	2023/10/15 15:38:11 write done 2023/10/15 15:38:11 read global is: 5

然而这样在并发量较大时候，如果长时间保持锁的持有，那么对性能会有很大的影响。那么有没有一种优势更好的锁呢？自旋锁，顾名思义自旋锁就是一直自旋，那么通常人们都会认为保持自旋会占用 CPU 性能，我开始也是这样认为，相当于在带代码中写了for{}的死循环，这样程序是一直占有系统的时间片，而无法执行其他操作。而语言开发者和系统设计者并不会有像我这么傻的想法，当然会避免这一问题的出现。

3. 自旋锁(Spinlock)

**自旋锁^2**是计算机科学用于多线程同步的一种锁，线程反复检查锁变量是否可用。由于线程在这一过程中保持执行，因此是一种忙等待。一旦获取了自旋锁，线程会一直保持该锁，直至显式释放自旋锁。

在 Go 语言中，如果涉及内容较深，便会经常看到一个词汇”空转“或是”自旋“。从字面上理解就是不停的运行，如果使用代码解释则是

1
2
3

for {
  // do something
}

系统便会不厌其烦执行其中的逻辑。如果在循环中没有其他逻辑操作，则会保持任何输出为空一遍又一遍的运行。这无疑是对性能或资源的一大浪费，绝不能允许这样的事情发生，在这边引入了 CAS（Compare And Swap）。

3.1 CAS^4

an atomic instruction used in multithreading to achieve synchronization. It compares the contents of a memory location with a given value and, only if they are the same, modifies the contents of that memory location to a new given value.

在原子操作中，CAS 的出场率比较高，本意是比较并交换，返回此操作是否成功。而在操作指令中，是将其合并成一条指令执行。用户态在执行加解锁的过程中不会产生主动行为的上下文切换，所以使用CAS比互斥锁的效率更高。如果将 CAS 拆分出来单独解释，在 Golang 文档中是这样描述：

Swap

// The swap operation, implemented by the SwapT functions, is the atomic
// equivalent of:
//
//	old = *addr
//	*addr = new
//	return old

And

// The add operation, implemented by the AddT functions, is the atomic
// equivalent of:
//
//	*addr += delta
//	return *addr
//

Compare

这里集合 Swap 和 And 解释更为合适，单独解释则为比较，符号表示为 =

Compare and swap

// The compare-and-swap operation, implemented by the CompareAndSwapT
// functions, is the atomic equivalent of:
//
//	if *addr == old {
//		*addr = new
//		return true
//	}
//	return false
//

3.2 Spin

旋转（spin）。在Go语言中使用更低级语言实现，其中抛出了两个方法

1 2	func runtime_procPin() int // 开始自旋 func runtime_procUnpin() // 停止自旋

对其注释是这样解释：Disable/enable preemption, implemented in runtime.

其在 Runtime 中采用汇编语言实现，加载在运行中执行。然而自旋并不是无休止的运行闭包内逻辑，而是 CPU 提供了 PAUSE 指令，这个指令会让 CPU 休息一定的时间周期，所以在自旋的过程中并不会有过多的性能损耗或是占着 CPU 不释放，从而更省电，抢占锁的效率更高。

PAUSE 指令描述：

Improves the performance of spin-wait loops. When executing a “spin-wait loop,” processors will suffer a severe performance penalty when exiting the loop because it detects a possible memory order violation. The PAUSE instruction provides a hint to the processor that the code sequence is a spin-wait loop. The processor uses this hint to avoid the memory order violation in most situations, which greatly improves processor performance. For this reason, it is recommended that a PAUSE instruction be placed in all spin-wait loops.

An additional function of the PAUSE instruction is to reduce the power consumed by a processor while executing a spin loop. A processor can execute a spin-wait loop extremely quickly, causing the processor to consume a lot of power while it waits for the resource it is spinning on to become available. Inserting a pause instruction in a spin-wait loop greatly reduces the processor’s power consumption.

This instruction was introduced in the Pentium 4 processors, but is backward compatible with all IA-32 processors. In earlier IA-32 processors, the PAUSE instruction operates like a NOP instruction. The Pentium 4 and Intel Xeon processors implement the PAUSE instruction as a delay. The delay is finite and can be zero for some processors. This instruction does not change the architectural state of the processor (that is, it performs essentially a delaying no-op operation).

This instruction’s operation is the same in non-64-bit modes and 64-bit mode.

其中最重要的两点为

提高自旋的性能
抢占锁效率更高

3.3 Usage

说了一堆概念性东西，那么该如何使用呢？在Golang中定义在标准包sync/atomic中，其中有句话很明显

Package atomic provides low-level atomic memory primitives useful for implementing synchronization algorithms.

atomic 包实现了低级的内存原语，查看源代码可知部分结构体方法是没有逻辑实现的，因为涉及到低级的内存原语，所以 Golang 采用汇编语言实现了这些只定义函数的方法。

所以说该如何使用 Atomic 特性在项目中呢？还是根据上述例子将互斥锁变更为自旋锁。代码如下

package main

import (
	"log"
	"sync"
	"sync/atomic"
)

var balanceAtomic atomic.Int64

func init() {
	// 初始化余额为1000
	balanceAtomic.Store(1000)
}

func do2() {
	var wg sync.WaitGroup
	work := func(money int64, wg *sync.WaitGroup) {
		defer wg.Done()
		for {
			balance := balanceAtomic.Load()
			if balanceAtomic.CompareAndSwap(balance, balance-money) {
				break
			} else {
				log.Printf("CompareAndSwap ISN'T PASS")
			}
		}
	}
	for worker := 0; worker < 4; worker++ {
		wg.Add(1)
		go work(100, &wg)
	}
	wg.Wait()
	log.Printf("Rest of money is %d\n", balanceAtomic.Load())
}

开启四个 Goroutine 对同一个变量操作，使用CAS来保证其原子性行为。上述还只是基础操作，更多的可根据实际情况进行演变，举一反三。

4. 结语

Go 语言的一个设计理念，这一思想在 Rust 中也有相应的应用，原理相同：Channel

Share memory by communicating;
don't communicate by sharing memory.

通过通信来共享变量；
不要通过共享内存进行通信。

在并发操作中，对性能影响影响较低便是如此。使用通信来共享变量，由此引出 Golang 中的 Channel，这要在以后进行分析，因为它不在属于锁的范畴内。相反，在实际项目中，互斥锁和自旋锁使用居多，简单易用，并且在 CPU 性能与并行能力过剩的今天更是如此。

当所选型的应用并未有对影响程序有较大危害时，不要过早优化。

如果表述有误，劳烦大佬不要吝啬的指教于我！