Posted in Operating System

Deadlock

Monday, September 22, 2014

Comment

데드락 그 자체보다는, 데드락을 어떻게 해결하는가가 더 중요한 것 같다.

Deadlock Necessary Conditions

Deadlock 은 Coffman conditions 으로 알려진 아래 4개의 조건들이 모두 동시에 일어날때 발생한다. 만약 조건 중 하나라도 막을 수 있다면, Deadlock 이 발생하지 않는다.

Mutual Exclusion: at least one resource must be held in a non-shareable mode. Only one process can use the resource at any given instant of time
Resource Holding: a process is currently holding at least one resource and requesting additional resources which are held by other processes
No Preemption: a resource can be released only voluntarily by the process holding it, after that process has completed its task
Circular Wait: a process must be waiting for a resource which is being held by another process, which in turn is waiting for the first process to release the resource.

위키피디아 에 의하면 최근의 운영체제 시스템도 데드락이 발생하는걸 예방할수는 없고, 발생한다 하더라도 서로 다른 해결 방식을 취한다고 한다. 대부분의 경우 데드락을 예방하기 위해서 Circular Wait 조건을 방지하려고 하는데, 대표적인 데드락 방지 방법은 다음의 4가지다.

Ignoring Deadlock
Detection and Recovery
Prevention
Avoidance

Deadlock Handling

1. Ignoring Deadlock

가장 마음 편한 방법이다. Deadlock 이 발생하지 않으리라 생각하며, 발생해도 무시한다. Ostrich Algorithm 이라고도 알려져 있는데, 쉽게 말해서 Convenience 와 Correctness 를 교환하는 방법이다. 데이터 손실이나, 데드락 발생 간격이 허용될만큼 충분히 작다면, 이를 해결하기 위해 궂이 성능을 낮출 필요가 없다는 것이다. poor worst-case performance 알고리즘의 대표적인 예로 Standard ML 의 type-checking 알고리즘이 있다고 한다.

2. Detection

Deadlock이 발생할 수 있다고 보고, 감지한 뒤 해결하는 방법이다. Deadlock을 감지하기 위해서 recourse allocation 과 process states 를 추적 해야 한다. finite state-model 등을 이용해서 termination 조건을 결정하는 방법을 사용한다고 위키에 나와있다. (오토마타는 컴파일러뿐 아니라 여러 분야에서 중요한 것 같다.)

(1). Deadlock 을 발견한 뒤에는 Process Termiantion 을 통해 해결할 수 있다. 그러나 단순히 모든 프로세스를 종료 하게 되면, 진행중이던 computation 과 같은 리소스 들이 낭비되기 때문에 어떤 프로세스를 종료해야 가장 리소스 낭비가 적을지 결정해야 하는데, 이 부분이 쉽지 않다. 해당 프로세스를 종료 했을 때 Deadlock 이 풀릴지 아닐지를 계산하고, priority 와 process age 를 고려해야 하기 때문에 복잡하다.

(2). 다른 대안으로 Deadlock이 풀릴 때 까지 Resource Preemption 을 사용할 수 도 있다.

3. Prevention

Coffman conditions 중 하나를 방지하면 된다. (참 쉽죠?)

(1). Mutual exclusion 조건을 제거한다는건, 어느 프로세스도 resource 에 대해 배타적인 접근 권한을 가지지 못한다는 건데, 놀랍게도 spooled 될 수 없는 리소스에 대해서는 이러한 방법을 사용하지 못한다. 게다가 spooled resources 라 하더라도 Deadlock 은 여전히 발생할 수 있다. Non-blocking synchronization 알고리즘이 mutual exclusion 을 피할 수 있는 방법인데, 이 알고리즘은 스레드의 execution 이 indifinitely postponed 되는걸 막는다. 여기 에 의하면, 정해진 시간당 최소한 하나의 호출이 완료될 때 Lock-free 하다고 볼 수 있으며 충돌과는 관계 없이 모든 호출이 완료되면 Wait-free 하다고 한다. 아래는 위키를 인용

In computer science, a non-blocking algorithm ensures that threads competing for a shared resource do not have their execution indefinitely postponed by mutual exclusion. A non-blocking algorithm is lock-free if there is guaranteed system-wide progress regardless of scheduling; wait-free if there is also guaranteed per-thread progress.

(2). Resource holding 조건의 경우에는 프로세스에게 시작 전에 필요한 모든 리소스를 요청하도록 함으로써 해결할 수 있는데, 말이 쉽지 시작 전에 사용할 모든 리소스를 파악하는 것도 어려우며, 모든 리소스를 사용할거라는 이유만으로 미리 할당하는 것은 비효율적이다.

다른 방법으로는 프로세스가 아무런 리소스를 가지고 있지 않을때만 리소스를 요청하도록 하는 방법이 있다. 따라서 프로세스들은 처음에 가지고 있는 모든 리소스를 release 해야만 이후에 필요한 리소스를 요청할 수 있다. 이 방법 또한 리소스가 덜 사용되므로 비효율적이다. 또한 자주 사용되는 리소스를 사용하는 프로세스는 무기한 대기할 수 있어 기아상태를 낳을 수 있다. Serializing token 이 All-or-none 알고리즘으로 알려져 있다.

Serializing token 와 Mutexex 는 shared resource 를 동시에 접근하는걸 막는다는 점에서 비슷하지만, Serializing tokens 은 해당 스레드가 blocked 혹은 asleep 상태일때는 다른 스레드의 shared resource 에 대한 접근을 막지 못한다. 따라서 Timeslicing, Preemption, Concurrent Execution 에 대해서는 Mutexes 처럼 동작하지만 Voluntary Blocking 에 대해서는 다른 스레드의 리소스 접근을 허용할 수 있다.

그럼에도 불구하고, Mutexes 가 Deadlock 이나 Priority Inversion 등 해결하기 어려운 문제를 발생시킬 수 있기 때문에, Deadlock 이 발생하지 않고, 코드를 간결하게 만드는 Serializing Tokens 을 이용하기도 한다. 아래는 위키 인용

Serializing tokens allow programmers to write multiprocessor-safe code without themselves or the lower level subsystems needing to be aware of every single entity that may also be holding the same token.

In fact, the fact that tokens do not deadlock coupled with the fact that there is no expectation of atomicity for earlier acquired tokens when later operations block leads to a great deal of code simplification. If you look at FreeBSD-5, you will notice that FreeBSD-5 passes held mutexes down the subroutine stack quite often, in order to allow some very deep procedural level to temporarily release a mutex in order to switch or block or deal with a deadlock. There is a great deal of code pollution in FreeBSD-5 because of this (where some procedures must be given knowledge of the mutexes held by other unrelated procedures in order to function properly).

(3). 어떤 프로세스든 최소 일정시간동안은 리소스를 보유해야 하기 때문에 No preempition 조건을 막기는 어렵거나 불가능할 수 있다. 게다가 리소스가 대해 Preemption System 의 경우에는 inconsistent processing outcome 이나, thrashing 이 발생할 수 있다.

일반적으로 잠긴(locked out) 리소스에 대한 preemption 은 rollback 을 의미하는데, 이건 오버헤드가 무지 크므로 주의해야 한다. preemption 을 허용하는 알고리즘에는 lock-free and wait-free algorithms 과 optimistic concurrency control 이 있다.

(4). Circular Wait 을 방지하는 방법은, 크리티컬 섹션에서 인터럽트를 불가능하게 만들고 resource 에 대해 order 를 만들어 두어, 순서대로 얻어야만 상위 레벨의 resource 를 얻도록 하는 방법이 있다. 식사하는 철학자 문제에서 다익스트라가 제안한 방법이기도 하다.

그러나 이 방법은 하위 레벨의 리소스를 점유하기 위해서는, 상위 레벨의 리소스를 release 해야 하기 때문에 오버헤드가 생긴다. 특히 대용량 데이터베이스에 접근하는 어플리케이션의 경우, low-level 레코드를 얻기 위해 high-level 레코드를 방출하는 경우가 발생할 수 있는데, 극심한 비효율을 야기할 것이다.

4. Aviodance

리소스에 대해 특정 정보를 알 수 있는 경우, 리소스를 할땅 할 때마다 시스템이 Deadlock 을 야기할 수 있는 unsafe 한 상태로 넘어가는지 검사하여 Deadlock 을 피할 수 있다. 이를 위해서는 최소한 다음의 조건들을 알아야 한다.

resources currently available
resources currently allocated to each process
resources that will be required and release by these processes in the future

대표적인 알고리즘이 Banker's algorithm 이다. 새로운 프로세스가 시스템에 등록될때는, 어떤 리소스가 요구될지 미리 기록해야 하며 이 리소스들은 시스템이 가진 리소스를 초과하지 않아야 한다. 리소스가 할당 될때도, 시스템이 가진 총량을 넘을 수 없으며 사용 가능한 한도 내에 있어야 리소스가 프로세스에게 할당이 된다. 그러나 프로세스가 얼마만큼의 리소스가 필요할지 사전에 정확하게 예측하는건 어려운 일이다.

Wait/Die 또는 Wound/Wait 이라 불리는 알고리즘도 있다. 타임스탬프를 이용해서 어느 프로세스가 오래되었는지 판별한 뒤에 다음과 같은 알고리즘을 적용할 수 있다. 알파벳 O 는 오래된 프로세스, Y 는 신규 프로세스다

Wait/Die
- O needs a resource held by Y : O Waits
- Y needs a resource held by O : Y dies
Wound/Die
- O needs a resource held by Y : Y dies
- Y needs a resource held by O : Y waits

Old Lisper

Lisp, Emacs, Scala