Posted in coursera, programming language, ml, emacs

Programming Language, Week1

Friday, October 10, 2014

Comment

Programming Language by Dan Grossman, Coursera

Coursera PL 클래스인데 과제 마감기한도 까다롭고, 동료평가도 있고, 여러모로 조금 빡세다. 유일한 낙은 언어의 다양한 특징들을 탐구하기 위해 ML 을 사용하고 ~~오오 갓 ML~~ , emacs 를 사용한다는 건데.. 잘 버틸수 있을까 의심스럽다. 무려 첫 강의부터 대략 300분이 넘는 동영상을 올려주시는 교수님 -_-;

미국 CS 전공자들은 모두 이렇게 빡세게 배우는가요 ㅠㅠ?

SML/nj 와 Emacs 에 sml-mode 설치 후 시작한다. c-c, c-s 는 REPL 을 켠다.

ML Variable Bindings and Expressions

(* This is a comment. *)

val x = 34;  
(*: static env: x : int *)
(*: dynamic env: x --> 34 *)
val y = 35 : int;  
val z = (x + y) + (y + 2);

각 라인마다, dynamic environment 에 바인딩을 하나씩 추가한다. 따라서 세번 째 라인에서 x 와 y 를 z 를 바인딩하기 위해 사용할 수 있다.

그리고, dynamic env 에 바인딩을 추가하기 전에 static enviornment 에 Type 을 추가한다. 맨 처음엔 x: int 가 추가되고 y 와 값이 바인딩 되기 전에는 x: int, y: int 가 static env 에 추가된다.

따라서 매 라인마다 type checking 이 먼저 일어나고 그 후에야 프로그램이 evaluated (excuted) 된다.

ML 에서 if 는 다음과 같이 작성할 수 있다. 다르 언어와 비슷하다.

val abs_of_z = if z < 0 then 0 - z else z;

여기서 잠깐 Syntax 와 Semantics 를 정리하자면

Syntax is just how you write something

Semantics is what that something means
- Type-checking (before program runs)
- Evaluation (as program runs)

variable binding 에서는, type checking 은 static environment 를 확장하고, evaluation 은 dynamic environment 를 확장한다.

Rules for Expressions

expression 은 sub-expression 을 가질 수 있기 때문에, expression 을 해석 하는데 있어서 3가지가 꼭 필요하다.

(1) Syntax
(2) Type-checking rules: produces a type of fails
(2) Evaluation rules: produces a value

다시 말하면 Syntax 와 Semantics 가 필요하단 이야기다.

Variables

Syntax: sequence of letters, digits, _, not starting with digit

Type-checking: look up type in current statix env if non there, fail

Evaluation: look up value in current dynamic env

Addition

Addition 의 경우에는 sub-exp 가 있을 수 있다.

Syntax: e1 + e2 where e1 and e2 are expressions

Type-checking: if e1 and e2 have type int then int

Evaluation: if e1 evaluates to v1 and e2 evaluates to v2, then sum of v1 and v2

Values

모든 value 는 expression 이다. 그러나 모든 expression 이 value 인 것은 아니다. 그리고,

Every value evaluates to itself in zero steps

참고로, () 는 unit 타입을 가진다.

Conditional

Syntax: if e1 then e2 else e3 where if, then, and else are keywords and e1 e2 and e3 are sub-expressions

Type-checking: e1 must have type bool. e2 and e3 can have any type, but they must have the same type

Evaluation: first evaluate e1 to v1, if it's true evaluate e2 and that resut is the whole expressions's result else evaluate e3

REPL and Errors

use

use 는 특정 파일을 읽어 binding 하고 it: unit 을 돌려주는데, 이건 무시해도 된다. 그리고 같은 파일을 use 할때는 항상 REPL 을 다시 시작하자. C-d, C-c, C-s

Error

대부분의 에러는 syntax, type-checking, evaluation 의 문제다.

Shadowing

같은 변수에 대한 multiple binding 은 poor style 이다. 그러나 이를 통해 environment 와 binding 이 어떻게 동작하는지 알 수 있다.

val a = 10;  
val b = a * 2  
val a = 5; (* this is not an assignment statement *)  
(* a -> 5, b -> 20 *)
val c = 2;  
(* a -> 5, b -> 20, c -> 20 *)

val a = 5 문장은, 할당하는게 아니라 a 를 shadowing 한다. ML 에서는 mutate 할 수 없다. 매번 새롭게 dynamic env 를 만든다.

val d = a  
(* ..., d -> 5 *)
val a =  a + 1  
(* ..., a -> 6 *)
val f = a * 2

use 를 이용하면, 기존의 a 의 값이 <hidden-value> 로 나오는 것을 확인 할수 있다.

val a = 1;  
val b = a;  
val a = 2;

다음과 같은 예제가 있을 때, b 는 1 이다. eagerly evaluated 되어 바인딩 후에는 value 를 만든 expression 과는 관련이 없어진다. 다시 말해 바인딩 후에는 a 와 b는 상관이 없다.

그리고 위에서 언급 했듯이 ML 에서는 assign to 가 없고, 앞의 a 는 뒤의 a 에가 있는 dynamic env 에 의해서 가려질 뿐이다.

그렇기 때문에 REPL 을 재시작 하지 않고서 같은 파일을 여러번 use 하면 문제가 생길 수 있다고 교수님이 누차 말한 것

Functions Informally

fun pow (x: int, y: int) =  
  if y = 0
  then 1
  else x * pow(x, y-1)

fun cube(x: int)  
  pow(x, 3)

이 경우 두 함수 모두 타입은 fn: int -> int 다. 타입을 이름 뒤에 사용하는것도 그렇고, 함수 타입도 그렇고 스칼라와 문법이 비슷한 것 같다.

* 가 타입에 있을때는 곱셈이 아니라 , 같은 역할을 한다. 따라서 다음과 같이 pow 를 호출할 수 있다.

val x : int * int = (2, 3)  
val y = pow x

참고로, 함수를 사용한 후 정의하는 것은 불가능하다. 따라서 사용하는 expression 위에 함수를 정의해야 한다.

Recursion

재귀에 대해서도 간단히 언급을 하는데, 문제를 간단한 방법으로 나누어 푸는 좋은 기술이라고..

Functions Formally

우리가 Function 이 무엇인지 PL 에서 정의하려면 위에서 언급했듯이 syntax 와 semantics 가 필요하다.

Syntax: fun x0 (x1: t1, ... , xn: tn) = e

Evaluation: A function is a value. x0 is added to dynamic env

Type-checking: (t1 *, ..., * tn) -> t

타입체킹이 조금 복잡한데, e 가 t 타입을 가지는지 검사하고, 파라미터도 마찬가지로 올바른 타입을 가지는지 검사한다.

다른 언어와 마찬가지로 t1 등의 파라미터는 e 를 위한 environment 에만 추가된다.

x0 이 dynamic, static env 에 추가되므로 이후의 코드에서 recursion 을 사용할 수 있다.

Function calls

Function calls 의 syntax 는 e0 (e1, ..., en) 이다. 만약에 인자가 하나라면 괄호(parentheses) 는 없어도 된다.

참고로 ML 에서는 variable numbers of arguments 를 함수에서 받을 수 없다. 인자의 개수가 정해져야 한다.

type-checking 의 경우에는 , e0 이 (t1 * ... * tn) -> t 인지 검사하고 en 이 tn 타입을 가지면, e0 은 t 타입이다.

Evaluation 스텝은 다음과 같다.

(1) evaluate e0 to fun x0(x1: t1, ... , x: tn) = e
(2) evaluate arguments e1, ... , en to v1, ..., vn
(3) extend dynamic env mapping x1 to v1 , ... , xn to vn

두 번째 스텝에서는 eager evaluation 이 사용되는데 pow(2, 2+2) 같은 경우 인자가 2, 4 가 된다.

세 번째 스텝에서는 dynamic environment 를 확장하는데, 현재 함수인 x0 과 인자들인 xn 을 포함하도록 한다. 따라서 recursion 이 가능하다.

사실은 스칼라의 그것과 같은데 교재에 나온 설명이 너무 함축적이어서 이해하기가 어렵다.

Pairs and Other Tuples

위에서 잠깐 보았던 t1 * t2 같은 것들이 Pair 다. 다른말로 2-tuples 라 부른다.

Syntax 는 (e1, e2) 로 Type-checking 은 e1 과 e2 가 올바른 타입을 가졌는지 검사한다.

Pair 에 접근할때는 #1 e 또는 #2 e 와 같은 Syntax 를 사용하고, e가 t1 * t2 타입인지, 그 후에 #1 e 가 t1 또는 #2 e 가 t2 타입을 가졌는지 검사한다.

fun su_two_pairs (pr1: int * int, pr2: int* int) =  
  (#1 pr1) + (#2 pr1) + (#1 pr2) + (#2 pr2)

의 경우에는 타입이 (int * int) * (int * int) -> int 된다. 그리고 다른 언어와 마찬가지로 tuple 도 겹칠 수 있다.

val x1 = (7, (true, 9)) // int * (bool * int )  
val x2 = #1 (#2 x1) // true

Introducing Lists

Tuple 은 여러 타입을 가질 수 있지만, 정해진 갯수만큼의 element 만 저장할 수 있다. 반면 List 는 하나의 타입만 가져야 하지만, 원소의 갯수가 변할 수 이다.

빈 리스트는 [] 와 같이 만든다. [3, 4, 5] 는 int list 다. [(1+2), 3, 7] 과 같이 초기화하면 [3, 3, 7] 이 나온다. 리스트는 그 자체로 value 다.

:: 는 cons 라 발음하고, 다음과 같이 쓸 수 있다.

val x = [3, 4, 5]  
val y = 2 :: x

cons 뒤에 오는것은 List 여야 한다. 리스트가 비었는지 검사하기 위해 null 을, head 를 얻기 위해 hd 를, tail 을 얻기 위해 tl 을 이용한다. 따라서 tl [3, 4, 5] 는 [4, 5] 를 돌려준다.

그리고 다른 함수형 언어와 마찬가지로 tl [9] 는 [](nil) 이다.

리스트는 다양한 타입을 가질 수 있기 때문에 (int * int) list 같은 것도 타입이 될 수 있다. [(3, 4), (5, 6)] 처럼.

null 은 fn: a list -> bool 타입이고, hd 는 fn: a list -> a, tl 은 fn: a list -> a list

[]: a list 는 좀 특이한데, 다양한 타입이 될 수 있다. 3 :: [], false :: [] 이라던지.

List Functions

리스트를 조작하는 간단한 함수를 ML 로 몇 개 짜보자.

fun pow(x: int, y: int) =  
    if y = 0 then 1 else x * pow(x, y - 1);

fun cube(x: int) =  
    pow(x, 3);

fun sum_list(xs: int list) =  
    if null xs
    then 0
    else hd xs + sum_list(tl xs)

fun product_list(xs: int list) =  
    if null xs
    then 1
    else hd xs * product_list(tl xs)

fun countdown(x: int) =  
    if x = 0
    then []
    else x :: countdown(x - 1)

fun append(xs: int list, ys: int list) =  
    if null xs
    then ys
    else (hd xs) :: append(tl xs, ys)

fun sum_pair_list(xs: (int * int) list) =  
    if null xs
    then 0
    else (#1 (hd xs)) + (#2 (hd xs)) + sum_pair_list(tl xs)

fun firsts(xs: (int * int) list) =  
    if null xs
    then []
    else #1 (hd xs) :: firsts(tl xs)

fun seconds(xs: (int * int) list) =  
    if null xs
    then []
    else #2 (hd xs) :: seconds(tl xs)

fun sum_pair_list(xs: (int * int) list) =  
    if null xs
    then 0
    else sum_list(firsts xs) + sum_list(seconds xs)

fun factorial(n : int) =  
    product_list(countdown(n))

참고로 # 이 + 이나 :: 보다 우선순위가 높다.

List Recursion

재귀에 대해 생각할땐, 항상 명심해야 하는게 있는데 탈출 조건 이다. 따라서 empty-list 에 대해선 어떤걸 돌려줄지, non-empty-list 에 대해서는 무엇을 처리해야 할지 항상 생각해야 한다.

Let Expressions

let 은 local variable 을 바인딩하는 법이다.

Syntax: let b1 b2 ... bn in e end. Each b1 is any binding and e is any expression

Type-checking: Type of whole let-expression is the type of e. Type-check each b1 and e in a staic env that includes the previous bindings

*Evaluation: * evaluate each b1 and e in a dynamic env that includes the previous bindings. Result of whole expression is result of evaluating e

fun silly () =  
    let
      val x = 3
    in
      (let val x = 2 in x + 1 end) + (let val y = x + 1 in y + 1 end)
    end

여기서 Scope 의 개념이 나온다.

Scope: Where a binding is in the environment

Nested Functions

Function 은 binding 이다. 따라서 let 내부에서 local binding 할 수 있다.

fun count_from_1 (x: int) =  
    let
    fun count (from: int, to: int) =
        if from == to
        then []
        else from :: count(from + 1, to)
    in
    count(1, x)
    end

이렇게 하면 top-level 에서 count 는 사라진다. 그리고 엄밀히 말해서 count 가 가진 environment 에는 to 가 있기 때문에, to 를 인자로 가질 필요가 없다.

fun count_from_1 (x: int) =  
    let
    fun count (from: int) =
        if from = x
        then []
        else from :: count(from + 1)
    in
    count(1)
    end

Let and Efficiency

가장 큰 숫자를 찾는 다음의 함수를 고려 해 보자

fun bad_max (xs : int list) =  
  if null xs
  then 0
  else if null (tl xs)
  then hd xs
  else if hd sx > bad_max(tl xs)
  then hd xs
  else bad_max(tl xs)

이 함수는 [1, 2, ... , 30] 과 같은 리스트에 굉장히 나쁜 성능 exponentially (2^30) 을 보여준다. bad_max(tl xs) 를 두번 호출하기 때문이다. 따라서 max(tl xs) 를 변수로 놓아 캐싱하면

fun good_max (xs: int list) =  
    if null xs then 0
    else if null (tl xs) then hd xs
    else
    let
        val res = good_max(tl xs)
    in
        if hd xs > res then hd xs
        else res

    end

bad_max 의 if-then-else 가 10^-7 정도의 시간이 든다고 하면, [1, 2, ..., 55] 는 100년이 넘게 걸린다. 따라서 재귀를 구현하는데 있어서 local binding 은 필수다.

Options

참고로, good_max 는 리스트가 모두 음수일때 0 을 돌려준다. 이건 리스트가 비었을때 0 을 돌려주기 때문에 생기는 문제인데, 0 말고 다른 무언갈 돌려줄 수 없을까?

SML 도 Scala 처럼 Option 을 지원한다.(물론 SML 이 먼저..) NONE 은 a option 타입이고, SOME e 는 t option 이다. t 는 e 의 타입.

Option 을 이용해 max 를 리팩토링 해 보면,

fun max1 (xs: int list) =  
  if null xs NONE
  else 
    let val res = max(tl xs)    
    in if isSome res andalso isVal res > hd xs then res
       else Some(hd xs) 
    end

그런데, 이 max1 또한 문제가 있다. 사실 [] 는 리스트가 오름차순으로 구성되어있을때 ([1, 2, 3, 4]) 맨 마지막 호출에서만 오는데, 매번 isSome 으로 검사하니까 비효율적이다. 다시 리팩토링하면

fun max2 (xs: int list) =  
    if null xs then NONE
    else
    let
        fun max_non_empty(ys: int list) =
        if null (tl ys) then hd ys
        else
            let val res = max_non_empty(tl ys)
            in if hd xs > res then hd ys
               else res
            end
    in
        SOME (max_non_empty xs)
    end

~~교수님 let in end 는 제발그만 가..가독성이~~

Booleans and Comparison Operations

다른 언어의 && (and) 와 || (or) ! (not) 은 SML 에서는 ansalso, orelse, not 이다.

그리고 andalso, orelse 연산자는 다른 언어처럼 Short circuiting 을 제공한다. 함수가 아니다.

andalso 와 orelse, not 은 다음처럼 쓸 수도 있다.

(* andalso *)
if e1  
then e2  
else false

(* orelse *)
if e1  
then true  
else e2

(* not *)
if e1  
then false  
else true

if-then-else 만 가지고도 할 수 있으나, 그냥 andalso 와 orelse, not 을 쓰는게 코드를 읽는 사람의 정신 건강에 좋다.

Comparisons

비교의 경우엔 == 가 아니라 = 를 쓴다. != 가 아니라 <> 를 쓴다.

그리고 3.0 > 2 와 같은 비교는 안된다.3.0 은 real 이고, 2 는 int 다 Real.fromInt 를 이용하자.

No Mutation

교수님이 A valuable non-feature 라고 이야기 하시는데, 기능이 없는게 장점이라고.. SML 에서는 아래의 두 코드가 같다. (inditinguishable)

fun sort_pair(pr: int * int) =  
  if #1 pr < #2 pr then pr
  else (#2 pr, #1 pr)

fun sort_pair(pr: int * int) =  
  if #1 pr < #2 pr then (#1 pr, #2pr)
  else (#2 pr, #1 pr)

위가 더 나은 style 이라고 주장할 순 있지만, 다르다고 말할 순 없다. 그러나 mutable compound data 를 다루는 다른 언어에서는 위 두 함수는 다르다

예를 들어서 mutable data 를 다루는 언어라 가정하고 다음의 코드를 고려 해 보자.

val x = (3, 4)  
val y = sort_pair x

(* somehow mutate #1 x to hold 5 *)

val z = #1 y

이제 z 의 값은 무엇일까? sort_pair 구현에 따라 다르다. then 에서 pr 을 돌려줬다면, 5 일거고, (#1 pr, #2 pr) 을 돌려줬다면 3 일거다. 그러나 ML 에선 문제가 안된다. immutable 하니까. ~~오오 immutable 오오~~

But without mutation, we can implement either way

- No code can ever distinguishe aliasing vs identical copies
- No need to think about aliasing: focus on other things
- Can use aliasing, which saves space, without danger

ML 에서는 tl 이 상수 시간내에 이뤄진다. 첫번째 원소를 제외한 나머지를 가리키기만 하면 되니까. 어차피 그 데이터는 immutable 이니까 그냥 쓰기만 하면 된다. 변경이 필요하면, 그 때 새로 만들면 된다.

Java Mutation

public String[] getAllowedUsers() {  
  // return a references of allowedUsers
}

allowedUsers 자체가 private 여도, 다음과 같은 코드에 취약하다.

p.getAllowedUsers()[0] = p.currentUser();  
p.useTheResource();

따라서 위의 getAllowedUsers 는 copy 를 돌려줘야한다.

ML 같은 immutable 한 언어에서는 Reference (Alias) vs Copy 는 문제가 안된다.

Pieces of a Language

언어를 배울때는 다음의 다섯가지를 고려해야한다.

(1) Syntax: How do you write language constructs?
(2) Semantics: What do programs mean? (Evaluation rules)
(3) Idioms: What are typical patterns for using language features to express your computation
(4) Libraries: What facilities does the language provide standard?
(5) Tools: What do language implementations provide to make your job easier? (REPL debugger..)

다시 말하면 다섯가지를 모두 배워야 한다. 언어의 코어부터 그 확장인 라이브러리와 툴까지. 그러나 이 코스에서는 Semantics 와 Idioms 에 집중한다. ML 을 고른것도 그 이유고, 이걸 잘 이해하게 되면 Libraries 가 어떻게 구성되었는지 더 잘 이해할 수 있다.

Summary

처음엔 재귀가 어려웠는데, 시간이 지날수록 재귀가 얼마나 재밌고 강력한지 알게 된다. 함수를 building block 처럼 조립하는것도 너무 재밌고

Lisp 을 이용했으면 개인적 취향에 맞아 더 재밌게 배울 수 있었을텐데. 나중에 수업이 모두 끝났을때 ML 을 고른 이유를 느낄 수 있었으면 좋겠다.

그리고 Emacs sml-mode 가 좀.. 음.. 빈약한데 더 좋은걸 찾아야겠다. MELPA 엔 없던데.. 테스팅 프레임워크도 좀 찾아서 해보고.

Old Lisper

Lisp, Emacs, Scala