CS 446 Homework 2

  • Soft-margin SVM: 4pts

In the lecture, we learned about the duality of hard-margin SVM. Given data examples $((x_i, y_i))_{i=1}^{N}$, the primal form of hard-margin SVM is

$$\arg\min_{w} \;\; \frac{1}{2}\|w\|_2^2, \qquad \text{s.t. } y_i(w^T x_i) \ge 1 \text{ for all } i.$$

Its dual form is

$$\max_{\alpha} \;\; \sum_{i=1}^{N}\alpha_i \;-\; \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i \alpha_j y_i y_j\,(x_i^T x_j), \qquad \text{s.t. } \alpha_i \ge 0 \text{ for all } i, \quad \sum_{i=1}^{N}\alpha_i y_i = 0.$$
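For reference (a standard consequence of the KKT conditions, stated here only as a reminder), the primal solution can be recovered from a dual optimum $\alpha^*$ as

$$w^* = \sum_{i=1}^{N} \alpha_i^* y_i x_i,$$

and only the examples with $\alpha_i^* > 0$ (the support vectors) contribute to $w^*$.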

Now consider the soft-margin SVM. The primal form is

$$\arg\min_{w,\,\xi} \;\; \frac{1}{2}\|w\|_2^2 + C\sum_{i=1}^{N}\xi_i, \qquad \text{s.t. } y_i(w^T x_i) \ge 1 - \xi_i \text{ and } \xi_i \ge 0 \text{ for all } i.$$

Derive the dual form of the soft-margin SVM.
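A common starting point for this derivation (a sketch only; the multiplier names $\alpha_i$ and $\mu_i$ are conventional, not fixed by the problem) is the Lagrangian of the primal above,

$$L(w, \xi, \alpha, \mu) = \frac{1}{2}\|w\|_2^2 + C\sum_{i=1}^{N}\xi_i - \sum_{i=1}^{N}\alpha_i\bigl(y_i w^T x_i - 1 + \xi_i\bigr) - \sum_{i=1}^{N}\mu_i \xi_i, \qquad \alpha_i \ge 0,\ \mu_i \ge 0.$$

Setting the gradients with respect to $w$ and $\xi_i$ to zero gives relations that let you eliminate the primal variables and obtain the dual objective together with its feasible set for $\alpha$.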

 

  • Decision Tree and AdaBoost: 12 pts

The figure below shows a dataset D = {(x(i), y(i)) : i = 1, …, 6} (where x(i) ∈ R², y(i) ∈ R), containing six data points with two features x1 and x2. For example, x(1) = [1, 2] and x(2) = [2, 1].

The label y(i) can take on the values 1 (blue) or −1 (green).

The decision tree defined in class can only support discrete-valued instances. Here, we extend this concept to general continuous spaces. A continuous-valued decision attribute needs to be represented using a comparison operator (≥, <). Specifically, unlike discrete-valued tree building, which in each round only chooses which feature to use as the current decision attribute, here we will specify both a feature (x1 or x2) and a threshold τ, and then create two descendant nodes at the current node. Of the data points in the current node, those below the threshold will be put into the child node “xj < τ”, and those at or above the threshold will be put into the child node “xj ≥ τ” (j = 1 or 2).

Note: We assume τ can only take integer values. Please describe the split rule as “xj ≥ τ”, such as x1 ≥ 1 or x2 ≥ 2; do not use answers like x1 ≥ 2.5 or x2 > 2. Also make sure to describe the predicted label in the two child nodes “xj < τ” and “xj ≥ τ” (j = 1 or 2).

Note: Use log2(·) in the calculation of entropy.
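A minimal helper sketch for these entropy and information-gain calculations (plain NumPy; the labels below are placeholders only, since the actual labels must be read off the figure):

    import numpy as np

    def entropy(labels):
        """Sample entropy H = -sum_k p_k * log2(p_k) over the distinct label values."""
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return float(-np.sum(p * np.log2(p)))

    def information_gain(X, y, feature, tau):
        """Gain of splitting on x_feature >= tau (feature: 0 for x1, 1 for x2)."""
        mask = X[:, feature] >= tau
        n = len(y)
        h_children = ((mask.sum() / n) * entropy(y[mask])
                      + ((~mask).sum() / n) * entropy(y[~mask]))
        return entropy(y) - h_children

    # Placeholder example: the six points listed in the AdaBoost part below,
    # with made-up labels -- replace y with the colors shown in the figure.
    X = np.array([[1, 2], [2, 1], [3, 4], [4, 6], [5, 3], [6, 5]], dtype=float)
    y = np.array([1, -1, 1, -1, 1, -1])
    print(entropy(y), information_gain(X, y, feature=0, tau=4))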

  1. (1pt) What is the sample entropy of D? Show each step of your calculation.

  2. (2pts) What is the maximum information gain if we split the root into two child nodes? What is the rule for this split? Show each step of calculating the information gain. You do not need to prove that this split is optimal.

  3. (3pts) After the first split in 2., how do we further split the child nodes based on maximum information gain? Please also give the information gain for each split.

AdaBoost. A decision stump is a one-level decision tree. It classifies cases using only one attribute. In this problem, you will run through T = 2 steps of AdaBoost with decision stumps as weak learners on dataset D. For the sake of notational simplicity, we denote x(1) = [1, 2], x(2) = [2, 1], x(3) = [3, 4], x(4) = [4, 6], x(5) = [5, 3], x(6) = [6, 5].
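For orientation only, here is a minimal sketch of the two boosting rounds with decision stumps (the enumeration of candidate stumps and the weight-update convention below are reasonable defaults, not necessarily identical to the lecture's; the labels are again placeholders for the figure):

    import numpy as np

    def stump_predict(X, feature, tau, sign_dir):
        """Predict sign_dir on the side x_feature >= tau and -sign_dir on the other side."""
        return sign_dir * np.where(X[:, feature] >= tau, 1, -1)

    def best_stump(X, y, gamma):
        """Exhaustively pick the stump minimizing the weighted error under weights gamma."""
        best = None
        for feature in (0, 1):
            for tau in np.unique(X[:, feature]).astype(int):
                for sign_dir in (1, -1):
                    err = np.sum(gamma * (stump_predict(X, feature, tau, sign_dir) != y))
                    if best is None or err < best[0]:
                        best = (err, feature, tau, sign_dir)
        return best

    # Placeholder labels -- replace with the colors from the figure.
    X = np.array([[1, 2], [2, 1], [3, 4], [4, 6], [5, 3], [6, 5]], dtype=float)
    y = np.array([1, -1, 1, -1, 1, -1])
    gamma = np.full(len(y), 1.0 / len(y))              # initial sample weights gamma_1

    for t in (1, 2):                                   # T = 2 boosting rounds
        eps, feature, tau, sign_dir = best_stump(X, y, gamma)
        alpha = 0.5 * np.log((1 - eps) / eps)          # weight alpha_t of the stump f_t
        pred = stump_predict(X, feature, tau, sign_dir)
        gamma = gamma * np.exp(-alpha * y * pred)      # reweight the samples
        gamma = gamma / gamma.sum()                    # normalize to obtain gamma_{t+1}
        print(t, feature, tau, sign_dir, eps, alpha)
    # The combined classifier has the form sign(sum_t alpha_t * f_t(x)).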

  4. (4pts) For each iteration t = 1, 2, compute the weights for each sample γt ∈ R⁶, the weighted error rate ϵt, the weight αt of the decision stump, and the decision stump ft.

Note:

    • Please describe the split rule as xj ≥ τ or xj < τ where τ is an integer.

    • If you find multiple solutions, giving one is okay.

  5. (2pts) Following 4., write down the rule of the classifier you constructed as a formula. Does your solution classify each case correctly? Show your work.

Note:

    • You may use the function sign in the decision rule, where

$$\operatorname{sign}(x) = \begin{cases} 1 & \text{for } x \ge 0, \\ -1 & \text{for } x < 0. \end{cases}$$
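For reference, the boosted classifier produced by this procedure has the generic form

$$F(x) = \operatorname{sign}\Bigl(\sum_{t=1}^{T} \alpha_t f_t(x)\Bigr);$$

the concrete stumps $f_t$ and weights $\alpha_t$ from part 4 are what the question asks you to plug in.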


Hint: Note that $w^T x + w_0$ can be written as $[\,x^T \;\; 1\,]\begin{bmatrix} w \\ w_0 \end{bmatrix}$. Moreover, consider putting all data points into a matrix, e.g.,

$$X = \begin{bmatrix} (x^{(1)})^T & 1 \\ (x^{(2)})^T & 1 \\ \vdots & \vdots \end{bmatrix}.$$
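As a small illustration of this hint (plain NumPy; the six points from the AdaBoost part are used here purely as an example):

    import numpy as np

    # Six data points, one per row.
    pts = np.array([[1, 2], [2, 1], [3, 4], [4, 6], [5, 3], [6, 5]], dtype=float)

    # Append a column of ones so that X @ np.append(w, w0) evaluates w^T x + w0
    # for every data point at once.
    X = np.hstack([pts, np.ones((len(pts), 1))])
    print(X.shape)   # (6, 3)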

  6. (4pts) Cosine classifier

Consider the classifier based on the cosine function

$\mathcal{F}_{\cos} = \{\,\mathbf{1}\{\cos(cx) \ge 0\} : \mathcal{X} \to \mathbb{R} \mid c \in \mathbb{R}\,\}$. Show what $\mathrm{VC}(\mathcal{F}_{\cos})$ is and prove your result.

 

  • Coding: SVM, 24pts

Recall that the dual problem of SVM is

$$\max_{\alpha \in C} \;\; \sum_{i=1}^{n}\alpha_i \;-\; \frac{1}{2}\sum_{i,j=1}^{n}\alpha_i \alpha_j y_i y_j\, K(x_i, x_j).$$

where the domain C = [0, ∞)^n = {α : α_i ≥ 0 for all i} for hard-margin SVM, and you already derived the domain for soft-margin SVM in Q1.
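For orientation on the coding part, a rough sketch of one possible approach (projected gradient ascent with a linear kernel; the assignment's required interface and solver are not specified here, so the function names and method below are assumptions):

    import numpy as np

    def linear_kernel(A, B):
        """K(x_i, x_j) = x_i^T x_j for all pairs of rows of A and B."""
        return A @ B.T

    def dual_objective(alpha, K, y):
        """sum_i alpha_i - 1/2 * sum_{i,j} alpha_i alpha_j y_i y_j K(x_i, x_j)."""
        Q = (y[:, None] * y[None, :]) * K
        return alpha.sum() - 0.5 * alpha @ Q @ alpha

    def solve_dual_hard_margin(X, y, lr=1e-3, steps=5000):
        """Projected gradient ascent on the dual over the hard-margin domain [0, inf)^n."""
        Q = (y[:, None] * y[None, :]) * linear_kernel(X, X)
        alpha = np.zeros(len(y))
        for _ in range(steps):
            grad = 1.0 - Q @ alpha                       # gradient of the dual objective
            alpha = np.maximum(alpha + lr * grad, 0.0)   # project back onto [0, inf)^n
            # For soft margin, project onto the domain you derived in Q1 instead.
        return alpha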

Equivalently, it can be formulated as a minimization problem