CS 446 Homework 2

  • Soft-margin SVM: 4pts

In the lecture, we learned about the duality of hard-margin SVM. Given data examples $((x_i, y_i))_{i=1}^{N}$, the primal form of hard-margin SVM is

$$\arg\min_{w} \;\; \frac{1}{2}\|w\|_2^2, \qquad \text{s.t. } y_i(w^T x_i) \ge 1 \text{ for all } i.$$

Its dual form is

$$\max_{\alpha} \;\; \sum_{i=1}^{N}\alpha_i \;-\; \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i \alpha_j y_i y_j\,(x_i^T x_j), \qquad \text{s.t. } \alpha_i \ge 0 \text{ for all } i, \quad \sum_{i=1}^{N}\alpha_i y_i = 0.$$
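For reference (a standard consequence of the KKT conditions, stated here only as a reminder), the primal solution can be recovered from a dual optimum $\alpha^*$ as

$$w^* = \sum_{i=1}^{N} \alpha_i^* y_i x_i,$$

and only the examples with $\alpha_i^* > 0$ (the support vectors) contribute to $w^*$.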

Now consider the soft-margin SVM. The primal form is

$$\arg\min_{w,\,\xi} \;\; \frac{1}{2}\|w\|_2^2 + C\sum_{i=1}^{N}\xi_i, \qquad \text{s.t. } y_i(w^T x_i) \ge 1 - \xi_i \text{ and } \xi_i \ge 0 \text{ for all } i.$$

Derive the dual form of the soft-margin SVM.
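A common starting point for this derivation (a sketch only; the multiplier names $\alpha_i$ and $\mu_i$ are conventional, not fixed by the problem) is the Lagrangian of the primal above,

$$L(w, \xi, \alpha, \mu) = \frac{1}{2}\|w\|_2^2 + C\sum_{i=1}^{N}\xi_i - \sum_{i=1}^{N}\alpha_i\bigl(y_i w^T x_i - 1 + \xi_i\bigr) - \sum_{i=1}^{N}\mu_i \xi_i, \qquad \alpha_i \ge 0,\ \mu_i \ge 0.$$

Setting the gradients with respect to $w$ and $\xi_i$ to zero gives relations that let you eliminate the primal variables and obtain the dual objective together with its feasible set for $\alpha$.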

 

  • Decision Tree and AdaBoost: 12 pts

The figure below shows a dataset D = {(x(i), y(i)) : i = 1, …, 6} (where x(i) ∈ R², y(i) ∈ R), containing six data points with two features x1 and x2. For example, x(1) = [1, 2] and x(2) = [2, 1].

The label y(i) can take on the values 1 (blue) or −1 (green).

The decision tree defined in class can only support discrete-valued instances. Here, we extend this concept to general continuous spaces. A continuous-valued decision attribute needs to be represented using a comparison operator (≥, <). Specifically, unlike discrete-valued tree building, which in each round only chooses which feature to use as the current decision attribute, here we will specify both a feature (x1 or x2) and a threshold τ, and then create two descendant nodes at the current node. Of the data points in the current node, those below the threshold will be put into the child node “xj < τ”, and those at or above the threshold will be put into the child node “xj ≥ τ” (j = 1 or 2).

Note: We assume τ can only take integer values. Please describe the split rule as “xj ≥ τ”, such as x1 ≥ 1 or x2 ≥ 2; do not use answers like x1 ≥ 2.5 or x2 > 2. Also make sure to describe the predicted label in the two child nodes “xj < τ” and “xj ≥ τ” (j = 1 or 2).

Note: Use log2(·) in the calculation of entropy.
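A minimal helper sketch for these entropy and information-gain calculations (plain NumPy; the labels below are placeholders only, since the actual labels must be read off the figure):

    import numpy as np

    def entropy(labels):
        """Sample entropy H = -sum_k p_k * log2(p_k) over the distinct label values."""
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return float(-np.sum(p * np.log2(p)))

    def information_gain(X, y, feature, tau):
        """Gain of splitting on x_feature >= tau (feature: 0 for x1, 1 for x2)."""
        mask = X[:, feature] >= tau
        n = len(y)
        h_children = ((mask.sum() / n) * entropy(y[mask])
                      + ((~mask).sum() / n) * entropy(y[~mask]))
        return entropy(y) - h_children

    # Placeholder example: the six points listed in the AdaBoost part below,
    # with made-up labels -- replace y with the colors shown in the figure.
    X = np.array([[1, 2], [2, 1], [3, 4], [4, 6], [5, 3], [6, 5]], dtype=float)
    y = np.array([1, -1, 1, -1, 1, -1])
    print(entropy(y), information_gain(X, y, feature=0, tau=4))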

  1. (1pt) What is the sample entropy of D? Show each step of your calculation.

  2. (2pts) What is the maximum information gain if we split the root into two child nodes? What is the rule for this split? Show each step of calculating the information gain. You do not need to prove that this split is optimal.

  3. (3pts) After the first split in 2., how do we further split the child nodes based on maximum information gain? Please also give the information gain for each split.

AdaBoost. A decision stump is a one-level decision tree. It classifies cases using only one attribute. In this problem, you will run through T = 2 steps of AdaBoost with decision stumps as weak learners on dataset D. For the sake of notational simplicity, we denote x(1) = [1, 2], x(2) = [2, 1], x(3) = [3, 4], x(4) = [4, 6], x(5) = [5, 3], x(6) = [6, 5].
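For orientation only, here is a minimal sketch of the two boosting rounds with decision stumps (the enumeration of candidate stumps and the weight-update convention below are reasonable defaults, not necessarily identical to the lecture's; the labels are again placeholders for the figure):

    import numpy as np

    def stump_predict(X, feature, tau, sign_dir):
        """Predict sign_dir on the side x_feature >= tau and -sign_dir on the other side."""
        return sign_dir * np.where(X[:, feature] >= tau, 1, -1)

    def best_stump(X, y, gamma):
        """Exhaustively pick the stump minimizing the weighted error under weights gamma."""
        best = None
        for feature in (0, 1):
            for tau in np.unique(X[:, feature]).astype(int):
                for sign_dir in (1, -1):
                    err = np.sum(gamma * (stump_predict(X, feature, tau, sign_dir) != y))
                    if best is None or err < best[0]:
                        best = (err, feature, tau, sign_dir)
        return best

    # Placeholder labels -- replace with the colors from the figure.
    X = np.array([[1, 2], [2, 1], [3, 4], [4, 6], [5, 3], [6, 5]], dtype=float)
    y = np.array([1, -1, 1, -1, 1, -1])
    gamma = np.full(len(y), 1.0 / len(y))              # initial sample weights gamma_1

    for t in (1, 2):                                   # T = 2 boosting rounds
        eps, feature, tau, sign_dir = best_stump(X, y, gamma)
        alpha = 0.5 * np.log((1 - eps) / eps)          # weight alpha_t of the stump f_t
        pred = stump_predict(X, feature, tau, sign_dir)
        gamma = gamma * np.exp(-alpha * y * pred)      # reweight the samples
        gamma = gamma / gamma.sum()                    # normalize to obtain gamma_{t+1}
        print(t, feature, tau, sign_dir, eps, alpha)
    # The combined classifier has the form sign(sum_t alpha_t * f_t(x)).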

  4. (4pts) For each iteration t = 1, 2, compute the weights for each sample γt ∈ R⁶, the weighted error rate ϵt, the weight αt of the decision stump, and the decision stump ft.

Note:

    • Please describe the split rule as xj ≥ τ or xj < τ where τ is an integer.

    • If you find multiple solutions, giving one is okay.

  5. (2pts) Following 4., write down the rule of the classifier you constructed as a formula. Does your solution classify each case correctly? Show your work.

Note:

    • You may use the function sign in the decision rule, where

$$\operatorname{sign}(x) = \begin{cases} 1 & \text{for } x \ge 0, \\ -1 & \text{for } x < 0. \end{cases}$$
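For reference, the boosted classifier produced by this procedure has the generic form

$$F(x) = \operatorname{sign}\Bigl(\sum_{t=1}^{T} \alpha_t f_t(x)\Bigr);$$

the concrete stumps $f_t$ and weights $\alpha_t$ from part 4 are what the question asks you to plug in.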


Hint: Note that $w^T x + w_0$ can be written as $[\,x^T \;\; 1\,]\begin{bmatrix} w \\ w_0 \end{bmatrix}$. Moreover, consider putting all data points into a matrix, e.g.,

$$X = \begin{bmatrix} (x^{(1)})^T & 1 \\ (x^{(2)})^T & 1 \\ \vdots & \vdots \end{bmatrix}.$$
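As a small illustration of this hint (plain NumPy; the six points from the AdaBoost part are used here purely as an example):

    import numpy as np

    # Six data points, one per row.
    pts = np.array([[1, 2], [2, 1], [3, 4], [4, 6], [5, 3], [6, 5]], dtype=float)

    # Append a column of ones so that X @ np.append(w, w0) evaluates w^T x + w0
    # for every data point at once.
    X = np.hstack([pts, np.ones((len(pts), 1))])
    print(X.shape)   # (6, 3)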

  6. (4pts) Cosine classifier

Consider the classifier based on the cosine function

$\mathcal{F}_{\cos} = \{\,\mathbf{1}\{\cos(cx) \ge 0\} : \mathcal{X} \to \mathbb{R} \mid c \in \mathbb{R}\,\}$. Show what $\mathrm{VC}(\mathcal{F}_{\cos})$ is and prove your result.

 

  • Coding: SVM, 24pts

Recall that the dual problem of SVM is

$$\max_{\alpha \in C} \;\; \sum_{i=1}^{n}\alpha_i \;-\; \frac{1}{2}\sum_{i,j=1}^{n}\alpha_i \alpha_j y_i y_j\, K(x_i, x_j).$$

where the domain C = [0, ∞)^n = {α : α_i ≥ 0 for all i} for hard-margin SVM, and you already derived the domain for soft-margin SVM in Q1.
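For orientation on the coding part, a rough sketch of one possible approach (projected gradient ascent with a linear kernel; the assignment's required interface and solver are not specified here, so the function names and method below are assumptions):

    import numpy as np

    def linear_kernel(A, B):
        """K(x_i, x_j) = x_i^T x_j for all pairs of rows of A and B."""
        return A @ B.T

    def dual_objective(alpha, K, y):
        """sum_i alpha_i - 1/2 * sum_{i,j} alpha_i alpha_j y_i y_j K(x_i, x_j)."""
        Q = (y[:, None] * y[None, :]) * K
        return alpha.sum() - 0.5 * alpha @ Q @ alpha

    def solve_dual_hard_margin(X, y, lr=1e-3, steps=5000):
        """Projected gradient ascent on the dual over the hard-margin domain [0, inf)^n."""
        Q = (y[:, None] * y[None, :]) * linear_kernel(X, X)
        alpha = np.zeros(len(y))
        for _ in range(steps):
            grad = 1.0 - Q @ alpha                       # gradient of the dual objective
            alpha = np.maximum(alpha + lr * grad, 0.0)   # project back onto [0, inf)^n
            # For soft margin, project onto the domain you derived in Q1 instead.
        return alpha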

Equivalently, it can be formulated as a minimization problem