ECE 8550: Assignment 1
1 Linear regression
In this section, you will implement linear regression to predict the death rate from a number of features, including annual precipitation, temperature, population, income, pollution, etc. The data for this assignment is given in the file data.txt, which contains the data description, 17 columns (features), and 60 rows (examples). In the data matrix, columns 2-16 are input features, and column 17 is the output target. Note that column 1 is the index and should not be used in the regression. You will implement gradient descent for learning the linear regression model.
We split the data into a training set and a testing set following the 80/20 rule. Thus, we have 48 training samples ($n = 48$) with 16 features ($d = 16$). The $i$-th sample is denoted by $\mathbf{x}_i$, where $x_i^j$ is the value of the $j$-th feature of this sample.
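The sketch below shows one way to load and split the data under the stated 80/20 rule. It assumes data.txt parses as a plain whitespace-delimited numeric matrix (if the file begins with the textual data description, you would need to skip those lines, e.g. with the skiprows argument); the variable names are illustrative, not prescribed by the assignment.

import numpy as np

# Load the data matrix; adjust skiprows if data.txt starts with text.
data = np.loadtxt("data.txt")
X = data[:, 1:-1]    # drop column 1 (index); keep the input features
y = data[:, -1]      # last column: the output target (death rate)

# 80/20 split; the dataset is already shuffled, so take the first 80%.
n_train = int(0.8 * len(data))
X_train, X_test = X[:n_train], X[n_train:]
y_train, y_test = y[:n_train], y[n_train:]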
1.1 Feature normalization
By looking at the feature values in the data, you may notice that some features are about 1000 times larger than the others. When features differ by orders of magnitude, performing feature normalization first can make gradient descent converge much more quickly. There are different ways to do feature normalization. For simplicity, we use the following method: (1) subtract the minimum value of each feature; (2) divide the feature values by their range of values. Equivalently, it is given by the following equation:
for each feature $X$, let
\[
X_{\text{norm}} = \frac{X - X_{\min}}{X_{\max} - X_{\min}}.
\]
Similarly, normalize the target $Y$. Note that to make a prediction for a new data point, you also need to first normalize its features in the same way you did for the training dataset.
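A minimal sketch of this min-max normalization in numpy; the helper name min_max_normalize is an illustrative assumption, not part of the assignment.

import numpy as np

def min_max_normalize(X, x_min=None, x_max=None):
    # Subtract each feature's minimum, then divide by its range.
    # Pass the min/max computed on the training data when
    # normalizing a new data point.
    if x_min is None:
        x_min = X.min(axis=0)
    if x_max is None:
        x_max = X.max(axis=0)
    return (X - x_min) / (x_max - x_min), x_min, x_max

The same helper works for the 1-D target vector, since min/max over axis 0 of a 1-D array is a scalar.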
1.2 Feature appending
Classical linear regression is defined as
\[
f(\mathbf{x}_i) = \mathbf{w} \cdot \mathbf{x}_i + b = \sum_{j=1}^{d} w_j \times x_i^j + b,
\]
where $b$ is the intercept. For simplicity, we let $b = w_{d+1} \times x_i^{d+1}$, where $x_i^{d+1} = 1$ and $w_{d+1}$ is an unknown but learnable parameter. Thus, we can combine $\mathbf{w}$ and $b$ by letting
\[
\mathbf{w} \leftarrow [w_1, w_2, w_3, \ldots, w_d, w_{d+1}]^T; \qquad
\mathbf{x}_i \leftarrow [x_i^1, x_i^2, x_i^3, \ldots, x_i^d, 1]^T.
\]
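This appending step is one line in numpy; a sketch (the helper name is illustrative):

import numpy as np

def append_ones(X):
    # Append a constant feature of 1s so the intercept b is absorbed
    # into the weight vector as w_{d+1}.
    return np.hstack([X, np.ones((X.shape[0], 1))])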
1.3 Gradient descent with ridge regularization
To recap, the prediction function for linear regression is given by
\[
f(\mathbf{x}_i) = \mathbf{w} \cdot \mathbf{x}_i = \mathbf{w}^T \mathbf{x}_i = \sum_{j=1}^{d+1} w_j \times x_i^j, \tag{1}
\]
where
• $\mathbf{x}_i$ is the feature vector of the $i$-th sample;
• $x_i^j$ is the value of the $j$-th feature for the $i$-th sample;
• $w_j$ is the $j$-th parameter of $\mathbf{w}$.
The loss function for linear regression with ridge regularization is given by
\[
J(\mathbf{w}) = \frac{1}{2n} \sum_{i=1}^{n} \left( f(\mathbf{x}_i) - y_i \right)^2 + \frac{\lambda}{2(d+1)} \sum_{j=1}^{d+1} (w_j)^2. \tag{2}
\]
To minimize this function, we first obtain the gradient with respect to each parameter $w_j$:
\[
\frac{\partial J(\mathbf{w})}{\partial w_j} = \frac{1}{n} \sum_{i=1}^{n} \left( f(\mathbf{x}_i) - y_i \right) x_i^j + \frac{\lambda}{d+1} w_j. \tag{3}
\]
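As an illustration only, Eqs. (2) and (3) can be evaluated for all parameters at once in numpy. The function and argument names below (ridge_loss_and_grad, lam) are assumptions for this sketch, with X the $n \times (d+1)$ feature matrix after the appending step of Section 1.2:

import numpy as np

def ridge_loss_and_grad(w, X, y, lam):
    # Eq. (2) and Eq. (3), vectorized over all parameters.
    # X: (n, d+1) matrix with the appended 1s column; y: (n,) targets.
    n, d1 = X.shape
    residual = X @ w - y                               # f(x_i) - y_i for every i
    loss = residual @ residual / (2 * n) + lam / (2 * d1) * (w @ w)
    grad = X.T @ residual / n + lam / d1 * w
    return loss, grad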
Then, the (full) gradient descent algorithm is given as:
Algorithm 1: Batch Gradient Descent
    $k = 0$;
    while convergence criterion not reached do
        for $j = 1, \ldots, d+1$ do
            Update $w_j \leftarrow w_j - \alpha \frac{\partial J(\mathbf{w})}{\partial w_j}$;
        Update $k \leftarrow k + 1$;

where $\alpha$ is the learning rate. The convergence criterion for the above algorithm is $\Delta\%_{\text{cost}} < \epsilon$, where
\[
\Delta\%_{\text{cost}} = \frac{\left| J_{k-1}(\mathbf{w}) - J_k(\mathbf{w}) \right| \times 100}{J_{k-1}(\mathbf{w})},
\]
$J_k(\mathbf{w})$ is the value of Eq. (2) at the $k$-th iteration, and $\Delta\%_{\text{cost}}$ is computed at the end of each iteration of the while loop. Initialize $\mathbf{w} = \mathbf{0}$ and compute $J_0(\mathbf{w})$ with these values.
Important: You must update $w_j$ simultaneously for all $j$.
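A minimal sketch of Algorithm 1 in numpy, assuming a helper like the ridge_loss_and_grad sketch above is available; the default alpha and eps values mirror Section 1.5, and the vectorized update changes all $w_j$ simultaneously:

import numpy as np

def batch_gradient_descent(X, y, lam, alpha=0.01, eps=0.001):
    # Algorithm 1 with the Delta%-cost stopping rule.
    w = np.zeros(X.shape[1])                        # initialize w = 0
    costs = [ridge_loss_and_grad(w, X, y, lam)[0]]  # J_0(w)
    while True:
        _, grad = ridge_loss_and_grad(w, X, y, lam)
        w = w - alpha * grad                        # simultaneous update of all w_j
        costs.append(ridge_loss_and_grad(w, X, y, lam)[0])
        delta_pct = abs(costs[-2] - costs[-1]) * 100 / costs[-2]
        if delta_pct < eps:                         # convergence criterion
            return w, costs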
Task: Load the dataset in data.txt and split it into 80% / 20% training/test sets. The dataset is already shuffled, so you can simply use the first 80% of the examples for training and the remaining 20% for testing. Learn the linear regression model using the training data. Plot the value of the loss function $J_k(\mathbf{w})$ vs. the number of iterations ($k$). For each iteration, compute the mean squared loss (without the regularization term) on the test data. Plot the mean squared loss over the test data vs. the number of iterations ($k$) in the same figure. The mean squared loss is defined as
\[
\text{MSE} = \frac{1}{2m} \sum_{i=1}^{m} \left[ y_i - \mathbf{w} \cdot \mathbf{x}_i \right]^2,
\]
where $m$ is the number of test samples.
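A sketch of this test-set loss, plus the two-curve figure the task asks for; all names here (mean_squared_loss, train_costs, test_losses) are assumptions, and X_test must already be normalized and have the appended 1s column:

import matplotlib.pyplot as plt

def mean_squared_loss(w, X_test, y_test):
    # MSE as defined above, including the 1/(2m) scaling.
    residual = y_test - X_test @ w
    return residual @ residual / (2 * len(y_test))

# Both curves in the same figure, indexed by iteration k:
# plt.plot(train_costs, label="J_k(w), training")
# plt.plot(test_losses, label="MSE, test")
# plt.xlabel("iteration k"); plt.legend(); plt.show()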
1.4 Gradient descent with lasso regularization
The loss function for linear regression with lasso regularization is given by
\[
J(\mathbf{w}) = \frac{1}{2n} \sum_{i=1}^{n} \left( y_i - f(\mathbf{x}_i) \right)^2 + \frac{\lambda}{2(d+1)} \sum_{j=1}^{d+1} |w_j|. \tag{4}
\]
To minimize this loss function, you will need to derive the gradient yourself. The gradient descent algorithm is the same as above.
Hint: For simplicity, you may consider $\frac{\partial |w_j|}{\partial w_j} = 1$ when $w_j = 0$.
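If you follow the hint, the subgradient of the absolute value can be implemented as below. This is only a sketch of the hint's convention (the helper name is illustrative); the gradient of Eq. (4) itself is left for you to derive and report in Section 1.5.

import numpy as np

def abs_subgradient(w):
    # Subgradient of |w_j|: sign(w_j), taken to be 1 at w_j = 0
    # per the hint above.
    g = np.sign(w)
    g[g == 0.0] = 1.0
    return g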
Task: Load the dataset in data.txt and split it into 80% / 20% training/test sets. Learn the lasso regression model using the training data. Plot the value of the loss function $J_k(\mathbf{w})$ vs. the number of iterations ($k$). Plot the mean squared loss over the test data vs. the number of iterations ($k$) in the same figure.
1.5 What to submit
In this assignment, use $\alpha = 0.01$ and $\epsilon = 0.001$. Use $\lambda = 2$ for ridge regularization, and use $\lambda = 0.3$ for lasso regularization.
1. (50 pts) The source code (.py), running without errors in Anaconda 2021.11 (Python 3.9).
2. (20 pts) A plot of the value of the loss function $J_k(\mathbf{w})$ and the MSE vs. the number of iterations ($k$) for Section 1.3; report the squared loss on the test data for Section 1.3.
3. (10 pts) The number of elements in $\mathbf{w}$ whose absolute value is smaller than 0.01 for Section 1.3.
4. (10 pts) The equation for the gradient of Eq. (4), which should be similar to Eq. (3).
5. (10 pts) A plot of the value of the loss function $J_k(\mathbf{w})$ and the MSE vs. the number of iterations ($k$) for Section 1.4; report the squared loss on the test data for Section 1.4.
6. (10 pts) The number of elements in $\mathbf{w}$ whose absolute value is smaller than 0.01 for Section 1.4.
1.6 Requirements
1. This assignment is implemented using Python in Anaconda 2021.11 (Python 3.9).
2. No libraries are allowed except Python built-ins, numpy, and matplotlib.
3. Figures, equations, and numbers should be in a PDF file.
4. Zip the Python source code (.py) and your PDF file before uploading.