COS711 Assignment 1
Due date: 4 September 2022, at 23h30
1 General instructions
This assignment is theoretical and will test your understanding of the backpropagation algorithm. You have to submit a single PDF document containing your answers to the questions provided. Note: the assignment is designed to test your ability to derive the weight update equations for arbitrary loss and activation functions. Thus, you will lose marks if you skip over steps. Make sure your derivations are readable, your notation is correct, and the steps (including simplifications) are clear.
The report will be checked for plagiarism using Turnitin, and should be submitted through the ClickUp system. You are advised but not required to typeset your report in LaTeX.
2 Deriving backpropagation (25 marks)
A feed-forward neural network is set up to have an input layer of size I, a hidden layer of size H, and an output layer of size O. The following activation functions are employed by the neurons in each layer (their derivatives are recalled after the list):
• Input layer: identity, f(x) = x
• Hidden layer: Softplus, f(x) = ln(1 + e^x)
• Output layer: Modified Elliott,
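Since every weight update derivation requires the derivative of the activation function involved, it may help to recall that the identity function has f'(x) = 1 and that the Softplus derivative is the logistic sigmoid:

\[
\frac{d}{dx}\,\ln\!\left(1 + e^{x}\right) = \frac{e^{x}}{1 + e^{x}} = \frac{1}{1 + e^{-x}} .
\]

The derivative of the Modified Elliott function must be worked out from its definition; purely as an illustration, the standard (unmodified) Elliott function f(x) = x/(1 + |x|) has derivative f'(x) = 1/(1 + |x|)^2, which is not necessarily the form required here.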
Hidden and output units are summation units, i.e. x in the activation functions above refers to the net input signal; thus, the output signal of a neuron j is y_j = f(net_j) = f(∑_k w_kj y_k). A bias signal is sent to all hidden units, as well as all output units. Assume the objective function E is used, defined for a single input data pattern as:
where y_i is the output of the i-th output unit, t_i is the target output of the i-th output unit, ln refers to the natural logarithm, s_1 and s_2 are scalar values such that s_1 + s_2 = 1, and maxE_1 and maxE_2 are the maximum values produced by E_1 and E_2 over the data set before training, respectively.
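As a purely illustrative sketch (the combined form assumed below is not necessarily the exact definition of E), suppose E combines the two error terms in the normalised, weighted form E = s_1 E_1/maxE_1 + s_2 E_2/maxE_2, with E_1 and E_2 differentiable. Because s_1, s_2, maxE_1 and maxE_2 are constants with respect to the weights, the gradient with respect to any weight w, and hence the gradient-descent update (written here with a learning rate η that the assignment text does not name), decomposes linearly:

\[
\frac{\partial E}{\partial w}
= \frac{s_1}{\mathrm{maxE}_1}\,\frac{\partial E_1}{\partial w}
+ \frac{s_2}{\mathrm{maxE}_2}\,\frac{\partial E_2}{\partial w},
\qquad
\Delta w = -\eta\,\frac{\partial E}{\partial w} .
\]

Under that assumption, E_1 and E_2 can be differentiated separately and the results recombined at the end of each derivation.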
Answer the questions below (a generic sketch of the chain-rule structure involved is given, for reference, after the questions):
1. Derive the update rule for the non-bias weights between the hidden and the output layer. Show all steps, including simplifications. (10 marks)
2. Derive the update rule for the non-bias weights between the input and the hidden layer. Show all steps, including simplifications. (10 marks)
3. Will the bias weight update rules differ from the non-bias weight update rules? Derive the update rule for the bias weights associated with the hidden layer. Show your working. (5 marks)
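The sketch below shows the generic chain-rule structure such derivations typically follow. It is only an outline under the notation above, with a learning rate η, error signals δ_o, and layer-specific activation functions f_h (hidden) and f_o (output) introduced here for illustration; it does not substitute for the full derivations with the specific E and activation functions of this assignment. For a weight w_jo from hidden unit j to output unit o, with net_o = ∑_j w_jo y_j,

\[
\frac{\partial E}{\partial w_{jo}}
= \frac{\partial E}{\partial y_o}\,
  \frac{\partial y_o}{\partial \mathrm{net}_o}\,
  \frac{\partial \mathrm{net}_o}{\partial w_{jo}}
= \underbrace{\frac{\partial E}{\partial y_o}\, f_o'(\mathrm{net}_o)}_{\delta_o}\; y_j,
\qquad
\Delta w_{jo} = -\eta\,\delta_o\, y_j .
\]

For a weight w_ij from input unit i to hidden unit j, every output unit contributes to E through y_j, so the output error signals are accumulated before being propagated back:

\[
\frac{\partial E}{\partial w_{ij}}
= \Big( \sum_{o=1}^{O} \delta_o\, w_{jo} \Big)\,
  f_h'(\mathrm{net}_j)\, y_i,
\qquad
\Delta w_{ij} = -\eta\,\Big( \sum_{o=1}^{O} \delta_o\, w_{jo} \Big)\, f_h'(\mathrm{net}_j)\, y_i ,
\]

where y_i is the output of input unit i, which equals the raw input because the input activation is the identity. For the bias weights, one common convention (assumed here only for illustration) treats each bias as a weight from an extra unit whose output is a fixed constant, often -1 or +1; the same chain rule then applies with the source activation y_j (or y_i) replaced by that constant.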