Training a deep network is difficult because gradients are prone to vanishing or exploding as they propagate through many layers. This problem was largely mitigated by the introduction of residual blocks and batch normalization. Together, these two techniques have enabled the successful training of much deeper networks, with higher accuracy on both the training and test sets. …
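To make the idea concrete, here is a minimal sketch of a residual block that combines both techniques, written in PyTorch. The class name `ResidualBlock`, the channel count, and the two-convolution layout are illustrative assumptions (in the spirit of the original ResNet design), not details taken from the text. The key point is that the input `x` is added back to the block's output, giving gradients an identity path around the convolutions, while batch normalization keeps the intermediate activations well scaled.

```python
# A minimal sketch of a residual block with batch normalization (PyTorch).
# Names and sizes (ResidualBlock, channels=64) are illustrative assumptions.
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """Two 3x3 convolutions with batch norm; the input is added back to
    the output (skip connection), so gradients can flow directly through
    the identity path instead of vanishing in the convolutional stack."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3,
                               padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)   # normalizes activations per channel
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3,
                               padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # skip connection: output = F(x) + x


if __name__ == "__main__":
    block = ResidualBlock(channels=64)
    x = torch.randn(8, 64, 32, 32)  # a batch of 8 feature maps
    print(block(x).shape)           # torch.Size([8, 64, 32, 32])
```

Because the skip connection preserves the input's shape, blocks like this can be stacked many times without the gradient signal decaying the way it would in an equally deep plain convolutional stack.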