Translation: Deep Learning Quiz Questions (Course 1, Week 3 and Week 4)
Introduction
This article is translated from the quiz assignments of the deeplearning.ai Deep Learning specialization; all five courses will be translated over time.
Translator: Huang Haiguang
This installment covers:
Course 1: Neural Networks and Deep Learning
Week 3 Quiz - Shallow Neural Networks
1. Which of the following are true? (Check all that apply.) Note: only the correct options are listed.
【★】 $X$ is a matrix in which each column is one training example.
【★】 $a^{[2]}_4$ is the activation output by the 4th neuron of the 2nd layer.
【★】 $a^{[2](12)}$ denotes the activation vector of the 2nd layer for the 12th training example.
【★】 $a^{[2]}$ denotes the activation vector of the 2nd layer.
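For reference, the notation these options rely on can be summarized as follows (a sketch of the standard course conventions, written out here for convenience rather than quoted from the quiz):

$$
X = \bigl[\, x^{(1)} \;\, x^{(2)} \;\cdots\; x^{(m)} \,\bigr] \in \mathbb{R}^{n_x \times m}, \qquad a^{[l](i)}_j = \text{activation of neuron } j \text{ in layer } l \text{ for training example } i.
$$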
2. The tanh activation usually works better than the sigmoid activation function for hidden units because the mean of its output is closer to zero, and so it centers the data better for the next layer. True/False?
【★】 True
【 】 False
Note: You can check [this post](https://stats.stackexchange.com/a/101563/169377) and [this paper](http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf).
As seen in lecture, the output of tanh is between -1 and 1; it thus centers the data, which makes learning simpler for the next layer.
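A minimal NumPy check of this centering claim (illustrative only; the zero-centered inputs z are an assumption made for the example):

import numpy as np

rng = np.random.default_rng(0)
z = rng.standard_normal(10000)              # zero-centered pre-activations
print(np.tanh(z).mean())                    # close to 0
print((1 / (1 + np.exp(-z))).mean())        # close to 0.5

The tanh activations average near 0 while the sigmoid activations average near 0.5, which is the centering effect the note describes.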
3. Which of these is a correct vectorized implementation of forward propagation for layer $l$, where $1 \le l \le L$? Note: only the correct options are listed.
【★】 $Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}$
【★】 $A^{[l]} = g^{[l]}\!\left(Z^{[l]}\right)$
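A small NumPy sketch of this vectorized step (the function name and the example shapes are assumptions for illustration):

import numpy as np

def layer_forward(A_prev, W, b, g):
    # A_prev: (n_prev, m) activations of layer l-1; W: (n_l, n_prev); b: (n_l, 1)
    Z = W @ A_prev + b          # b broadcasts across the m columns
    A = g(Z)                    # element-wise activation g^[l]
    return Z, A

A_prev = np.random.randn(3, 5)                          # 3 units in layer l-1, 5 examples
W, b = np.random.randn(4, 3) * 0.01, np.zeros((4, 1))
Z, A = layer_forward(A_prev, W, b, np.tanh)
print(Z.shape, A.shape)                                 # (4, 5) (4, 5)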
4. You are building a binary classifier for recognizing cucumbers (y=1) vs. watermelons (y=0). Which one of these activation functions would you recommend using for the output layer?
【 】 ReLU
【 】 Leaky ReLU
【★】 sigmoid
【 】 tanh
Note: The output value from a sigmoid function can be easily understood as a probability.
Sigmoid outputs a value between 0 and 1, which makes it a very good choice for binary classification. You can classify as 0 if the output is less than 0.5 and classify as 1 if the output is more than 0.5. It can be done with tanh as well, but it is less convenient because the output is between -1 and 1.
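A one-line version of that thresholding rule (a sketch; the example values of z are made up):

import numpy as np

z = np.array([-2.0, 0.3, 1.5])           # output-layer pre-activations
probs = 1 / (1 + np.exp(-z))             # sigmoid: values in (0, 1)
preds = (probs > 0.5).astype(int)        # 1 if above 0.5, else 0
print(probs, preds)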
5. Consider the following code:
A = np.random.randn(4,3)
B = np.sum(A, axis = 1, keepdims = True)
What will B.shape be?
B.shape = (4, 1)
Note: We use keepdims = True to make sure that B.shape is (4, 1) and not (4,). It makes our code more rigorous.
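A quick check of the difference keepdims makes (illustrative):

import numpy as np

A = np.random.randn(4, 3)
print(np.sum(A, axis=1, keepdims=True).shape)   # (4, 1): stays a column vector
print(np.sum(A, axis=1).shape)                  # (4,): rank-1 array, easier to misuse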
6. Suppose you have built a neural network. You decide to initialize the weights and biases to be zero. Which of the following statements are true? (Check all that apply.)
【★】 Each neuron in the first hidden layer will perform the same computation. So even after multiple iterations of gradient descent each neuron in the layer will be computing the same thing as other neurons. (See the sketch after this list.)
【 】 Each neuron in the first hidden layer will perform the same computation in the first iteration. But after one iteration of gradient descent they will learn to compute different things because we have "broken symmetry".
【 】 Each neuron in the first hidden layer will compute the same thing, but neurons in different layers will compute different things, thus we have accomplished "symmetry breaking" as described in lecture.
【 】 The first hidden layer's neurons will perform different computations from each other even in the first iteration; their parameters will thus keep evolving in their own way.
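A minimal sketch of the symmetry problem described in the first (correct) option, assuming a tiny 2-unit hidden layer and made-up data; with all-zero initialization the two hidden units never differentiate:

import numpy as np

X = np.array([[1.0, 2.0], [3.0, 4.0]])        # 2 features, 2 examples
Y = np.array([[1.0, 0.0]])
W1, b1 = np.zeros((2, 2)), np.zeros((2, 1))   # 2 hidden units, all parameters zero
W2, b2 = np.zeros((1, 2)), np.zeros((1, 1))

for _ in range(10):                            # a few gradient-descent iterations
    Z1 = W1 @ X + b1; A1 = np.tanh(Z1)
    Z2 = W2 @ A1 + b2; A2 = 1 / (1 + np.exp(-Z2))
    m = X.shape[1]
    dZ2 = A2 - Y
    dW2 = dZ2 @ A1.T / m; db2 = dZ2.mean(axis=1, keepdims=True)
    dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)
    dW1 = dZ1 @ X.T / m; db1 = dZ1.mean(axis=1, keepdims=True)
    for p, dp in ((W1, dW1), (b1, db1), (W2, dW2), (b2, db2)):
        p -= 0.1 * dp                          # in-place parameter update

print(W1)   # rows stay identical (here they remain zero): both hidden units compute the same thing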
7. Logistic regression's weights w should be initialized randomly rather than to all zeros, because if you initialize to all zeros, then logistic regression will fail to learn a useful decision boundary because it will fail to "break symmetry". True/False?
【 】 True
【★】 False
Note: Logistic regression doesn't have a hidden layer. If you initialize the weights to zeros, the first example x fed into the logistic regression will output zero, but the derivatives of logistic regression depend on the input x (because there's no hidden layer), which is not zero. So at the second iteration, the weight values follow x's distribution and are different from each other if x is not a constant vector.
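In equation form (the standard logistic-regression gradient, written out for reference), the update is driven by x itself even when w starts at zero:

$$
a = \sigma\!\left(w^{\top} x + b\right), \qquad \frac{\partial \mathcal{L}}{\partial w} = (a - y)\,x, \qquad \frac{\partial \mathcal{L}}{\partial b} = a - y .
$$

With $w = 0$ and $b = 0$ we get $a = 0.5$, so the first update $(0.5 - y)\,x$ already differs across the components of $w$ whenever the components of $x$ differ.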
8. You have built a network using the tanh activation for all the hidden units. You initialize the weights to relatively large values, using np.random.randn(..,..)*1000. What will happen?
【 】 It doesn't matter. So long as you initialize the weights randomly, gradient descent is not affected by whether the weights are large or small.
【 】 This will cause the inputs of the tanh to also be very large, thus causing gradients to also become large. You therefore have to set α to be very small to prevent divergence; this will slow down learning.
【 】 This will cause the inputs of the tanh to also be very large, causing the units to be "highly activated" and thus speed up learning compared to if the weights had to start from small values.
【★】 This will cause the inputs of the tanh to also be very large, thus causing gradients to be close to zero. The optimization algorithm will thus become slow.
Note: tanh becomes flat for large values; this leads its gradient to be close to zero. This slows down the optimization algorithm.
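A small sketch of that saturation effect (the input, the weight scales, and the single tanh unit are assumptions made for the example):

import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(100)                  # one input example with 100 features

for scale in (0.01, 1000):
    w = rng.standard_normal(100) * scale      # np.random.randn(..) * scale
    z = w @ x                                 # pre-activation of a single tanh unit
    grad = 1 - np.tanh(z) ** 2                # derivative of tanh at z
    print(f"scale={scale}: |z|={abs(z):.1f}, tanh'(z)={grad:.2e}")

With the large scale, |z| is huge and tanh'(z) is essentially zero, which is exactly the slow-gradient regime the correct option describes.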
9. Consider the following 1-hidden-layer neural network. (The figure is not reproduced here; from the answers below, it has 2 input features, 4 hidden units, and 1 output unit.) Note: only the correct options are listed.
【★】 $b^{[1]}$ will have shape (4, 1)
【★】 $W^{[1]}$ will have shape (4, 2)
【★】 $W^{[2]}$ will have shape (1, 4)
【★】 $b^{[2]}$ will have shape (1, 1)
10. In the same network as the previous question, what are the dimensions of $Z^{[1]}$ and $A^{[1]}$? Note: only the correct option is listed.
【★】 $Z^{[1]}$ and $A^{[1]}$ are (4, m)
Note: See the general shape formulas below.
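The general rule behind this answer (standard course convention, stated here because the original post only pointed to a formula image that is not reproduced):

$$
Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}, \qquad Z^{[l]},\, A^{[l]} \in \mathbb{R}^{\,n^{[l]} \times m}.
$$

Here $n^{[1]} = 4$ and there are $m$ training examples, so both $Z^{[1]}$ and $A^{[1]}$ have shape (4, m).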
Week 4 Quiz - Key concepts on Deep Neural Networks
1. What is the "cache" used for in our implementation of forward propagation and backward propagation?
【 】 It is used to cache the intermediate values of the cost function during training.
【★】 We use it to pass variables computed during forward propagation to the corresponding backward propagation step. It contains useful values for backward propagation to compute derivatives.
【 】 It is used to keep track of the hyperparameters that we are searching over, to speed up computation.
【 】 We use it to pass variables computed during backward propagation to the corresponding forward propagation step. It contains useful values for forward propagation to compute activations.
Note: the "cache" records values from the forward propagation units and sends them to the backward propagation units, because they are needed to compute the chain-rule derivatives.
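A schematic sketch of that pattern (the function names and the exact cache contents here are illustrative assumptions, not the course's prescribed API):

import numpy as np

def linear_forward(A_prev, W, b):
    Z = W @ A_prev + b
    cache = (A_prev, W, b)                 # saved for the backward pass
    return Z, cache

def linear_backward(dZ, cache):
    A_prev, W, b = cache                   # forward-pass values needed by the chain rule
    m = A_prev.shape[1]
    dW = dZ @ A_prev.T / m
    db = dZ.sum(axis=1, keepdims=True) / m
    dA_prev = W.T @ dZ
    return dA_prev, dW, db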
2. Among the following, which ones are "hyperparameters"? (Check all that apply.) Note: only the correct options are listed.
【★】 size of the hidden layers
【★】 learning rate α
【★】 number of iterations
【★】 number of layers in the neural network
Note: You can check this Quora post or this blog post.
3. Which of the following statements is true?
【★】 The deeper layers of a neural network are typically computing more complex features of the input than the earlier layers.
【 】 The earlier layers of a neural network are typically computing more complex features of the input than the deeper layers.
Note: You can check the lecture videos; Andrew uses a CNN example to explain this.
4. Vectorization allows you to compute forward propagation in an $L$-layer neural network without an explicit for-loop (or any other explicit iterative loop) over the layers l = 1, 2, …, L. True/False?
【 】 True
【★】 False
Note: We cannot avoid the for-loop iteration over the computations across the layers.
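A sketch of why the loop over layers remains (the parameter layout and the use of tanh in every layer are assumptions for illustration): layer l consumes the output of layer l-1, so the steps are inherently sequential even though each step is fully vectorized over the m examples:

import numpy as np

def forward(X, parameters, L):
    # parameters holds 'W1', 'b1', ..., 'WL', 'bL'
    A = X
    for l in range(1, L + 1):                  # explicit loop over the layers
        W, b = parameters['W' + str(l)], parameters['b' + str(l)]
        A = np.tanh(W @ A + b)                 # needs the previous layer's output
    return A

params = {'W1': np.random.randn(3, 2) * 0.01, 'b1': np.zeros((3, 1)),
          'W2': np.random.randn(1, 3) * 0.01, 'b2': np.zeros((1, 1))}
print(forward(np.random.randn(2, 5), params, L=2).shape)   # (1, 5)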
5. Assume we store the values for $n^{[l]}$ in an array called layer_dims, as follows: layer_dims = [$n_x$, 4, 3, 2, 1]. So layer 1 has four hidden units, layer 2 has 3 hidden units, and so on. Which of the following for-loops will allow you to initialize the parameters for the model?
for i in range(1, len(layer_dims)):
    parameters['W' + str(i)] = np.random.randn(layer_dims[i], layer_dims[i - 1]) * 0.01
    parameters['b' + str(i)] = np.random.randn(layer_dims[i], 1) * 0.01
6. Consider the following neural network. Which of the following statements is true? Note: only the correct option is listed. (The figure is not reproduced here; it shows a network with 3 hidden layers and an output layer.)
【★】 The number of layers $L$ is 4. The number of hidden layers is 3.
Note: The input layer ($a^{[0]}$) does not count.
As seen in lecture, the number of layers is counted as the number of hidden layers + 1. The input and output layers are not counted as hidden layers.
7. During forward propagation, in the forward function for a layer $l$ you need to know what the activation function in that layer is (sigmoid, tanh, ReLU, etc.). During backpropagation, the corresponding backward function also needs to know what the activation function for layer $l$ is, since the gradient depends on it. True/False?
【★】 True
【 】 False
Note: During backpropagation you need to know which activation was used in the forward propagation to be able to compute the correct derivative.
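In equation form (the standard backprop identity, included for reference), the dependence on the layer's activation $g^{[l]}$ is explicit:

$$
dZ^{[l]} = dA^{[l]} \ast g^{[l]\,\prime}\!\left(Z^{[l]}\right),
$$

where $\ast$ is the element-wise product; for example $g'(z) = 1 - \tanh^2(z)$ for tanh and $g'(z) = \sigma(z)\bigl(1 - \sigma(z)\bigr)$ for sigmoid, so the backward function must know which activation the forward function used.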
8. There are certain functions with the following properties:
(i) To compute the function using a shallow network circuit, you will need a large network (where we measure size by the number of logic gates in the network), but (ii) to compute it using a deep network circuit, you need only an exponentially smaller network. True/False?
【★】 True
【 】 False
Note: See the lectures; exactly the same idea is explained there.
9. Consider the following 2-hidden-layer neural network. Which of the following statements are true? (Check all that apply.) Note: only the correct options are listed. (The figure is not reproduced here; from the answers below, it has 4 input features, a first hidden layer with 4 units, a second hidden layer with 3 units, and 1 output unit.)
【★】 $W^{[1]}$ will have shape (4, 4)
【★】 $b^{[1]}$ will have shape (4, 1)
【★】 $W^{[2]}$ will have shape (3, 4)
【★】 $b^{[2]}$ will have shape (3, 1)
【★】 $b^{[3]}$ will have shape (1, 1)
【★】 $W^{[3]}$ will have shape (1, 3)
Note: These all follow from the general shape formulas given after the next question.
10. Whereas the previous question used a specific network, in the general case what is the dimension of $W^{[l]}$, the weight matrix associated with layer $l$? Note: only the correct option is listed.
【★】 $W^{[l]}$ has shape $\bigl(n^{[l]}, n^{[l-1]}\bigr)$
Note: See the general shape formulas below.
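For reference, the standard shape formulas for an $L$-layer network with layer sizes $n^{[l]}$ and $m$ training examples (the original post referenced an image for these; they are restated here from the standard course conventions):

$$
W^{[l]},\, dW^{[l]} : \bigl(n^{[l]},\, n^{[l-1]}\bigr), \qquad
b^{[l]},\, db^{[l]} : \bigl(n^{[l]},\, 1\bigr), \qquad
Z^{[l]},\, A^{[l]} : \bigl(n^{[l]},\, m\bigr).
$$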