In the Quant section on Data Analysis, I came across a question on standardization for which I think the answer should be D instead of C. If we test cases for this problem, the sum p can equal 0. For example, if the list is 4 and 16, then the standardized values are -1 and 1, and their sum is 0. Could you explain whether this approach is appropriate for solving such questions?
The sum of the standardized values of any list with more than one element is always 0, as long as the SD is not 0 (that is, the numbers are not all equal).
In your example of a list of 4 and 16, the average is (4+16)/2 = 20/2 = 10. For a list of two numbers, the SD is the distance between each number and the average, so SD = 10 - 4 = 6.
I agree with your next step. Let’s standardize the values by subtracting the average and dividing by the SD.
4,16 → (4-10)/6, (16-10)/6 → -6/6, 6/6 → -1, 1
Thus, the sum of the standardized values of -1 and 1 is -1 + 1 = 0.
Thus, p = 0.
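The same arithmetic can be checked with a short Python sketch. The helper name standardize is made up for illustration, and it assumes the population SD (dividing by n), which is what gives SD = 6 for the list 4, 16.

```python
from statistics import fmean, pstdev

def standardize(values):
    """Return (x - mean) / SD for each x, using the population SD."""
    mean = fmean(values)
    sd = pstdev(values)  # population SD; assumes the values are not all equal
    return [(x - mean) / sd for x in values]

z = standardize([4, 16])
print(z)       # [-1.0, 1.0]
print(sum(z))  # 0.0
```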
Proof:
Suppose that a list consists of numbers x_1, x_2, x_3,…x_n where n > 1. We want to show that the standardized values sum to 0. Define M = mean and SD = standard deviation of the list.
Sum of standardized values = (x_1-M)/SD + (x_2-M)/SD +…+ (x_n-M)/SD = [(x_1-M) + (x_2-M) +…+(x_n-M)]/SD = [ x_1 + x_2 + … + x_n - nM]/SD (*).
The -nM in the last step comes from the fact that we are adding n copies of -M.
Since M = (x_1 + x_2 + … + x_n)/n, we have x_1 + x_2 + … + x_n = nM. Let’s use that fact to substitute into (*).
Rewriting the steps from before: Sum of standardized values = (x_1-M)/SD + (x_2-M)/SD +…+ (x_n-M)/SD = [(x_1-M) + (x_2-M) +…+(x_n-M)]/SD = [ x_1 + x_2 + … + x_n - nM]/SD = [nM - nM]/SD = 0/SD = 0.
Thus, the sum is 0.
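The key step of the proof is that the numerator, the sum of the deviations (x_i - M), is exactly 0 no matter what the SD is. A minimal sketch of that step, using exact fractions so that rounding does not get in the way (the function name deviations_sum is hypothetical):

```python
from fractions import Fraction

def deviations_sum(values):
    """Sum of (x - M) over the list, computed exactly with fractions."""
    n = len(values)
    mean = Fraction(sum(values), n)  # M = (x_1 + ... + x_n) / n
    return sum(x - mean for x in values)

total = deviations_sum([4, 6, 16, 20, 25])
print(total)  # 0
```

Dividing 0 by any nonzero SD still gives 0, which is why the choice of list does not matter.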
Yeah, so by that logic the answer should be D, but the key gives C.
Because if you take any other list, say 4, 6, 16, 20, 25, and standardize it, then p is greater than zero. Boom, we've got two cases that conflict with each other.
No, even in that case, the sum will be zero. Since you are subtracting the average, some of those values will become negative when standardized.
Take a simpler list: 1, 2, 3, 4, 5. Each of these values is positive, but the associated standardized values are not all positive. Since the average = 3 and SD = sqrt(2), the standardized list is (1-3)/sqrt(2), (2-3)/sqrt(2), (3-3)/sqrt(2), (4-3)/sqrt(2), (5-3)/sqrt(2) → -2/sqrt(2), -1/sqrt(2), 0/sqrt(2), 1/sqrt(2), 2/sqrt(2). If we add those, we get -2/sqrt(2) - 1/sqrt(2) + 0 + 1/sqrt(2) + 2/sqrt(2), and the terms all cancel out to equal 0.
Having all positive numbers in a list does not imply that all the standardized values are positive. I have shown you a case where all the numbers are positive, but some of the standardized values are negative, one is 0, and the rest are positive.
I hope this helps clear things up.
I would go through the process of calculating the mean and the SD of 4, 6, 16, 20, 25, subtracting the mean from each number, and dividing each result by the SD. Then, sum them up. You will get 0.
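If you would rather check this by machine than by hand, here is a rough sketch for the list 4, 6, 16, 20, 25. It assumes the population SD, as in the earlier examples. Because the SD here is irrational, floating-point rounding can leave a tiny nonzero remainder, so the sum is compared to 0 within a small tolerance rather than exactly.

```python
import math
from statistics import fmean, pstdev

data = [4, 6, 16, 20, 25]
mean = fmean(data)   # 14.2
sd = pstdev(data)    # population SD, about 8.06

z_scores = [(x - mean) / sd for x in data]
for x, z in zip(data, z_scores):
    print(f"{x:>2} -> {z:+.3f}")  # 4 and 6 come out negative

total = sum(z_scores)
print(math.isclose(total, 0.0, abs_tol=1e-12))  # True
```

The printout shows that 4 and 6, which are below the mean, get negative standardized values, and those cancel the positive ones.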
