Data analysis foundation Quiz #2: I think we can compare mean and median without precisely calculating the mean

The box plot represents a survey of the ages of students in a Master’s program. We need to compare the median age (Quantity A) to the average age (Quantity B).

If the distribution is skewed (e.g., if a few older students pull the average up), the average could be higher than the median.
The data is somewhat spread out, and the right whisker is longer than the left, which hints that the distribution might be skewed right. This suggests that there might be a few older students, which could pull the average above the median.

This makes Quantity B (average age) likely greater than Quantity A (median age), so the correct option could be B. Can someone comment on where I am wrong? @gregmat @Leaderboard

Look at the definition of a boxplot itself. What can you assume and what can you not?

But here we can infer that the distribution is skewed and that the majority of data points lie between 27 and 32, which pulls the average to the right. I do get the point that we can't know the mean precisely, but the distribution makes the average higher than the median. What am I getting wrong here? @Leaderboard

That is not a reasonable assumption - you need to be very careful about what you can assume and what you cannot. In fact, you cannot even attribute this to a specific distribution.

Also, please don’t ping people in general.

The provided info might be correct or incorrect, but can you please provide evidence or first-principles thinking on why one can’t?

For one, the boxplot gives little information on density. So even though the boxplot looks like mean > median, we could, for instance, concentrate just under a quarter of the values approximately at Q_1. This will reduce the mean.
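To make this concrete, here is a rough sketch in Python. The five-number summary (22, 27, 29, 32, 40) is made up for illustration and is not read off the quiz's boxplot; the helper `five_number_summary` is also hypothetical and only works for samples of size 4k + 1, where the quartiles fall exactly on data points. Both samples would draw the same boxplot, with the right whisker longer than the left, yet one has a mean of about 28.8 (below the median) and the other about 31.9 (above it).

```python
from statistics import mean


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a sample of size 4k + 1.

    Illustrative helper: with 4k + 1 points the quartiles sit exactly
    on the sorted values at positions 0, k, 2k, 3k and 4k.
    """
    s = sorted(values)
    k = (len(s) - 1) // 4
    return (s[0], s[k], s[2 * k], s[3 * k], s[4 * k])


# Hypothetical ages, 101 students each (not the quiz data).
# "low": everything between Q1 and the median is pushed down to Q1,
# and everything above Q3 (except the maximum) is pushed down to Q3.
low = [22] + [27] * 49 + [29] * 25 + [32] * 25 + [40]

# "high": values are pushed up to the top of each quartile range instead.
high = [22] + [27] * 25 + [29] * 25 + [32] * 25 + [40] * 25

for name, ages in (("low", low), ("high", high)):
    summary = five_number_summary(ages)
    print(name, summary, "mean =", round(mean(ages), 2))
```

Since the same boxplot is compatible with a mean on either side of the median, the boxplot alone cannot settle the comparison.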

I guess that’s the beauty of quartiles, right? We can’t, right?

The count or density has to be capped at the 50th percentile, not more than that.

The only thing that worried me is the repetition of the same numbers, but that’s still taken care of by the size of the whiskers.

I might be wrong; I’m pretty much a novice in these things.

Correct - but the values in between the quartiles can be freely manipulated.