About Me

My photo
Northglenn, Colorado, United States
I'm primarily a BI Developer on the Microsoft stack. I do sometimes touch upon other Microsoft stacks ( web development, application development, and sql server development).

Thursday, September 22, 2011

Median in SQL

Median is not a default aggregate in SQL-Server, but is sometime a perferable statistical function than Average. So here is quick tip on how I got the median:

Define: Median -
"The median of a finite list of numbers can be found by arranging all the observations from lowest value to highest value and picking the middle one. If there is an even number of observations, then there is no single middle value; the median is then usually defined to be the mean of the two middle values" - Wikipedia http://en.wikipedia.org/wiki/Median
By using ROW_NUMBER() function twice I can get my approximation of where the median is located.

For Example:
ROW_NUMBER() OVER (PARTITION BY Date, Array, Inverter ORDER BY Current ASC

-and-

ROW_NUMBER() OVER (PARTITION BY Date, Array, Inverter ORDER BY Current DESC

By using these two values I can pull the absolute value from their subtraction

ABS(ASCRow - DescRow)

To get an approximation of where it is associated with the median. Now if the values you are sorting are distinct, then you can find the median by looking for numbers that are less than or equal to 1 and then taking their average.

ABS(ASCRow - DescRow) <=1

In my case I can get more than a count of 2 numbers that are at the median, in this case I did another ROW_NUMBER function but this time order on the absolute difference and then I selected the top row in an outer query:

ROW_NUMBER() OVER (PARTITION BY Date, Array, Inverter ORDER BY MedianDistance) as ROW_NUM


This is where I stopped, even though there is still the chance of not getting the "true" Median. I would basically need to go further and pull all data points with the same value as the MedianDistance for those partitions and do an average.