Since Sql Server doesn't have an aggregate for Median, I figured, that I'll create my first SQL CLR to handle this problem.
So, I created my intial project, using the .Net Framework 3.5 and C# SQL CLR aggregate template, and started coding:
Some of the intial changes that I've made from the default:
1) The SqlUserDefinedAggregate default was Format.Native. The problem with using the default Serializer is that I needed a way to store my values, preferably in a List. By changing the format to Format.UserDefined -- I was able to create my own serializer. In this case I used the IBinarySerialize interface.
2) I also changed the default IsInvariant attributes.
Once the code is compiled, you will need to place the dll on the same machine as the SQL Server is located.
Using the following query to set the assembly and aggregate name:
Once done, time to do some testing. An example:
So, I created my intial project, using the .Net Framework 3.5 and C# SQL CLR aggregate template, and started coding:
Code Snippet
using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Collections.Generic;
[Serializable]
[Microsoft.SqlServer.Server.SqlUserDefinedAggregate(
Format.UserDefined,
IsInvariantToDuplicates = false,
IsInvariantToNulls = false,
IsInvariantToOrder = false,
MaxByteSize = 8000)]
public struct Median : IBinarySerialize
{
//Variables to hold the values;
private List<double> ld;
public void Init()
{
ld = new List<double>();
}
public void Accumulate(SqlDouble Value)
{
if (!Value.IsNull)
{
ld.Add(Value.Value);
}
}
///
/// Merge the partially computed aggregate with this aggregate.
///
/// The other partial results to be merged
public void Merge(Median Group)
{
this.ld.AddRange(Group.ld.ToArray());
}
///
/// Called at the end of aggregation, to return the results.
///
/// The median of all inputted values
public SqlDouble Terminate()
{
//debug: return (SqlDouble)ld.Count;
//special case 0 values
if (ld.Count == 0)
return SqlDouble.Null;
ld.Sort();
int index = (int)ld.Count / 2;
if (ld.Count % 2 == 0)
{
return (SqlDouble)(((double)ld[index] + (double)ld[index - 1]) / 2);
}
else
{
return (SqlDouble)((double)ld[index]);
}
}
#region IBinarySerialize Members
public void Read(System.IO.BinaryReader r)
{
int cnt = r.ReadInt32();
this.ld = new List<double>(cnt);
for(int i = 0; i < cnt; i++)
{
this.ld.Add(r.ReadDouble());
}
}
public void Write(System.IO.BinaryWriter w)
{
w.Write(this.ld.Count);
foreach (double d in this.ld)
{
w.Write(d);
}
}
#endregion
}
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Collections.Generic;
[Serializable]
[Microsoft.SqlServer.Server.SqlUserDefinedAggregate(
Format.UserDefined,
IsInvariantToDuplicates = false,
IsInvariantToNulls = false,
IsInvariantToOrder = false,
MaxByteSize = 8000)]
public struct Median : IBinarySerialize
{
//Variables to hold the values;
private List<double> ld;
public void Init()
{
ld = new List<double>();
}
public void Accumulate(SqlDouble Value)
{
if (!Value.IsNull)
{
ld.Add(Value.Value);
}
}
///
/// Merge the partially computed aggregate with this aggregate.
///
/// The other partial results to be merged
public void Merge(Median Group)
{
this.ld.AddRange(Group.ld.ToArray());
}
///
/// Called at the end of aggregation, to return the results.
///
///
public SqlDouble Terminate()
{
//debug: return (SqlDouble)ld.Count;
//special case 0 values
if (ld.Count == 0)
return SqlDouble.Null;
ld.Sort();
int index = (int)ld.Count / 2;
if (ld.Count % 2 == 0)
{
return (SqlDouble)(((double)ld[index] + (double)ld[index - 1]) / 2);
}
else
{
return (SqlDouble)((double)ld[index]);
}
}
#region IBinarySerialize Members
public void Read(System.IO.BinaryReader r)
{
int cnt = r.ReadInt32();
this.ld = new List<double>(cnt);
for(int i = 0; i < cnt; i++)
{
this.ld.Add(r.ReadDouble());
}
}
public void Write(System.IO.BinaryWriter w)
{
w.Write(this.ld.Count);
foreach (double d in this.ld)
{
w.Write(d);
}
}
#endregion
}
Some of the intial changes that I've made from the default:
1) The SqlUserDefinedAggregate default was Format.Native. The problem with using the default Serializer is that I needed a way to store my values, preferably in a List. By changing the format to Format.UserDefined -- I was able to create my own serializer. In this case I used the IBinarySerialize interface.
2) I also changed the default IsInvariant attributes.
- IsInvariantToDuplicates - set to false since I want duplicates.
- IsInvariantToNull - set to false since I will handle Null values in my code.
- IsInvariantToOrder - set to false, I can do this in code.
Once the code is compiled, you will need to place the dll on the same machine as the SQL Server is located.
Using the following query to set the assembly and aggregate name:
Code Snippet
IF EXISTS (SELECT name FROM sysobjects WHERE name = 'Median')
DROP AGGREGATE Median
go
IF EXISTS (SELECT name FROM sys.assemblies WHERE name = 'MyClrCode')
DROP ASSEMBLY MyClrCode
go
CREATE ASSEMBLY MyClrCode
FROM 'C:\Median.dll'
WITH PERMISSION_SET = SAFE
GO
CREATE AGGREGATE dbo.Median
(@input float)
RETURNS float
EXTERNAL NAME MyClrCode.Median
DROP AGGREGATE Median
go
IF EXISTS (SELECT name FROM sys.assemblies WHERE name = 'MyClrCode')
DROP ASSEMBLY MyClrCode
go
CREATE ASSEMBLY MyClrCode
FROM 'C:\Median.dll'
WITH PERMISSION_SET = SAFE
GO
CREATE AGGREGATE dbo.Median
(@input float)
RETURNS float
EXTERNAL NAME MyClrCode.Median
Once done, time to do some testing. An example:
Code Snippet
--Test 1 value
select
dbo.Median(x.y)
from
(
select 1 as y
)x
GO
--Test Accuracy
select
dbo.Median(x.y)
from
(
select 1 as y
union all
select 2 as y
)x
GO
select
dbo.Median(x.y)
from
(
select 1 as y
)x
GO
--Test Accuracy
select
dbo.Median(x.y)
from
(
select 1 as y
union all
select 2 as y
)x
GO
11 comments:
Tanks for solution but if 2 user call the median function at same time the median function throw System.NullReferenceException.
For reproduce the bug :
select
(select dbo.median(field1) from table),(select dbo.median(field2) from table)
I don't have that problem. I did up to 6 dbo.Median calls at the same time before stopping. I did notice that I still have a buffer issue.
I originally had the attribute set at MaxByteSize = 8000, then I changed it to MaxByteSize = -1 which worked for more. I think anything over 1000+ seems to give me a buffer error. (Assuming I deployed my changes to my dev environment)
I believe you're misunderstanding the invariant flags. IsInvariantToOrder means that it doesnt matter what order the incoming data is in, it will be yield the same result. This allows the optimizer to ignore any order by statement within that aggregate.
A concatenate aggregate would NOT be order invariant, but a sum, avg, and your implementation of median ARE order invariant.
Thanks for the post!
I see, the Invariants are used by sql-server query processor to determine the most optimized query execution plan/path.
My execution, doesn't care about the order, so setting it to true -- might actually make it run quicker.
I need to implement a clr aggregate median in a project that I am working on, but it doesn't say anything about the legal use of code on your site. Can you send me any licensing restrictions you have on the code that you post on the blog?
It's all open and free to use. Nothing propriety here.
If need be, you can grab the code also here: https://github.com/andrusw/SQLCLR-Library/blob/master/SQLCLR/Math/Median/Median.cs
Just to clarify it is a closed source / for profit project. Still ok? Thanks for getting back to me so quick.
Jason,
It's okay.
Hey andrus,
great post. Helped me a lot. Thanks a ton.
Post a Comment