About Me

Northglenn, Colorado, United States
I'm primarily a BI Developer on the Microsoft stack. I do sometimes touch on other Microsoft stacks (web development, application development, and SQL Server development).

Tuesday, October 30, 2007

String Sort vs. Word Sort in .NET

Michael Kaplan has been discussing, in an interesting set of articles, why it might seem that there is a bug in the string compare functions.

In the two sorting styles, these pairs would sort next to each other:
String Sort: 'co-op' vs. 'co_op'
Word Sort: 'co-op' vs. 'coop'

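In the string sort, the hyphen (minus) is treated like any other symbol, such as the underscore, so both sort ahead of the letters.

In the word sort, the hyphen and the apostrophe are treated as "special" symbols with a different weight, so that "coop" and "co-op" sort together. The underscore is still an ordinary symbol.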



//
// Sorting Flags.
//
// WORD Sort: culturally correct sort
// hyphen and apostrophe are special cased
// example: "coop" and "co-op" will sort together in a list
//
// co_op <------- underscore (symbol)
// coat
// comb
// coop
// co-op <------- hyphen (punctuation)
// cork
// went
// were
// we're <------- apostrophe (punctuation)
//
//
// STRING Sort: hyphen and apostrophe will sort with all other symbols
//
// co-op <------- hyphen (punctuation)
// co_op <------- underscore (symbol)
// coat
// comb
// coop
// cork
// we're <------- apostrophe (punctuation)
// went
// were
//
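To see the two styles side by side, here is a minimal sketch that sorts the sample words from the comment block above. It assumes the invariant culture and uses CompareOptions.None for the default word sort and CompareOptions.StringSort for the string sort. The class name SortStyles is just a placeholder of mine. The exact order you get can depend on the runtime and the globalization library it uses, so I have left the output out.

```csharp
using System;
using System.Globalization;

class SortStyles
{
    static void Main()
    {
        // Sample words taken from the comment block above.
        string[] words = { "coat", "co-op", "comb", "co_op", "coop", "cork", "we're", "went", "were" };

        // Assumption: invariant culture, so the result does not depend on the machine's locale.
        CompareInfo ci = CultureInfo.InvariantCulture.CompareInfo;

        // Word sort is the default for culture-sensitive comparisons.
        string[] wordSorted = (string[])words.Clone();
        Array.Sort(wordSorted, (a, b) => ci.Compare(a, b, CompareOptions.None));

        // String sort has to be requested explicitly.
        string[] stringSorted = (string[])words.Clone();
        Array.Sort(stringSorted, (a, b) => ci.Compare(a, b, CompareOptions.StringSort));

        Console.WriteLine("Word sort:   " + string.Join(", ", wordSorted));
        Console.WriteLine("String sort: " + string.Join(", ", stringSorted));
    }
}
```

Comparing the two printed lines against the lists in the comment block is a quick way to check which behavior your environment actually gives you.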



Now there is a problem that occurs when using a hyphen (‐, U+2010) vs. a hyphen-minus (-, U+002D) in string comparisons. The hyphen-minus is treated as a minus, which could cause confusion.
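The two characters look almost identical on screen, so it may help to check the code points directly. This is a small sketch, not something from the articles. The class name HyphenCheck and the sample strings are my own. The culture-sensitive result is printed without a prediction, because it depends on the collation weights in use.

```csharp
using System;

class HyphenCheck
{
    static void Main()
    {
        string withHyphenMinus = "co\u002Dop"; // U+002D HYPHEN-MINUS, the keyboard key
        string withHyphen = "co\u2010op";      // U+2010 HYPHEN

        // The strings look alike but hold different code points at index 2.
        Console.WriteLine("U+{0:X4} vs U+{1:X4}", (int)withHyphenMinus[2], (int)withHyphen[2]);

        // Ordinal comparison only looks at code units, so the result is negative
        // here (U+002D is below U+2010) and the strings are never equal.
        Console.WriteLine(string.CompareOrdinal(withHyphenMinus, withHyphen));

        // Culture-sensitive comparison: the result depends on the collation weights.
        Console.WriteLine(string.Compare(withHyphenMinus, withHyphen, StringComparison.InvariantCulture));
    }
}
```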

So be careful when using the String.Compare function.
Interesting note: StringCompare is used in SQL Server for sort keys.



Source: http://blogs.msdn.com/michkap/archive/2007/09/20/5008305.aspx

So basically, use the default for a linguistic sort, but use CompareOptions (for example, CompareOptions.Ordinal) to do an ordinal sort.
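As a rough illustration of that advice, the sketch below compares the same pair both ways. I am assuming CompareOptions.Ordinal is the option meant here. The class name LinguisticVsOrdinal is a placeholder. Only the sign of each result is printed, since the magnitude is not meaningful.

```csharp
using System;
using System.Globalization;

class LinguisticVsOrdinal
{
    static void Main()
    {
        string a = "co-op";
        string b = "coop";

        // Default: culture-sensitive (linguistic) comparison using the current culture.
        int linguistic = string.Compare(a, b);

        // Ordinal: compares the raw UTF-16 code units, no linguistic rules.
        // CompareOptions.Ordinal cannot be combined with other options.
        CompareInfo ci = CultureInfo.CurrentCulture.CompareInfo;
        int ordinal = ci.Compare(a, b, CompareOptions.Ordinal);

        Console.WriteLine("linguistic: " + Math.Sign(linguistic));

        // Prints -1, because '-' (U+002D) is below 'o' (U+006F).
        Console.WriteLine("ordinal:    " + Math.Sign(ordinal));
    }
}
```

For plain ordinal comparisons, string.Compare(a, b, StringComparison.Ordinal) or string.CompareOrdinal(a, b) should give the same sign with less ceremony.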

Source: http://msdn2.microsoft.com/en-us/library/system.globalization.compareoptions.aspx
