Collation
From SQLServerPedia
|
See Also: Main_Page - Transact SQL Language Elements Being a product widely used in many countries for different purposes, SQL Server supports a variety of languages and sorting options. Each letter, digit and a special character is represented in a computer by a different combination of bits. The collection of rules specifying the bit representations of each character in a particular language (or set of languages) is referred to as a code page. Choosing the appropriate sort order will determine whether you can store particular characters within your databases. The following table lists the code pages supported by SQL Server 2000 (adopted from SQL Server BOL):
The sort order defines how the characters are compared with each other and affects the performance of many T-SQL Statements, such as ORDER BY, WHERE, HAVING, DISTINCT, etc. For instance the sort order determines whether the letter "l " comes before or after the letter "m" and whether "m" is the same as "M ". With appropriate languages the sort order also determines whether sorting within SQL Server is supposed to be accent sensitive and kana sensitive. Starting with version 7.0 SQL Server supports Unicode data types. If data is stored in Unicode format there is no need to convert the characters of one language into another; however, Unicode data takes up twice as much storage space as non-unicode data. SQL Server 2000 provides a way to manage Unicode and non-unicode data with a single collation. SQL Server 2000 uses two sort orders with each collation, one for Unicode and another for non-Unicode data. The most commonly used sort order is referred to as dictionary order, case insensitive, accent insensitive. This sort order is used with the character set 1252 and orders characters the way they appear in the alphabet, without making a distinction between upper and lowercase or accent. Another common sort order is Binary, which is used with the character set 437 (US English). The Binary sort order arranges data according to the bit patterns that represent each character. Therefore, the output with the Binary sort might not be similar to what you get with the dictionary order. A collation is a set of rules that govern the bit representation of the characters in the computer, as well as how these characters need to be sorted. Previous versions of SQL Server had separate settings for sort order, and character set to be used. You could not change the sort order or character set without reinstalling the SQL Server. Even worse, databases backed up on a server with one sort order could not be restored on a server with a different sort order. In SQL Server 2000, one setting governs the default collation (code page, as well as sort order) for the entire instance of the server you're using. However, you can specify a different collation at any level. Therefore, you could use two different collations within the same table. Now collation setting determines the sort order to be used with Unicode data types, non-Unicode data types as well as the code page to be used for non-Unicode characters. Selecting the proper collation settings is important for several reasons. First, it will affect how your T-SQL programs behave. For instance SELECT * FROM authors WHERE au_lname = 'Green' will return different results on case sensitive and case-insensitive servers. Besides, changing the collation after SQL Server is set up is not a trivial task. You no longer have to reinstall the software (as was the case with previous versions of SQL Server) to change the default collation, but you do have to do the following:
The collation settings will also cause problems when exchanging data among servers having different collations. It is recommended to have all SQL Servers within an organization running with the same collation settings. Executing sp_helpsort system stored procedure will return the character set and sort order used by the current instance of SQL Server. There are a multitude of collation settings supported by SQL Server 2000. Please refer to the vendor documentation for a full list of supported collations. See Also: Main_Page - Transact SQL Language Elements Being a product widely used in many countries for different purposes, SQL Server supports a variety of languages and sorting options. Each letter, digit and a special character is represented in a computer by a different combination of bits. The collection of rules specifying the bit representations of each character in a particular language (or set of languages) is referred to as a code page. Choosing the appropriate sort order will determine whether you can store particular characters within your databases. The following table lists the code pages supported by SQL Server 2000 (adopted from SQL Server BOL):
The sort order defines how the characters are compared with each other and affects the performance of many T-SQL Statements, such as ORDER BY, WHERE, HAVING, DISTINCT, etc. For instance the sort order determines whether the letter "l " comes before or after the letter "m" and whether "m" is the same as "M ". With appropriate languages the sort order also determines whether sorting within SQL Server is supposed to be accent sensitive and kana sensitive. Starting with version 7.0 SQL Server supports Unicode data types. If data is stored in Unicode format there is no need to convert the characters of one language into another; however, Unicode data takes up twice as much storage space as non-unicode data. SQL Server 2000 provides a way to manage Unicode and non-unicode data with a single collation. SQL Server 2000 uses two sort orders with each collation, one for Unicode and another for non-Unicode data. The most commonly used sort order is referred to as dictionary order, case insensitive, accent insensitive. This sort order is used with the character set 1252 and orders characters the way they appear in the alphabet, without making a distinction between upper and lowercase or accent. Another common sort order is Binary, which is used with the character set 437 (US English). The Binary sort order arranges data according to the bit patterns that represent each character. Therefore, the output with the Binary sort might not be similar to what you get with the dictionary order. A collation is a set of rules that govern the bit representation of the characters in the computer, as well as how these characters need to be sorted. Previous versions of SQL Server had separate settings for sort order, and character set to be used. You could not change the sort order or character set without reinstalling the SQL Server. Even worse, databases backed up on a server with one sort order could not be restored on a server with a different sort order. In SQL Server 2000, one setting governs the default collation (code page, as well as sort order) for the entire instance of the server you're using. However, you can specify a different collation at any level. Therefore, you could use two different collations within the same table. Now collation setting determines the sort order to be used with Unicode data types, non-Unicode data types as well as the code page to be used for non-Unicode characters. Selecting the proper collation settings is important for several reasons. First, it will affect how your T-SQL programs behave. For instance SELECT * FROM authors WHERE au_lname = 'Green' will return different results on case sensitive and case-insensitive servers. Besides, changing the collation after SQL Server is set up is not a trivial task. You no longer have to reinstall the software (as was the case with previous versions of SQL Server) to change the default collation, but you do have to do the following:
The collation settings will also cause problems when exchanging data among servers having different collations. It is recommended to have all SQL Servers within an organization running with the same collation settings. Executing sp_helpsort system stored procedure will return the character set and sort order used by the current instance of SQL Server. There are a multitude of collation settings supported by SQL Server 2000. Please refer to the vendor documentation for a full list of supported collations. |