Migrating from Oracle to PostgreSQL: Unicode character types nchar, nvarchar, ntext

May 17, 2023

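Database engines such as Oracle, SQL Server, and Sybase provide the string types nchar, nvarchar, and ntext.

What do these types mean, and which PostgreSQL types correspond to them?

Take nchar and nvarchar (Transact-SQL) in SQL Server as an example. The character data type is either fixed-size (nchar) or variable-size (nvarchar). Both store Unicode data, encoded as UTF-16 or UCS-2 depending on the collation.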

nchar [ ( n ) ]

Fixed-size string data. n defines the string length in byte pairs and must be a value from 1 to 4000. The storage size is two times n bytes.

nvarchar [ ( n | max ) ]

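Variable-size string data. n defines the string length in byte pairs and can be a value from 1 to 4000. max indicates that the maximum storage size is 2^31-1 bytes (2 GB), that is, up to 2^30-1 characters. The storage size in bytes is two times n, plus 2 bytes.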

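In practice, this comes down to two points:

1. The stored value is a Unicode string, encoded as UTF-16 or UCS-2.

2. The declared length counts characters (byte pairs in SQL Server), not bytes.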

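Mapping nchar, nvarchar, ntext to PostgreSQL

Given the two points above, the PostgreSQL character types char, varchar, and text correspond to nchar, nvarchar, and ntext as long as the following condition is met.

The PostgreSQL database uses the UTF8 character set, which is in fact Unicode (Unicode is listed as its alias). For details, see Character Set Support in the PostgreSQL documentation.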

Name	Description	Language	Server?	ICU?	Bytes/Char	Aliases
UTF8	Unicode, 8-bit	all	Yes	Yes	1-4	Unicode
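Whether a database meets this condition can be checked directly on the server. The following is a minimal sketch, not a prescribed procedure: the database name `migrated_db` is a hypothetical placeholder, and the `C` locale settings are an assumption to adapt to your environment.

```sql
-- Encoding of the database the current session is connected to.
SHOW server_encoding;

-- Encoding of every database in the cluster.
SELECT datname, pg_encoding_to_char(encoding) AS encoding
FROM pg_database;

-- Create a UTF8 database for the migrated data (hypothetical name).
-- template0 allows an encoding and locale that differ from template1.
CREATE DATABASE migrated_db
    ENCODING 'UTF8'
    LC_COLLATE 'C'
    LC_CTYPE 'C'
    TEMPLATE template0;
```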

When this condition is met, char, varchar, and text can directly replace nchar, nvarchar, and ntext, because in PostgreSQL the n in char(n) and varchar(n) always means a length in characters, never in bytes.
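The character-based length semantics can be observed with a small experiment. This sketch assumes a database whose encoding is UTF8; the table names `nvarchar_demo` and `doc` and their columns are hypothetical and only serve as illustration.

```sql
-- Assumes a database whose encoding is UTF8.
CREATE TEMP TABLE nvarchar_demo (content varchar(4));

-- Four characters, twelve bytes in UTF8: accepted, because the limit counts characters.
INSERT INTO nvarchar_demo VALUES ('你好中国');

-- char_length counts characters; octet_length counts bytes.
SELECT char_length(content) AS chars, octet_length(content) AS bytes
FROM nvarchar_demo;

-- Five characters: rejected with "value too long for type character varying(4)".
INSERT INTO nvarchar_demo VALUES ('你好中国人');

-- Hypothetical SQL Server source table:
--   CREATE TABLE doc (title nchar(10), summary nvarchar(200), body ntext);
-- A possible PostgreSQL counterpart, keeping the same length limits:
CREATE TABLE doc (title char(10), summary varchar(200), body text);
```

The declared lengths carry over unchanged because both sides count characters rather than bytes for typical text. Source data containing supplementary characters, which occupy two byte pairs in SQL Server, may be worth checking separately.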

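What if the PostgreSQL database encoding is not UTF8?

There are two options:

1. You can still store the data in char, varchar, and text columns, keeping the same length limits as the source. The application should convert the strings to the target character set before writing them to PostgreSQL.

That is, the Unicode strings stored in the source are converted to strings in the character set of the target PostgreSQL database.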

postgres=# \df convert*
                             List of functions  
   Schema   |     Name     | Result data type | Argument data types | Type   
------------+--------------+------------------+---------------------+------  
 pg_catalog | convert      | bytea            | bytea, name, name   | func  
 pg_catalog | convert_from | text             | bytea, name         | func  
 pg_catalog | convert_to   | bytea            | text, name          | func  
(3 rows)  

dbtest1=# select convert_to(N'你好中国','sqlascii');  
     convert_to       
--------------------  
 \xc4e3bac3d6d0b9fa  
(1 row)  

dbtest1=# select convert_to(N'你好中国','utf8');  
         convert_to           
----------------------------  
 \xe4bda0e5a5bde4b8ade59bbd  
(1 row)

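2. Alternatively, store the raw bytes of the source Unicode strings in a bytea column in PostgreSQL, and convert them to the current database character set when reading.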

dbtest1=# \l  
                                 List of databases  
   Name    |  Owner   | Encoding |  Collate   |   Ctype    |   Access privileges     
-----------+----------+----------+------------+------------+-----------------------  
 dbtest1   | postgres | EUC_CN   | C          | C          |   
  
  
create table test (id int, content bytea);  
  
insert into test values (1, convert_to(N'你好中国','utf8'));  
  
dbtest1=# select * from test;  
 id |          content             
----+----------------------------  
  1 | \xe4bda0e5a5bde4b8ade59bbd  
(1 row)  
  
dbtest1=# select convert_from(content,'utf8') from test;  
 convert_from   
--------------  
 你好中国  
(1 row)
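To avoid repeating the conversion in every query, the bytea column can be wrapped in a view. This is a sketch built on the `test` table above; the view name `test_text` is a hypothetical choice. Note that convert_from raises an error for any character that cannot be represented in the database encoding (EUC_CN in this example).

```sql
-- Expose the UTF8 byte stream as text in the database encoding.
CREATE VIEW test_text AS
SELECT id, convert_from(content, 'utf8') AS content
FROM test;

SELECT * FROM test_text WHERE id = 1;

-- A bytea column has no character-length limit of its own.
-- length(bytea, encoding) counts characters; octet_length counts bytes.
SELECT id,
       length(content, 'utf8') AS chars,
       octet_length(content)   AS bytes
FROM test;
```

Because bytea does not enforce the upstream character limit, the character count shown by the last query would have to be validated by the application or by a constraint if the limit matters.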

Input syntax for Unicode character types in PostgreSQL

PostgreSQL accepts the N'...' quoting style:

dbtest1=# select N'abc你好中国';  
   bpchar      
-------------  
 abc你好中国  
(1 row)
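The bpchar column heading above suggests that an N'...' literal is typed as character rather than text. The following sketch shows how this could be confirmed and how the literal compares with ordinary quoting; the expected type names in the comments are my understanding of PostgreSQL's behavior rather than output taken from the original session.

```sql
-- N'...' denotes a national character literal,
-- which PostgreSQL treats as character (bpchar).
SELECT pg_typeof(N'abc你好中国');

-- A cast gives the literal another string type where needed.
SELECT pg_typeof(N'abc你好中国'::text);

-- The same string written with ordinary quoting compares equal.
SELECT N'abc你好中国' = 'abc你好中国' AS same_value;
```

Migrated SQL that uses N'...' literals can therefore usually run unchanged, although the N prefix adds nothing beyond the type of the literal.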
