Optimizing JDBC driver queries against INFORMATION_SCHEMA.COLUMNS
Author: du拉松  Original source: https://tidb.net/blog/d525876c
1. Background
In our business we sometimes need to fetch table-structure information from the database: column names, types, type lengths, comments, and so on. The MySQL driver provides a ready-made method for this, but we ran into problems when using it, which this post records.
The business database holds a great many tables, amounting to roughly 300,000 column records, so even a bare `select count(*) from INFORMATION_SCHEMA.COLUMNS where TABLE_SCHEMA = 'db_name'` takes a long time, around 9 seconds.
Worse, using DatabaseMetaData.getColumns(catalog, schema, tableName, null) to fetch the column information of a single table was just as slow, also around 9 seconds.
Environment:
TiDB: v8.5.1
MySQL driver (Connector/J): 8.0.33
2. Investigation
We read the driver source to see what SQL is ultimately executed, and found the statement built in com.mysql.cj.jdbc.DatabaseMetaDataUsingInfoSchema#getColumns. Setting a breakpoint confirmed the SQL that is actually sent.
Analyzing that statement: although the generated SQL is long, it queries only the INFORMATION_SCHEMA.COLUMNS table. So why is fetching the columns of just one table still so slow?
We took the generated statement, changed "AND TABLE_NAME LIKE ?" to "AND TABLE_NAME = ?", and ran it in TiDB: it completed in only about 0.2 seconds.
That raises two questions: why is TiDB so much slower at executing LIKE on this table (which has no indexes in either case), and why does the MySQL driver generate LIKE rather than an equality predicate when we query a single table?
3. Root cause analysis
Why is a LIKE query on TABLE_NAME in INFORMATION_SCHEMA.COLUMNS so slow in TiDB?
First, this is confirmed to be a system table with no indexes in either case, so why such a large gap between the two predicates?
ChatGPT offered a plausible-sounding explanation: with an equality predicate TiDB can restrict the metadata lookup to the named table, whereas a LIKE pattern forces it to materialize and filter the column metadata of every table. We could not find this stated in the official documentation (a gripe worth registering), so we take it as the likely cause.
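The shape of that cost difference can be illustrated in plain Java, independent of TiDB. Below is a toy sketch (not TiDB's real implementation): a metadata cache keyed by table name answers an equality lookup with a single probe, while a LIKE pattern must be tested against every table name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class MetaLookupSketch {
    // Toy stand-in for a metadata cache keyed by table name.
    static Map<String, List<String>> columnsByTable = new HashMap<>();
    static int namesTested = 0; // how many table names a lookup had to examine

    static List<String> byEquality(String name) {
        namesTested += 1; // a single keyed probe
        return columnsByTable.getOrDefault(name, List.of());
    }

    static List<String> byLike(String pattern) {
        // Emulate LIKE: '%' -> '.*', '_' -> '.'; every table name must be tested.
        Pattern p = Pattern.compile(pattern.replace("_", ".").replace("%", ".*"));
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : columnsByTable.entrySet()) {
            namesTested += 1;
            if (p.matcher(e.getKey()).matches()) {
                out.addAll(e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 10_000; i++) {
            columnsByTable.put("t_" + i, List.of("id"));
        }
        namesTested = 0;
        byEquality("t_42");
        System.out.println(namesTested); // 1
        namesTested = 0;
        byLike("t_42");
        System.out.println(namesTested); // 10000
    }
}
```

With 300,000 column records across many tables, a per-table scan of this kind plausibly accounts for the 9-second LIKE query versus the 0.2-second equality query.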
A look at the community forum shows plenty of people falling into the same pit:
https://asktug.com/t/topic/1028335
https://asktug.com/t/topic/1027258
Why does the driver generate LIKE instead of equality for a single-table query?
The tableName we passed to DatabaseMetaData.getColumns(catalog, schema, tableName, null) was a concrete table name, but one containing underscores (i.e. of the form table_name), so the only option was to read the code. It turns out the driver inspects the table name and switches to a LIKE query whenever the name contains % or _.
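In isolation, that decision behaves roughly like the sketch below (hasWildcards is a simplified stand-in for the real Connector/J check, which in 8.0.33 treats both % and _ as LIKE wildcards):

```java
public class DriverConditionSketch {
    // Approximation of stock Connector/J 8.0.33 behavior:
    // any '%' or '_' in the identifier switches the predicate to LIKE.
    static boolean hasWildcards(String name) {
        return name.contains("%") || name.contains("_");
    }

    static String tableNameCondition(String tableName) {
        return hasWildcards(tableName) ? " TABLE_NAME LIKE ?" : " TABLE_NAME = ?";
    }

    public static void main(String[] args) {
        System.out.println(tableNameCondition("orders"));     // " TABLE_NAME = ?"
        System.out.println(tableNameCondition("table_name")); // " TABLE_NAME LIKE ?"
    }
}
```

So any table whose name contains an underscore, which is most tables in typical naming conventions, gets the slow LIKE path.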
Hmm...
Is this a bug? If the database contains both table_name and tableaname, and we ask for the columns of table_name, will the columns of both tables come back?
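Yes: in a LIKE pattern, _ matches any single character, so table_name matches both names. A quick self-contained check, emulating LIKE matching with a Java regex:

```java
public class LikeUnderscoreDemo {
    // Emulate SQL LIKE semantics: '%' -> '.*', '_' -> '.'.
    // (Other regex metacharacters are not escaped here; this is enough
    // for identifier-style names.)
    static boolean sqlLike(String value, String pattern) {
        return value.matches(pattern.replace("_", ".").replace("%", ".*"));
    }

    public static void main(String[] args) {
        System.out.println(sqlLike("table_name", "table_name")); // true
        System.out.println(sqlLike("tableaname", "table_name")); // true: '_' matched 'a'
        System.out.println(sqlLike("tablename", "table_name"));  // false: one character short
    }
}
```

So besides being slow, the LIKE path can also return columns from a table you never asked about.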
4. Solution
Since we cannot fix this on the database side, we have to work from the application side. Patching the JDBC driver source to fix the "bug" would work, but then we would have to rebuild and depend on a custom jar, which is cumbersome. A cleaner option is to write a subclass that overrides the method. Let's do that:
Define a MyDatabaseMetaData class extending DatabaseMetaDataUsingInfoSchema; the key change is to use an equality predicate even when the table name contains underscores (only % now triggers LIKE). The full class:
```java
package com.df.test.config;

import com.mysql.cj.MysqlType;
import com.mysql.cj.ServerVersion;
import com.mysql.cj.conf.PropertyDefinitions;
import com.mysql.cj.jdbc.DatabaseMetaDataUsingInfoSchema;
import com.mysql.cj.jdbc.JdbcConnection;
import com.mysql.cj.jdbc.result.ResultSetFactory;
import com.mysql.cj.util.StringUtils;

import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Types;

/**
 * @Author: lichong
 * @Date: 2025/9/16 10:44
 * @Version: 1.0
 */
public class MyDatabaseMetaData extends DatabaseMetaDataUsingInfoSchema {

    protected MyDatabaseMetaData(JdbcConnection connToSet, String databaseToSet, ResultSetFactory resultSetFactory) throws SQLException {
        super(connToSet, databaseToSet, resultSetFactory);
    }

    @Override
    public ResultSet getColumns(String catalog, String schemaPattern, String tableName, String columnNamePattern) throws SQLException {
        String db = getDatabase(catalog, schemaPattern);
        db = this.pedantic ? db : StringUtils.unQuoteIdentifier(db, this.quotedId);

        StringBuilder sqlBuf = new StringBuilder(this.databaseTerm.getValue() == PropertyDefinitions.DatabaseTerm.SCHEMA
                ? "SELECT TABLE_CATALOG, TABLE_SCHEMA," : "SELECT TABLE_SCHEMA, NULL,");
        sqlBuf.append(" TABLE_NAME, COLUMN_NAME,");
        appendJdbcTypeMappingQuery(sqlBuf, "DATA_TYPE", "COLUMN_TYPE");
        sqlBuf.append(" AS DATA_TYPE, ");

        sqlBuf.append("UPPER(CASE");
        if (this.tinyInt1isBit) {
            sqlBuf.append(" WHEN UPPER(DATA_TYPE)='TINYINT' THEN CASE");
            sqlBuf.append(" WHEN LOCATE('ZEROFILL', UPPER(COLUMN_TYPE)) = 0 AND LOCATE('UNSIGNED', UPPER(COLUMN_TYPE)) = 0 AND LOCATE('(1)', COLUMN_TYPE) != 0 THEN ");
            sqlBuf.append(this.transformedBitIsBoolean ? "'BOOLEAN'" : "'BIT'");
            sqlBuf.append(" WHEN LOCATE('UNSIGNED', UPPER(COLUMN_TYPE)) != 0 AND LOCATE('UNSIGNED', UPPER(DATA_TYPE)) = 0 THEN 'TINYINT UNSIGNED'");
            sqlBuf.append(" ELSE DATA_TYPE END ");
        }
        sqlBuf.append(" WHEN LOCATE('UNSIGNED', UPPER(COLUMN_TYPE)) != 0 AND LOCATE('UNSIGNED', UPPER(DATA_TYPE)) = 0 AND LOCATE('SET', UPPER(DATA_TYPE)) <> 1 AND LOCATE('ENUM', UPPER(DATA_TYPE)) <> 1 THEN CONCAT(DATA_TYPE, ' UNSIGNED')");

        // spatial data types
        sqlBuf.append(" WHEN UPPER(DATA_TYPE)='POINT' THEN 'GEOMETRY'");
        sqlBuf.append(" WHEN UPPER(DATA_TYPE)='LINESTRING' THEN 'GEOMETRY'");
        sqlBuf.append(" WHEN UPPER(DATA_TYPE)='POLYGON' THEN 'GEOMETRY'");
        sqlBuf.append(" WHEN UPPER(DATA_TYPE)='MULTIPOINT' THEN 'GEOMETRY'");
        sqlBuf.append(" WHEN UPPER(DATA_TYPE)='MULTILINESTRING' THEN 'GEOMETRY'");
        sqlBuf.append(" WHEN UPPER(DATA_TYPE)='MULTIPOLYGON' THEN 'GEOMETRY'");
        sqlBuf.append(" WHEN UPPER(DATA_TYPE)='GEOMETRYCOLLECTION' THEN 'GEOMETRY'");
        sqlBuf.append(" WHEN UPPER(DATA_TYPE)='GEOMCOLLECTION' THEN 'GEOMETRY'");

        sqlBuf.append(" ELSE UPPER(DATA_TYPE) END) AS TYPE_NAME,");

        sqlBuf.append("UPPER(CASE");
        sqlBuf.append(" WHEN UPPER(DATA_TYPE)='DATE' THEN 10"); // supported range is '1000-01-01' to '9999-12-31'
        if (this.conn.getServerVersion().meetsMinimum(ServerVersion.parseVersion("5.6.4"))) {
            sqlBuf.append(" WHEN UPPER(DATA_TYPE)='TIME'"); // supported range is '-838:59:59.000000' to '838:59:59.000000'
            sqlBuf.append(" THEN 8+(CASE WHEN DATETIME_PRECISION>0 THEN DATETIME_PRECISION+1 ELSE DATETIME_PRECISION END)");
            sqlBuf.append(" WHEN UPPER(DATA_TYPE)='DATETIME' OR"); // supported range is '1000-01-01 00:00:00.000000' to '9999-12-31 23:59:59.999999'
            sqlBuf.append(" UPPER(DATA_TYPE)='TIMESTAMP'"); // supported range is '1970-01-01 00:00:01.000000' UTC to '2038-01-19 03:14:07.999999' UTC
            sqlBuf.append(" THEN 19+(CASE WHEN DATETIME_PRECISION>0 THEN DATETIME_PRECISION+1 ELSE DATETIME_PRECISION END)");
        } else {
            sqlBuf.append(" WHEN UPPER(DATA_TYPE)='TIME' THEN 8");
            sqlBuf.append(" WHEN UPPER(DATA_TYPE)='DATETIME' OR");
            sqlBuf.append(" UPPER(DATA_TYPE)='TIMESTAMP'");
            sqlBuf.append(" THEN 19");
        }

        sqlBuf.append(" WHEN UPPER(DATA_TYPE)='YEAR' THEN 4");
        if (this.tinyInt1isBit && !this.transformedBitIsBoolean) {
            sqlBuf.append(" WHEN UPPER(DATA_TYPE)='TINYINT' AND LOCATE('ZEROFILL', UPPER(COLUMN_TYPE)) = 0 AND LOCATE('UNSIGNED', UPPER(COLUMN_TYPE)) = 0 AND LOCATE('(1)', COLUMN_TYPE) != 0 THEN 1");
        }
        // workaround for Bug#69042 (16712664), "MEDIUMINT PRECISION/TYPE INCORRECT IN INFORMATION_SCHEMA.COLUMNS":
        // the I_S bug returns NUMERIC_PRECISION=7 for MEDIUMINT UNSIGNED when it must be 8.
        sqlBuf.append(" WHEN UPPER(DATA_TYPE)='MEDIUMINT' AND LOCATE('UNSIGNED', UPPER(COLUMN_TYPE)) != 0 THEN 8");
        sqlBuf.append(" WHEN UPPER(DATA_TYPE)='JSON' THEN 1073741824"); // JSON columns are limited by max_allowed_packet (max value 1073741824)

        // spatial data types
        sqlBuf.append(" WHEN UPPER(DATA_TYPE)='GEOMETRY' THEN 65535");
        sqlBuf.append(" WHEN UPPER(DATA_TYPE)='POINT' THEN 65535");
        sqlBuf.append(" WHEN UPPER(DATA_TYPE)='LINESTRING' THEN 65535");
        sqlBuf.append(" WHEN UPPER(DATA_TYPE)='POLYGON' THEN 65535");
        sqlBuf.append(" WHEN UPPER(DATA_TYPE)='MULTIPOINT' THEN 65535");
        sqlBuf.append(" WHEN UPPER(DATA_TYPE)='MULTILINESTRING' THEN 65535");
        sqlBuf.append(" WHEN UPPER(DATA_TYPE)='MULTIPOLYGON' THEN 65535");
        sqlBuf.append(" WHEN UPPER(DATA_TYPE)='GEOMETRYCOLLECTION' THEN 65535");
        sqlBuf.append(" WHEN UPPER(DATA_TYPE)='GEOMCOLLECTION' THEN 65535");

        sqlBuf.append(" WHEN CHARACTER_MAXIMUM_LENGTH IS NULL THEN NUMERIC_PRECISION");
        sqlBuf.append(" WHEN CHARACTER_MAXIMUM_LENGTH > ");
        sqlBuf.append(Integer.MAX_VALUE);
        sqlBuf.append(" THEN ");
        sqlBuf.append(Integer.MAX_VALUE);
        sqlBuf.append(" ELSE CHARACTER_MAXIMUM_LENGTH");
        sqlBuf.append(" END) AS COLUMN_SIZE,");

        sqlBuf.append(maxBufferSize);
        sqlBuf.append(" AS BUFFER_LENGTH,");

        sqlBuf.append("UPPER(CASE");
        sqlBuf.append(" WHEN UPPER(DATA_TYPE)='DECIMAL' THEN NUMERIC_SCALE");
        sqlBuf.append(" WHEN UPPER(DATA_TYPE)='FLOAT' OR UPPER(DATA_TYPE)='DOUBLE' THEN");
        sqlBuf.append(" CASE WHEN NUMERIC_SCALE IS NULL THEN 0");
        sqlBuf.append(" ELSE NUMERIC_SCALE END");
        sqlBuf.append(" ELSE NULL END) AS DECIMAL_DIGITS,");

        sqlBuf.append("10 AS NUM_PREC_RADIX,");

        sqlBuf.append("CASE");
        sqlBuf.append(" WHEN IS_NULLABLE='NO' THEN ");
        sqlBuf.append(columnNoNulls);
        sqlBuf.append(" ELSE CASE WHEN IS_NULLABLE='YES' THEN ");
        sqlBuf.append(columnNullable);
        sqlBuf.append(" ELSE ");
        sqlBuf.append(columnNullableUnknown);
        sqlBuf.append(" END END AS NULLABLE,");

        sqlBuf.append("COLUMN_COMMENT AS REMARKS,");
        sqlBuf.append("COLUMN_DEFAULT AS COLUMN_DEF,");
        sqlBuf.append("0 AS SQL_DATA_TYPE,");
        sqlBuf.append("0 AS SQL_DATETIME_SUB,");

        sqlBuf.append("CASE WHEN CHARACTER_OCTET_LENGTH > ");
        sqlBuf.append(Integer.MAX_VALUE);
        sqlBuf.append(" THEN ");
        sqlBuf.append(Integer.MAX_VALUE);
        sqlBuf.append(" ELSE CHARACTER_OCTET_LENGTH END AS CHAR_OCTET_LENGTH,");

        sqlBuf.append("ORDINAL_POSITION, IS_NULLABLE, NULL AS SCOPE_CATALOG, NULL AS SCOPE_SCHEMA, NULL AS SCOPE_TABLE, NULL AS SOURCE_DATA_TYPE,");
        sqlBuf.append("IF (EXTRA LIKE '%auto_increment%','YES','NO') AS IS_AUTOINCREMENT, ");
        sqlBuf.append("IF (EXTRA LIKE '%GENERATED%','YES','NO') AS IS_GENERATEDCOLUMN ");
        sqlBuf.append("FROM INFORMATION_SCHEMA.COLUMNS");

        StringBuilder conditionBuf = new StringBuilder();
        if (db != null) {
            conditionBuf.append("information_schema".equalsIgnoreCase(db) || "performance_schema".equalsIgnoreCase(db) || !hasLike(db)
                    || this.databaseTerm.getValue() == PropertyDefinitions.DatabaseTerm.CATALOG ? " TABLE_SCHEMA = ?" : " TABLE_SCHEMA LIKE ?");
        }
        if (tableName != null) {
            if (conditionBuf.length() > 0) {
                conditionBuf.append(" AND");
            }
            conditionBuf.append(hasLike(tableName) ? " TABLE_NAME LIKE ?" : " TABLE_NAME = ?");
        }
        if (columnNamePattern != null) {
            if (conditionBuf.length() > 0) {
                conditionBuf.append(" AND");
            }
            conditionBuf.append(hasLike(columnNamePattern) ? " COLUMN_NAME LIKE ?" : " COLUMN_NAME = ?");
        }

        if (conditionBuf.length() > 0) {
            sqlBuf.append(" WHERE");
        }
        sqlBuf.append(conditionBuf);
        sqlBuf.append(" ORDER BY TABLE_SCHEMA, TABLE_NAME, ORDINAL_POSITION");

        java.sql.PreparedStatement pStmt = null;
        try {
            pStmt = prepareMetaDataSafeStatement(sqlBuf.toString());
            int nextId = 1;
            if (db != null) {
                pStmt.setString(nextId++, db);
            }
            if (tableName != null) {
                pStmt.setString(nextId++, tableName);
            }
            if (columnNamePattern != null) {
                pStmt.setString(nextId, columnNamePattern);
            }
            ResultSet rs = executeMetadataQuery(pStmt);
            ((com.mysql.cj.jdbc.result.ResultSetInternalMethods) rs).getColumnDefinition().setFields(createColumnsFields());
            return rs;
        } finally {
            if (pStmt != null) {
                pStmt.close();
            }
        }
    }

    // The key change: only '%' triggers LIKE matching; '_' is treated as a literal character.
    private boolean hasLike(String name) {
        return StringUtils.indexOfIgnoreCase(0, name, "%") > -1;
    }

    private final void appendJdbcTypeMappingQuery(StringBuilder buf, String mysqlTypeColumnName, String fullMysqlTypeColumnName) {
        buf.append("CASE ");
        for (MysqlType mysqlType : MysqlType.values()) {
            buf.append(" WHEN UPPER(");
            buf.append(mysqlTypeColumnName);
            buf.append(")='");
            buf.append(mysqlType.getName());
            buf.append("' THEN ");
            switch (mysqlType) {
                case TINYINT:
                case TINYINT_UNSIGNED:
                    if (this.tinyInt1isBit) {
                        buf.append("CASE");
                        buf.append(" WHEN LOCATE('ZEROFILL', UPPER(");
                        buf.append(fullMysqlTypeColumnName);
                        buf.append(")) = 0 AND LOCATE('UNSIGNED', UPPER(");
                        buf.append(fullMysqlTypeColumnName);
                        buf.append(")) = 0 AND LOCATE('(1)', ");
                        buf.append(fullMysqlTypeColumnName);
                        buf.append(") != 0 THEN ");
                        buf.append(this.transformedBitIsBoolean ? "16" : "-7");
                        buf.append(" ELSE -6 END ");
                    } else {
                        buf.append(mysqlType.getJdbcType());
                    }
                    break;
                case YEAR:
                    buf.append(this.yearIsDateType ? mysqlType.getJdbcType() : Types.SMALLINT);
                    break;
                default:
                    buf.append(mysqlType.getJdbcType());
            }
        }
        buf.append(" WHEN UPPER(DATA_TYPE)='POINT' THEN -2");
        buf.append(" WHEN UPPER(DATA_TYPE)='LINESTRING' THEN -2");
        buf.append(" WHEN UPPER(DATA_TYPE)='POLYGON' THEN -2");
        buf.append(" WHEN UPPER(DATA_TYPE)='MULTIPOINT' THEN -2");
        buf.append(" WHEN UPPER(DATA_TYPE)='MULTILINESTRING' THEN -2");
        buf.append(" WHEN UPPER(DATA_TYPE)='MULTIPOLYGON' THEN -2");
        buf.append(" WHEN UPPER(DATA_TYPE)='GEOMETRYCOLLECTION' THEN -2");
        buf.append(" WHEN UPPER(DATA_TYPE)='GEOMCOLLECTION' THEN -2");
        buf.append(" ELSE 1111");
        buf.append(" END ");
    }
}
```
Next, wire up the code that calls this class by defining a MyMetaUtil:
```java
package com.df.test.config;

import cn.hutool.core.collection.ListUtil;
import cn.hutool.core.util.StrUtil;
import cn.hutool.db.DbRuntimeException;
import cn.hutool.db.DbUtil;
import cn.hutool.db.meta.*;
import com.mysql.cj.jdbc.ConnectionImpl;
import com.mysql.cj.jdbc.result.ResultSetFactory;

import javax.sql.DataSource;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * @Author: lichong
 * @Date: 2025/9/16 11:06
 * @Version: 1.0
 */
public class MyMetaUtil {

    public static String getCatalog(Connection conn) {
        if (null == conn) {
            return null;
        }
        try {
            return conn.getCatalog();
        } catch (SQLException e) {
            // ignore
        }
        return null;
    }

    /**
     * Get the schema; returns {@code null} on failure.
     *
     * @param conn {@link Connection}; returns null when the connection is {@code null}
     * @return schema, or {@code null} on failure
     * @since 4.6.0
     */
    public static String getSchema(Connection conn) {
        if (null == conn) {
            return null;
        }
        try {
            return conn.getSchema();
        } catch (SQLException e) {
            // ignore
        }
        return null;
    }

    public static Table getTableMeta(DataSource ds, String tableName) {
        final Table table = Table.create(tableName);
        Connection conn = null;
        try {
            conn = ds.getConnection();

            // Fall back to null when catalog/schema cannot be determined
            String catalog = getCatalog(conn);
            table.setCatalog(catalog);
            String schema = getSchema(conn);
            table.setSchema(schema);

            ConnectionImpl connImpl = conn.unwrap(ConnectionImpl.class);
            ResultSetFactory resultSetFactory = new ResultSetFactory(connImpl, null);
            MyDatabaseMetaData metaData = new MyDatabaseMetaData(connImpl.getMultiHostSafeProxy(), connImpl.getDatabase(), resultSetFactory);

            // Table metadata (table comment)
            /*try (final ResultSet rs = metaData.getTables(catalog, schema, tableName, new String[]{TableType.TABLE.value()})) {
                if (null != rs) {
                    if (rs.next()) {
                        table.setComment(rs.getString("REMARKS"));
                    }
                }
            }

            // Primary keys
            try (final ResultSet rs = metaData.getPrimaryKeys(catalog, schema, tableName)) {
                if (null != rs) {
                    while (rs.next()) {
                        table.addPk(rs.getString("COLUMN_NAME"));
                    }
                }
            }*/

            // Columns
            try (final ResultSet rs = metaData.getColumns(catalog, schema, tableName, null)) {
                if (null != rs) {
                    while (rs.next()) {
                        table.setColumn(Column.create(table, rs));
                    }
                }
            }

            // Index info (since 5.7.23)
            try (final ResultSet rs = metaData.getIndexInfo(catalog, schema, tableName, false, false)) {
                final Map<String, IndexInfo> indexInfoMap = new LinkedHashMap<>();
                if (null != rs) {
                    while (rs.next()) {
                        // skip tableIndexStatistic entries
                        if (0 == rs.getShort("TYPE")) {
                            continue;
                        }
                        final String indexName = rs.getString("INDEX_NAME");
                        final String key = StrUtil.join("&", tableName, indexName);
                        // A composite index yields one row per column; group them under one IndexInfo
                        IndexInfo indexInfo = indexInfoMap.get(key);
                        if (null == indexInfo) {
                            indexInfo = new IndexInfo(rs.getBoolean("NON_UNIQUE"), indexName, tableName, schema, catalog);
                            indexInfoMap.put(key, indexInfo);
                        }
                        indexInfo.getColumnIndexInfoList().add(ColumnIndexInfo.create(rs));
                    }
                }
                table.setIndexInfoList(ListUtil.toList(indexInfoMap.values()));
            }
        } catch (SQLException e) {
            throw new DbRuntimeException("Get columns error!", e);
        } finally {
            DbUtil.close(conn);
        }
        return table;
    }
}
```
Done. After testing, the lookup speed is back where it should be.
5. Summary
TiDB being slow on this table is beyond doubt, probably a limitation of the overall mechanism; earlier release notes show that queries on this table have been optimized before, but with a large amount of metadata the impact is still real. Hopefully an official fix arrives in a future version.
Second, the MySQL driver's treatment of underscore-containing table names as LIKE patterns also feels wrong, since underscores are ubiquitous in real table and database names; that handling really does look like a bug. The workaround above speeds up single-table lookups, but it does not help when scanning many tables or fetching the column information of an entire database.
Copyright notice: this is an original article by InfoQ author 【TiDB 社区干货传送门】.
Original link: http://xie.infoq.cn/article/a13f534fce7cb3909d5d5af8b. Please contact the author before reprinting.