Recently, when adding support to Sequel for automatically validating values of integer columns based on the underlying column type, I came across some unfortunate behavior in PostgreSQL. The unfortunate behavior comes from a combination of two features:
1. PostgreSQL treats an integer literal outside the bigint range as a numeric value (since it is too large for bigint).
2. When comparing an integer or bigint value and a numeric value, PostgreSQL will implicitly cast the integer or bigint value to numeric, then do the comparison.

So if you have an integer or bigint column in your table named id, with a regular index, and you issue an SQL query such as SELECT * FROM table_name WHERE id = 9223372036854775808, PostgreSQL will not use any index on the table, because the index is on the value of the column, not the value of the column after it has been cast to numeric.
Here's an example of the behavior. We first create a table with 1,000,000 rows, then create an index on it:
CREATE TABLE a AS SELECT * FROM generate_series(1,1000000) AS a(id);
CREATE INDEX ON a(id);
We can then check the query plan when using an integer inside the bigint
range:
EXPLAIN ANALYZE SELECT id FROM a WHERE id = 9223372036854775807;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Index Only Scan using a_id_idx on a (cost=0.42..4.44 rows=1 width=4) (actual time=0.088..0.089 rows=0 loops=1)
  Index Cond: (id = '9223372036854775807'::bigint)
  Heap Fetches: 0
Planning Time: 0.327 ms
Execution Time: 0.117 ms
This is as expected: the query uses the index and executes in well under a millisecond.
Next we can check the query plan when using an integer outside the bigint
range:
EXPLAIN ANALYZE SELECT id FROM a WHERE id = 9223372036854775808;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..12175.00 rows=5000 width=4) (actual time=169.630..177.069 rows=0 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  -> Parallel Seq Scan on a (cost=0.00..10675.00 rows=2083 width=4) (actual time=158.821..158.821 rows=0 loops=3)
       Filter: ((id)::numeric = '9223372036854775808'::numeric)
       Rows Removed by Filter: 333333
Planning Time: 0.095 ms
Execution Time: 177.091 ms
This is the unfortunate behavior: a sequential scan, with performance over 1000 times worse.
For large tables, forcing a sequential scan is a severe performance penalty. If you have an application that will accept a user-supplied integer outside the bigint range, and will use the value directly in an SQL query on a large table, this is a potential denial of service vector.
There are a few ways to avoid this issue. One way is using bound variables. Another is raising an exception instead of running the query if the integer is outside the bigint
range.
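Here is a minimal Ruby sketch of both approaches using Sequel, assuming a Sequel::Database object in DB and an untrusted integer in user_id (both placeholder names):

# Raising an exception instead of running the query when the value is
# outside the bigint range:
unless (-2**63...2**63).cover?(user_id)
  raise ArgumentError, "integer outside bigint range: #{user_id}"
end
DB[:a].where(id: user_id).first

# Using Sequel's bound variable support, so the value is sent as a query
# parameter instead of being literalized into the SQL string:
DB[:a].where(id: :$id).call(:first, id: user_id)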
A third way is explicitly casting the value from numeric
to integer
or bigint
depending on the integer value or expected type. You cannot always cast to bigint
, as that will break code:
SELECT '[1]'::jsonb - 1;
?column?
----------
[1]
(1 row)
SELECT '[1]'::jsonb - 1::integer;
?column?
----------
[1]
(1 row)
SELECT '[1]'::jsonb - 1::bigint;
ERROR: operator does not exist: jsonb - bigint
LINE 1: SELECT '[1]'::jsonb - 1::bigint;
^
HINT: No operator matches the given name and argument types. You might need to add explicit type casts.
Explicit casting based on the integer value is a fairly pointless mitigation, though, because you have to check the size of the value, and if the value is outside the bigint range, you might as well raise an exception first, instead of sending a query that you know the database will return an error for. Explicit casting based on the expected type should be fine.
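As a sketch of casting to the expected type with Sequel (assuming an id column of type bigint and an untrusted integer in the placeholder user_id):

# Generates CAST(<value> AS bigint): an in-range value can still use the
# index, and an out-of-range value causes PostgreSQL to raise an error
# instead of casting the column to numeric and scanning the table.
DB[:a].where(id: Sequel.cast(user_id, :bigint)).first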
A fourth way is quoting the value, as if it were a string (e.g. SELECT * FROM t WHERE id = '9223372036854775808'). PostgreSQL treats a single quoted value as the unknown type, which gets implicitly cast to the type of whatever it is compared against. If the value provided is outside the range of that type, PostgreSQL will return an error. However, quoting integers can break code that depends on the implicit type:
SELECT 1 + 1;
?column?
----------
2
(1 row)
SELECT '1' + '1';
ERROR: operator is not unique: unknown + unknown
LINE 1: SELECT '1' + '1';
^
HINT: Could not choose a best candidate operator. You might need to add explicit type casts.
Worse, it can silently change the behavior of SQL queries:
SELECT json_build_array(1);
json_build_array
------------------
[1]
(1 row)
SELECT json_build_array('1');
json_build_array
------------------
["1"]
(1 row)
For these reasons, I don't think it makes sense to explicitly cast based on the value or to quote the values. Unless you are using bound variables, you should either explicitly check the values of integers before using them in SQL queries, or explicitly cast to the expected type.
A fifth way would be creating an index on the integer column casted to numeric
:
CREATE INDEX ON a((id::numeric));
That's more of a workaround than a fix, and should only be used if you have access to the database but no ability to fix the underlying code or SQL queries.
I reached out to the PostgreSQL developers regarding this issue and they informed me this is not a bug. They said this should be handled on the client side, best done using bound variables.
If you want to know whether you are affected by this issue, ask yourself the following four questions:

1. Are you using PostgreSQL?
2. Does your programming language support integer values outside the bigint range (most dynamically typed programming languages will)?
3. Does your application accept user-provided integers without checking whether they are inside the bigint range?
4. Do you use those integers directly in SQL queries on large tables?

If you answered yes to all four questions, you are probably vulnerable. If your programming language uses fixed sized integer types, you are probably not vulnerable, unless you are using an unsigned 64-bit type for an integer value that you'll be literalizing into an SQL query. If your application or database library is validating the integer values before using them, to make sure they are within the range of a signed 64-bit integer type, then you are probably not vulnerable. If you are quoting the integers, explicitly casting, or using bound variables, you are probably not vulnerable.
Because PostgreSQL treats literal numbers with decimal points as numeric
values, you can also force a sequential scan by using a number with a decimal point:
EXPLAIN ANALYZE SELECT id FROM a WHERE id = 1.0;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..12175.00 rows=5000 width=4) (actual time=0.239..169.829 rows=1 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  -> Parallel Seq Scan on a (cost=0.00..10675.00 rows=2083 width=4) (actual time=98.581..152.694 rows=0 loops=3)
       Filter: ((id)::numeric = 1.0)
       Rows Removed by Filter: 333333
Planning Time: 0.105 ms
Execution Time: 169.858 ms
When performing an integer = double precision
comparison, PostgreSQL will cast the integer
to double precision
. So this issue also affects queries that use double precision
or real
values.
When performing a double precision = integer
or double precision = numeric
comparison, PostgreSQL will cast the integer
or numeric
to double precision
instead of the other way around. So you cannot force a sequential scan for real
or double precision
columns.
However, you can force a sequential scan for numeric
columns, if the value is double precision
or real
:
CREATE TABLE ns (id numeric);
INSERT INTO ns SELECT id FROM a;
CREATE INDEX ON ns(id);
EXPLAIN ANALYZE SELECT id FROM ns WHERE id = 1.0::double precision;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..12175.00 rows=5000 width=6) (actual time=225.169..233.160 rows=0 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  -> Parallel Seq Scan on ns (cost=0.00..10675.00 rows=2083 width=6) (actual time=215.148..215.149 rows=0 loops=3)
       Filter: ((id)::double precision = '1'::double precision)
       Rows Removed by Filter: 333333
Planning Time: 0.125 ms
Execution Time: 233.183 ms
This isn't likely to be an issue in real code, since it can only be hit when you are manually casting the input to double precision
. For safety, if you have to cast a floating point number, cast to numeric
unless you are sure the underlying type uses double precision
. Casting to numeric
is safer because it will not cause a sequential scan for either numeric
or double precision
columns.
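As a small illustration of that advice in Sequel (user_float is a placeholder for the untrusted floating point value):

# CAST(<value> AS numeric) avoids casting a numeric column to double
# precision, so an index on the numeric column remains usable.
DB[:ns].where(id: Sequel.cast(user_float, :numeric)).first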
Queries similar to those in this section can also be a denial of service vector. If you have an integer column, you should not compare it to a user-provided floating point value. You can run into this issue even when using bound variables, if you are casting the values based on a user-provided type. You should always typecast user-provided input to the appropriate type; you shouldn't trust the type of user input.
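For example, here is a minimal Ruby sketch of converting untrusted input before using it, assuming the raw value arrives in params["id"] (a placeholder for however your framework exposes user input):

# Kernel#Integer raises an exception for values like "1.5" or "abc"
# instead of silently passing them through to the query.
# (A range check, as shown earlier, still applies for very large values.)
id = Integer(params["id"], 10)
DB[:a].where(id: id).first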
The difference between the queries in this section and the integer = numeric
query given earlier is that the queries in this section will always result in a sequential scan for the same types in the programming language, regardless of the size of the value. With the integer = numeric
query given earlier, a sequential scan can occur for different values of the same type in your programming language, because PostgreSQL implicitly treats the integer values outside the bigint
range as numeric
. Because the switch from an index scan to a sequential scan occurs for a different value of the same type (from a programming language perspective), I consider that a more serious issue.
To determine whether PostgreSQL's behavior is reasonable, we can consider how other databases handle the same issue. I tested all 11 SQL databases that Sequel supports. To perform this testing, I used the following Sequel code, which inserts 2**20 (~1,000,000) records, then times queries for values at the boundaries of signed and unsigned 32-bit and 64-bit types:
DB.create_table(:large_table_test){Integer :id}
at_exit{DB.drop_table(:large_table_test)}
ds = DB[:large_table_test]

# Insert 2**20 records by repeatedly doubling the table via INSERT ... SELECT
ds.insert(1)
i = 1
20.times do
  ds.insert([:id], ds.select{id+i})
  i *= 2
end
DB.add_index(:large_table_test, :id)
puts "#{ds.count} records:"

# Time lookups at the boundaries of signed/unsigned 32-bit and 64-bit types,
# literalizing the integer directly into the query
require 'benchmark'
[0, 2**31-1, 2**31, 2**32-1, 2**32, 2**63-1, 2**63, 2**64-1, 2**64].each do |i|
  puts "WHERE (id = #{i}): #{Benchmark.measure{ds.first(id: Sequel.lit((i).to_s))}}"
end
Here are the results:
Only PostgreSQL showed a significant difference, with much worse performance for 2**63, 2**64-1, and 2**64 than for the lower values.
The behavior of treating integers outside bigint
range as numeric
has always been documented by PostgreSQL, so that definitely cannot be considered a bug. I think that behavior is reasonable, because the bigint
type cannot support such integers. The only alternative would be raising an error, and that change is impossible due to backwards compatibility.
The actual problematic behavior is when PostgreSQL does an integer = numeric
or integer = double precision
comparison, it casts the integer
to numeric
or double precision
as the first step. This is the behavior that results in the sequential scan, since it prevents the use of an index on the column. I think it would be better to first check if the numeric
or double precision
value can be represented as a value of the integer type it is compared against. If the numeric
or double precision
value is outside the range of that integer type or has a fractional value, then you could treat the equality comparison as false (modulo NULL handling). However, I have no knowledge of PostgreSQL internals, and therefore no idea whether such an approach is feasible.
Applications using Sequel could be vulnerable to this issue if they were not checking the sizes of integer values, since Sequel used large integer values directly in queries. The first change I made was to have Sequel's PostgreSQL adapter raise an exception when attempting to literalize an integer outside the bigint range. For users that really want the previous behavior, I added an extension that allows them to select the old behavior, or to quote the large integer values.
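As a rough illustration of that adapter change (exact exception class omitted here):

# In Sequel versions with this change, generating the SQL raises an
# exception instead of literalizing the out-of-range integer:
DB[:a].where(id: 2**64).first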
This issue taught me a valuable lesson. For many years, I've thought using values directly in SQL queries is fine, as long as you correctly escape them. It turns out that user-provided integers are not safe on PostgreSQL. If I had been using bound variables, I wouldn't have needed to worry about this issue. So to prevent this type of issue in the future, I resurrected Sequel's pg_auto_parameterize extension, which uses bound variables automatically. This extension was originally added in Sequel 3.34.0, and then removed in Sequel 4.0.0 because it had many corner cases. I was able to fix almost all of the corner cases, and for the last two weeks I have been running all of my applications using the pg_auto_parameterize extension. I recommend that all Sequel users using the PostgreSQL adapter try out the pg_auto_parameterize extension and see if it works for them. As mentioned above, automatically using bound variables does not prevent the issue if you are supporting arbitrary types submitted by users (e.g. users submitting float values for integer columns), so you should make sure you are validating that submitted types are expected (and/or converting submitted types to expected types).
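Loading the extension is a one-line change; a minimal sketch, assuming DB is a Sequel PostgreSQL Database object and user_id is an untrusted integer:

# After loading this Database extension, most values in queries,
# including integers, are sent as bound variables instead of being
# literalized into the SQL string.
DB.extension :pg_auto_parameterize
DB[:a].where(id: user_id).first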
This post resulted in a small but informative discussion on Lobsters. A few things I learned from the replies:
- Java applications using JDBC can still be affected if they bind values using the setObject method instead of the setLong method for integers.
- The earlier point about the integer = numeric comparison is true for programming languages like Ruby, which have separate integer and decimal/floating point types. However, it's probably not true for programming languages that only have a single numeric type that works for both integers and decimal/floating point values, such as JavaScript and Lua before 5.3. So if you are using a programming language that only has a single numeric type for integers and decimal/floating point values, you need to be very careful in all of your input handling, ensuring that only integer values are used for comparison to PostgreSQL integer columns, or using explicit casts to the expected integer type for all queries involving comparisons to integer columns.

This post was featured in Episode 42 of the 5mins of Postgres video series.
I found that this issue affected other libraries besides Sequel. Here is a list of other libraries that were or are affected:
I think it is highly likely that other database libraries are affected, but these are the libraries I found vulnerable in the brief time I spent testing.