Forcing Sequential Scans on PostgreSQL Using Large Integers (2022-11-01)

Overview

Recently, when adding support to Sequel for automatically validating values of integer columns based on the underlying column type, I came across some unfortunate behavior in PostgreSQL. The unfortunate behavior comes from a combination of two features:

* PostgreSQL treats integer literals outside the range of the bigint type as values of the numeric type.
* When comparing an integer or bigint column to a numeric value, PostgreSQL casts the column to numeric, rather than casting the numeric value to the column's type.

So if you have an integer or bigint column in your table named id, with a regular index, and you issue an SQL query such as SELECT * FROM table_name WHERE id = 9223372036854775808, PostgreSQL will not use any index on the column, because the index is on the value of the column, not on the value of the column after it has been cast to numeric.

Example of Issue

Here's an example of the behavior. We first create a table with 1,000,000 rows, then create an index on it:

CREATE TABLE a AS SELECT * FROM generate_series(1,1000000) AS a(id);
CREATE INDEX ON a(id);

We can then check the query plan when using an integer inside the bigint range:

EXPLAIN ANALYZE SELECT id FROM a WHERE id = 9223372036854775807;
                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
 Index Only Scan using a_id_idx on a  (cost=0.42..4.44 rows=1 width=4) (actual time=0.088..0.089 rows=0 loops=1)
   Index Cond: (id = '9223372036854775807'::bigint)
   Heap Fetches: 0
 Planning Time: 0.327 ms
 Execution Time: 0.117 ms

This is as expected: the query uses the index and executes in well under a millisecond.

Next we can check the query plan when using an integer outside the bigint range:

EXPLAIN ANALYZE SELECT id FROM a WHERE id = 9223372036854775808;
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..12175.00 rows=5000 width=4) (actual time=169.630..177.069 rows=0 loops=1)
   Workers Planned: 2
   Workers Launched: 2
  ->  Parallel Seq Scan on a  (cost=0.00..10675.00 rows=2083 width=4) (actual time=158.821..158.821 rows=0 loops=3)
        Filter: ((id)::numeric = '9223372036854775808'::numeric)
        Rows Removed by Filter: 333333
 Planning Time: 0.095 ms
 Execution Time: 177.091 ms

This is the unfortunate behavior: a sequential scan, with performance over 1000 times worse.

For large tables, a forced sequential scan is terrible for performance. If you have an application that will accept a user-supplied integer outside the bigint range, and will use the value directly in an SQL query on a large table, this is a potential denial of service vector.

Mitigations

There are a few ways to avoid this issue. One way is to use bound variables. Another is to raise an exception instead of running the query if the integer is outside the bigint range.
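
Here is a rough sketch of both approaches using Sequel, assuming a PostgreSQL connection and the table a created above; the connection setup, dataset, and value names are just for illustration:

require 'sequel'
DB = Sequel.connect(ENV['DATABASE_URL']) # assumes a PostgreSQL connection URL
ds = DB[:a]
value = 9223372036854775808 # user-supplied integer

# Approach 1: bound variables. The :$id placeholder is sent as a bound
# variable via Dataset#call, so the integer is never literalized into
# the SQL string, and out-of-range values result in a database error
# instead of a sequential scan.
ds.where(id: :$id).call(:select, id: value)

# Approach 2: raise an exception before running the query if the value
# is outside PostgreSQL's bigint range.
unless value.between?(-2**63, 2**63 - 1)
  raise ArgumentError, "integer value outside bigint range"
end
ds.where(id: value).all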

A third way is explicitly casting the value to integer or bigint, based either on the size of the value or on the expected column type. You cannot always cast to bigint, as that will break code:

SELECT '[1]'::jsonb - 1;
 ?column?
----------
 [1]
(1 row)

SELECT '[1]'::jsonb - 1::integer;
 ?column?
----------
 [1]
(1 row)

SELECT '[1]'::jsonb - 1::bigint;
ERROR:  operator does not exist: jsonb - bigint
LINE 1: SELECT '[1]'::jsonb - 1::bigint;
                            ^
HINT:  No operator matches the given name and argument types. You might need to add explicit type casts.

Explicit casting based on the integer value is a fairly pointless mitigation, though, because you have to check the size of the value anyway, and if the value is outside the bigint range, you might as well raise an exception first, instead of sending a query that you know the database will return an error for. Explicit casting based on the expected type should be fine.
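
With Sequel, for example, casting to the expected type might look like this (a minimal sketch, reusing the ds dataset and value from the earlier sketch):

# Cast to the expected column type. In-range values can still use the
# index, and out-of-range values result in an error instead of a
# sequential scan.
ds.where(id: Sequel.cast(value, :bigint)).all
# Roughly: SELECT * FROM "a" WHERE ("id" = CAST(<value> AS bigint))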

A fourth way is quoting the value, as if it were a string (e.g. SELECT * FROM t WHERE id = '9223372036854775808'). PostgreSQL treats single-quoted values as the unknown type, which is implicitly cast to the type of whatever it is compared against. If the value provided is outside the range of that type, PostgreSQL will return an error. However, quoting integers can break code that depends on the implicit type:

SELECT 1 + 1;
 ?column?
----------
        2
(1 row)

SELECT '1' + '1';
ERROR:  operator is not unique: unknown + unknown
LINE 1: SELECT '1' + '1';
                   ^
HINT:  Could not choose a best candidate operator. You might need to add explicit type casts.

Worse, it can silently change the behavior of SQL queries:

SELECT json_build_array(1);
 json_build_array
------------------
 [1]
(1 row)

SELECT json_build_array('1');
 json_build_array
------------------
 ["1"]
(1 row)

For these reasons, I don't think it makes sense to explicitly cast based on the value or to quote the values. Unless you are using bound variables, you should either explicitly check the values of integers before using them in SQL queries, or explicitly cast to the expected type.

A fifth way would be creating an index on the integer column casted to numeric:

CREATE INDEX ON a((id::numeric));

That's more of a workaround than a fix, and should only be used if you have access to the database but no ability to fix the underlying code or SQL queries.

Not a PostgreSQL Bug

I reached out to the PostgreSQL developers regarding this issue and they informed me this is not a bug. They said this should be handled on the client side, best done using bound variables.

Are You Affected?

If you want to know whether you are affected by this issue, ask yourself the following four questions:

* Are you using PostgreSQL with large tables that have indexed integer or bigint columns?
* Does your programming language support arbitrary-sized integers (or are you using an unsigned 64-bit integer type)?
* Does your application accept user-provided integers without validating that they are within the range of a signed 64-bit integer type?
* Does your application use those integers directly (literalized) in SQL queries, without quoting them, explicitly casting them, or using bound variables?

If you answered yes to all four questions, you are probably vulnerable. If your programming language uses fixed-size integer types, you are probably not vulnerable, unless you are using an unsigned 64-bit type for an integer value that you'll be literalizing into an SQL query. If your application or database library validates the integer values before using them, to make sure they are within the range of a signed 64-bit integer type, then you are probably not vulnerable. If you are quoting the integers, explicitly casting them, or using bound variables, you are probably not vulnerable.

Behavior of Other Numeric Type Comparisons on PostgreSQL

Because PostgreSQL treats literal numbers with decimal points as numeric values, you can also force a sequential scan by using a number with a decimal point:

EXPLAIN ANALYZE SELECT id FROM a WHERE id = 1.0;
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..12175.00 rows=5000 width=4) (actual time=0.239..169.829 rows=1 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on a  (cost=0.00..10675.00 rows=2083 width=4) (actual time=98.581..152.694 rows=0 loops=3)
         Filter: ((id)::numeric = 1.0)
         Rows Removed by Filter: 333333
 Planning Time: 0.105 ms
 Execution Time: 169.858 ms

When performing an integer = double precision comparison, PostgreSQL will cast the integer to double precision. So this issue also affects queries that use double precision or real values.

When performing a double precision = integer or double precision = numeric comparison, PostgreSQL will cast the integer or numeric to double precision instead of the other way around. So you cannot force a sequential scan for real or double precision columns.

However, you can force a sequential scan for numeric columns, if the value is double precision or real:

CREATE TABLE ns (id numeric);
INSERT INTO ns SELECT id FROM a;
CREATE INDEX ON ns(id);
EXPLAIN ANALYZE SELECT id FROM ns WHERE id = 1.0::double precision;
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..12175.00 rows=5000 width=6) (actual time=225.169..233.160 rows=0 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on ns  (cost=0.00..10675.00 rows=2083 width=6) (actual time=215.148..215.149 rows=0 loops=3)
         Filter: ((id)::double precision = '1'::double precision)
         Rows Removed by Filter: 333333
 Planning Time: 0.125 ms
 Execution Time: 233.183 ms

This isn't likely to be an issue in real code, since it can only be hit when you are manually casting the input to double precision. For safety, if you have to cast a floating point number, cast to numeric unless you are sure the underlying type uses double precision. Casting to numeric is safer because it will not cause a sequential scan for either numeric or double precision columns.

Queries similar to those in this section can also be a denial of service vector. If you have an integer column, you should not compare it to a user-provided floating point value. You can run into this issue even when using bound variables, if you are casting the values based on a user-provided type. You should always typecast user-provided input to the appropriate type; you shouldn't trust the type of user input.
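
For example, in Ruby you might convert user-supplied input to the expected type before it reaches the query. This is a sketch, reusing the ds dataset from the earlier sketch; params and its keys are hypothetical stand-ins for however your application receives user input:

# Convert the user-supplied string to an Integer, raising ArgumentError
# for non-integer input such as "1.5" or "abc", so an integer column is
# never compared to a floating point value.
id = Integer(params['id'], 10)
ds.where(id: id).first

# If a decimal comparison is genuinely needed (e.g. against the numeric
# column in the ns table above), cast to numeric rather than double
# precision:
DB[:ns].where(id: Sequel.cast(params['amount'].to_s, :numeric)).first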

The difference between the queries in this section and the integer = numeric query given earlier is that the queries in this section always result in a sequential scan for a given programming-language type, regardless of the size of the value. With the integer = numeric query given earlier, a sequential scan can occur for some values of a type but not others, because PostgreSQL implicitly treats integer values outside the bigint range as numeric. Because the switch from an index scan to a sequential scan occurs for a different value of the same type (from a programming language perspective), I consider that the more serious issue.

Behavior of Other Databases

To determine whether PostgreSQL's behavior is reasonable, we can consider how other databases handle the same issue. I tested all 11 SQL databases that Sequel supports. To perform this testing, I used the following Sequel code, which inserts 2**20 (~1,000,000) records, then times queries for values at the boundaries of signed and unsigned 32-bit and 64-bit types:

DB.create_table(:large_table_test){Integer :id}
at_exit{DB.drop_table(:large_table_test)}
ds = DB[:large_table_test]

# Insert one row, then double the number of rows 20 times by inserting
# id+i for every existing row, giving 2**20 rows with unique ids.
ds.insert(1)
i = 1
20.times do
  ds.insert([:id], ds.select{id+i})
  i *= 2
end
DB.add_index(:large_table_test, :id)
puts "#{ds.count} records:"

# Time a lookup for each value at the boundaries of the signed and
# unsigned 32-bit and 64-bit integer ranges, literalizing the value
# directly into the SQL.
require 'benchmark'
[0, 2**31-1, 2**31, 2**32-1, 2**32, 2**63-1, 2**63, 2**64-1, 2**64].each do |i|
  puts "WHERE (id = #{i}): #{Benchmark.measure{ds.first(id: Sequel.lit((i).to_s))}}"
end

Here are the results:

Only PostgreSQL had a significant difference in performance, showing much worse performance for 2**63, 2**64-1, and 2**64 than for the lower values.

Potential Changes to PostgreSQL

The behavior of treating integers outside bigint range as numeric has always been documented by PostgreSQL, so that definitely cannot be considered a bug. I think that behavior is reasonable, because the bigint type cannot support such integers. The only alternative would be raising an error, and that change is impossible due to backwards compatibility.

The actual problematic behavior is that when PostgreSQL does an integer = numeric or integer = double precision comparison, it casts the integer to numeric or double precision as the first step. This is the behavior that results in the sequential scan, since it prevents the use of an index on the column. I think it would be better to first check whether the numeric or double precision value can be represented as a value of the integer type it is compared against. If the numeric or double precision value is outside the range of that integer type or has a fractional value, then you could treat the equality comparison as false (modulo NULL handling). However, I have no knowledge of PostgreSQL internals, and therefore no idea whether such an approach is feasible.
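
To make that concrete, here is a rough client-side approximation of the proposed semantics, written as a hypothetical Sequel helper (not part of Sequel or PostgreSQL), reusing the ds dataset from the earlier sketch:

BIGINT_RANGE = (-2**63)..(2**63 - 1)

# Build a condition for comparing an integer column to an arbitrary
# numeric value, treating unrepresentable values as a false comparison.
def integer_eq_condition(column, value)
  if value == value.truncate && BIGINT_RANGE.cover?(value.truncate)
    # Representable as a bigint: compare using the column's type, so an
    # index on the column can be used.
    {column => value.truncate}
  else
    # Fractional or outside the bigint range: can never be equal.
    Sequel.lit('false')
  end
end

ds.where(integer_eq_condition(:id, 9223372036854775808)).all # WHERE false
ds.where(integer_eq_condition(:id, 123)).all                 # can use the index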

Changes to Sequel

Applications using Sequel could be vulnerable to this issue if they were not checking the sizes of integer values, since Sequel literalized large integer values directly into queries. The first change I made was to have Sequel's PostgreSQL adapter raise an exception when attempting to literalize an integer outside the bigint range. For users who really want the previous behavior, I added an extension that allows them to select it, or to quote large integer values instead.

This issue taught me a valuable lesson. For many years, I thought using values directly in SQL queries was fine, as long as you correctly escape them. It turns out that user-provided integers are not safe on PostgreSQL. If I had been using bound variables, I wouldn't have needed to worry about this issue. So to prevent this type of issue in the future, I resurrected Sequel's pg_auto_parameterize extension, which uses bound variables automatically. This extension was originally added in Sequel 3.34.0, and then removed in Sequel 4.0.0 because it had many corner cases. I was able to fix almost all of the corner cases, and for the last two weeks I have been running all of my applications with the pg_auto_parameterize extension. I recommend that all Sequel users using the PostgreSQL adapter try out the pg_auto_parameterize extension and see if it works for them, as shown in the sketch below.

As mentioned above, automatically using bound variables does not prevent the issue if you are supporting arbitrary types submitted by users (e.g. users submitting float values for integer columns), so you should make sure you are validating that submitted types are expected (and/or converting submitted types to expected types).
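
Enabling the extension is a one-line change per Sequel::Database object (a sketch, assuming DB is connected using the postgres adapter):

DB.extension :pg_auto_parameterize

# With the extension loaded, integers in queries like this are sent as
# bound variables instead of being literalized into the SQL string:
DB[:table_name].where(id: 9223372036854775807).first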

Updates (last updated: 2022-11-04)

This post resulted in a small but informative discussion on Lobsters. A few things I learned from the replies:

This post was featured in Episode 42 of the 5mins of Postgres video series.

Other Software Affected (last updated: 2023-01-17)

I found that this issue affected other libraries besides Sequel. Here is a list of other libraries that were or are affected:

I think it is highly likely that other database libraries are affected, but these are the ones I found to be vulnerable in the brief time I spent testing.