By now many of you have heard an earful about how using bind variables enhances performance; for those who haven't, let me try to explain the core concepts in as simple manner as I can. (I also recommend a visit to Tom Kyte's asktom.oracle.com, where you can learn how singularly important it is to use bind variables to improve the performance of SQL statements as well as how to use them in several languages.)
Assume you have a table
called CUSTOMERS that has, among others, a column called
STATE_CODE, which store the customer's residence state in the
two-letter abbreviation of the U.S. states—CT, NY and so on. When
you want to find out how many customers have purchased more than
three times and are from the state of Connecticut ('CT'), you will
most likely issue:
select count(1)
from customers
where state_code = 'CT'
and times_purchased > 3;
When you issue this query, Oracle has to perform an activity called parsing, which will generate an execution plan for the SQL statement you just issued. After parsing the query is ready fro execution. Parsing is similar in concept to compilation of code in software; when you write something in C++, say, you can't run that in the operating system—first you must compile it and make it an executable. The activity of parsing makes an executable from the SQL statement.
Now suppose another user issues a statement as shown below:
select count(1) from customers where state_code = 'NY' and times_purchased > 3;
This is almost identical to the query above, with one exception: the state_code searched is NY instead of CT. Ideally, the parsed code would be the same and the value of the literal would be supplied at runtime. But the way the queries are written, Oracle interprets them as different and has to do another parse for the second query.
Consider instead if the queries were written as:
select count(1) from customers where state_code = <StateCode> and times_purchased > 3;
In this example, the <StateCode> is conceptually known as a bind variable, which is a place holder for values to be passed during execution. Bind variables are represented in the form of :VariableName, as shown below:
where state_code = :state_code
If your code does not have bind variables and instead littered with references to literal values such as where state_code = 'CT', you can force all literals to be converted to bind variables by specifying an initialization parameter:
cursor_sharing = force
This parameter will cause the statement where state_code = 'CT' to be rewritten as where state_code = ":SYS_0001" where SYS_0001 is a system generated variable name. This approach will make these statements identical.
Well, if bind variables are so great, why not use them all the time? Don't we have a magic bullet—cursor_sharing—which transforms all the bad code to sharable statements? (Those already familiar with the reasons, especially the concept of bind-peeking, can skip to the section titled "Adaptive Cursors".)
Consider the case where there is an index on the column STATE_CODE. The values in the column are shown below:
select state_code, count(1) from customers group by state_code; ST COUNT(1) -- ---------- NY 994901 CT 5099
As you can see, the data is highly skewed; about 5% of the rows have 'CT' in them while the rest have 'NY'. It's not surprising considering the population of the states. Now, let's see what type of execution plan is generated for the query shown earlier:
SQL> set autot traceonly explain SQL> select * from customers where state_code = 'NY' and times_purchased > 3 2 / Execution Plan ---------------------------------------------------------- Plan hash value: 2008213504 ------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 895K| 26M| 1532 (9)| 00:00:19 | |* 1 | TABLE ACCESS FULL| CUSTOMERS | 895K| 26M| 1532 (9)| 00:00:19 | ------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("TIMES_PURCHASED">3 AND "STATE_CODE"='NY')
The query used a full table scan—the appropriate action since 95% of the rows are returned with the query and an index scan would have been very expensive. Now issue the same query with 'CT':
SQL> c/NY/CT 1* select * from customers where state_code = 'CT' and times_purchased > 3 SQL> / Execution Plan ---------------------------------------------------------- Plan hash value: 4876992 --------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 4589 | 138K| 56 (2)| 00:00:01 | |* 1 | TABLE ACCESS BY INDEX ROWID| CUSTOMERS | 4589 | 138K| 56 (2)| 00:00:01 | |* 2 | INDEX RANGE SCAN | IN_CUST_STATE | 5099 | | 12 (0)| 00:00:01 | --------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("TIMES_PURCHASED">3) 2 - access("STATE_CODE"='CT')
It used the index. Again, that was appropriate; CT accounts for only 5% of the rows and an index scan will be beneficial.
Let's see the behavior when using a bind variable. Here is the demonstrated behavior in Oracle Database 10g.
SQL> var state_code varchar2(2) SQL> exec :state_code := 'CT' PL/SQL procedure successfully completed. SQL> select max(times_purchased) from customers where state_code = :state_code 2 / Execution Plan ---------------------------------------------------------- Plan hash value: 296924608 -------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 6 | 1511 (8)| 00:00:19 | | 1 | SORT AGGREGATE | | 1 | 6 | | | |* 2 | TABLE ACCESS FULL| CUSTOMERS | 500K| 2929K| 1511 (8)| 00:00:19 | -------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter("STATE_CODE"=:STATE_CODE)
The optimizer chose Full Table Scan on CUSTOMERS table. Shouldn't the index be used because we are searching for CT only, which accounts for a mere 5% of the total records? What made the optimizer choose full table scan over index scan?
The answer is a phenomenon called bind peeking. Earlier, when you ran that query with the bind variable value set to 'NY', the optimizer had to do a hard parse for the first time and while doing so it peeked at the bind variable to see what value had been assigned to it. The value was 'NY'. Since 'NY' accounts for about 95% of the rows, the optimizer chose full table scan (as expected). In addition, it also froze the plan for the query. Next, when we issued the same query, but for 'CT', the plan was not re-calculated and the optimizer used the same plan used earlier, even though it was not the best one for the purpose. Had you used a value such as 'CT' instead of the bind variable in the query, the optimizer would have picked the correct plan.
Thus as you can see, bind
variables, even though they were good in most cases, actually
failed in cases where the selectivity of the values radically
affected the plans, as in this example where the selectivity of
the values 'CT' and 'NY' were 5% and 95% respectively. In cases
where the distribution of data is such that the selectivity is
almost the same for all values, the execution plan would remain
the same. Therefore smart SQL coders will choose when to break the
cardinal rule of using bind variables, employing literals instead.