Welcome to the Parallel Query Dimension of Oracle
Paper #200
Ian Abramson
Ian Abramson Systems Inc
Toronto, ON, Canada
Parallelisms, symmetry, equivalence—these words and concepts all have something in common. They demonstrate how the total exceeds the sum of its parts. Goalies may be able to stop every shot aimed at them, but without the rest of the team, a goalie is powerless to win without a great deal of luck. In the world of computers, this concept is applicable in terms of parallelism. Parallel processing allows great computer with a whole team of CPUs, to work together to create an even greater computational power. Parallel processing has existed for many years, but with Oracle8 the power of multiple processors and servers takes advantage of a computer topology that can further empower the database to increase application and server productivity.
The question is: What do I need to be able to take advantage of these parallelism features incorporated within Oracle8? First you need a computer, but will any computer do? Possibly, as long as the computer can support multiple CPUs and you have bought, configured and installed these CPUs, you have the opportunity to enter into the parallel query world. In today’s world this means that you would require a machine and operating system that supports two or more processors, and an operating system that can address and share processing amongst these processors. These machines share memory and disks, but spread their processing power amongst the numerous CPUs. The reason that parallel process provides us with so much of a performance boost is that we often run into processing bottlenecks, not I/O bottlenecks. By implementing solutions that use the Parallel Query feature, you will start to experience new problems. You may then find that I/O is a bottleneck, since the CPUs will now be hungry for more information at a rate that the system had not experienced during serial operation execution. In this presentation we will discuss how best to use the parallel query feature, how best to implement it, and what to do to ensure that you are using this feature at an optimal level.
What is the
Parallel Query Option.
The Parallel Query Option is a mechanism that allows a large
query to be split up (transparently to the end-user) into a number of
smaller
queries that can all be run simultaneously. The
Parallel Query Option (PQO) should not be confused with
the Oracle Parallel Server (OPS) which allows multiple instances of
Oracle
(usually running on a number of separate machines) to address a single
set of
database files and uses a high-speed ‘distributed lock manager’ to
synchronize
the ownership of blocks and locks across the separate SGA’s
Init.ora
Parameters
To use PQO, you need to have a number of ‘floating’
processes available in the system that can be called by your normal
shadow
process (dedicated server). These processes, called the ‘parallel query slaves’
will have names like ora_p000_{SID}, and you can have up to 256 of them.
The parameter parallel_min_servers specifies the number of such processes that you want permanently running and waiting for calls. Since such processes take up machine resources you may want to keep this number low for normal running: if you set parallel_max_servers Oracle will then spawn further processes on demand up to this limit. One odd feature of dynamically spawned processes, however, is that they are owned at the O/S level by the user running the session’s shadow process, and not by any special ‘Oracle id’.
Once you have the capability of running parallel queries, there are two other parameters that you should consider. These relate to the way the system performance will degrade as the load increases.
Consider the problem of a query that takes 3 minutes to complete when running with a degree of parallelism of 15. What happens if the database is so busy that no parallel slaves are free when someone starts the query. Is it acceptable for the query to run serially and take 45 minutes, or should it simply die with a warning message?
Set parallel_min_percent to a number between 0 and 100, and if Oracle is unable to get that percentage of the demanded degree of parallelism it returns an error (ORA-12827 insufficient parallel query slaves available). For example, you want a tablescan at degree 6. You have parallel_min_percent set to 50. If the query can run at least at degree 3 with the current number of slaves available it will run, but it there are insufficient slaves available for running at degree 3 the error is raised.
Associated with this parameter is optimizer_percent_parallel which affects the assumptions that the optimizer makes about the degree of parallelism at which a query is expected to run.
Imagine a query that works best at a high degree
of
parallelism (20 say) by using hash-joins, but gives an acceptable
performance
at a lower level of parallelism (5 to 15 say) by switching to nested
loops. If the optimizer decides that the
default degree of the query is 20, then it will always generate the
hashing
path, even when the system is busy and there are not enough free slaves
around
to run the query at degree 20. Setting
this parameter to a number between 0 and 100, however, tells the
optimizer to
reduce the assumed degree of parallelism - setting it to 50 in our
example
would make the optimizer choose the path that was appropriate to degree
10,
hence taking the nested loop path.
Invoking PQO
After setting your parameters and restarting your database,
what makes it happen?
The first possibility is to give a table an explicit (or
implicit) degree of parallelism:
alter table PX parallel (degree 4);
alter table PX parallel (degree default);
Whenever a query involves this table, the
optimizer will try
to find a path that uses parallel techniques on the table.
Two points that are specific to 7.3.2:
First, the Cost Based Optimizer will always take over in queries where there are parallel tables available even if the current optimizer goal is Rule. Second, the default keyword translates into ‘the smaller of the number of CPUs and the number of devices across which the table appears to be spread’.
The second possibility for introducing parallelism
into a
query is to put an explicit hint into it:
select
/*+ parallel (px, 4)*/
count(*) from PX;
select /*+ parallel (px, default) */
count(*) from PX;
The third option is to define views that contain
hints, so
that application code does not have to worry about degrees of
parallelism,
create
or replace view big_px as
select /*+ parallel (px1, 4) parallel
(px2, 6) */
from px1, px2
where px1.id = px2.x_id;
When to use it:
Typical queries may need to retrieve and crunch a large
number of rows to return a small result set.
The Parallel Query Option can be very effective in reducing the
time for
such queries - on the plus side you have the options of increasing
parallel
throughput on disks and breaking up the sort requirements to avoid
sorts to
disk - on the minus side you need to consider the risk of unsuitable
data
distributions that result in excessive communication and sort costs.
Small Parent Scan to Large Child:
The obvious use of this option is with the recently
introduced HASH JOIN facility where two tables are scanned and the
smaller is
hashed to be used as a target for the larger and most of the benefit
comes from
the ease of separating the I/O on tablescanning. However
another option (particularly of use
in conjunction with partition views) is in scanning a small table and
using
indexed accesses into a large table where the physical data
distribution is
suitably packed.
Object Creation:
When running in parallel, particularly with the
unrecoverable option, you may find that the time to create an object
(table or
index) is significantly reduced. But
there are a couple of side-effects (see further down) to worry about
when
chasing this benefit.
Parallel Query Processing
The processing of information using multiple
processors requires the Oracle8 Parallel Query feature. Parallel Query
allows SQL statements to utilize share processing across these
processors simultaneously. When processing SQL on a machine that has
only a single processor, all SQL will execute within a single process.
Parallel Query allows statements to be divided and utilize multiple
processes, resulting in quicker completion of the statement.
This discussion is limited to the Parallel Query and it does not include the use and configuration of the Parallel Server Option. The Parallel Server Option relates to the linking of multiple database servers together, whereas the Parallel Query option can be implemented on a single server that contains multiple processors.
Parallel Query provides significant performance gains for databases that contain large amounts of data. The types of systems that gain the most from parallel operations include:
Generally, the parallel query feature is most useful when the queries require a great deal of time to complete, and when a large number of rows are processed. These database applications share a number of common characteristics. The most common characteristic is size. These applications are information intensive. The amount of storage required is very large, to maximize the retrieval of the information the parallel query feature spreads the SQL among the active CPUs. This load sharing is performed by efficiently splitting up a request over the many processors running on the system. As noted previously, by working as a team the goal of completing an operation can be reached sooner and require a smaller amount of effort.
At execution time Oracle, along with the server’s multiple processors, work together to distribute the database operation statement. Interestingly enough, the splitting of work by the parallel query engine is dynamic; if there is any changes to the server’s configuration, it will be adapted to by Oracle at the time of the statement request.
It is important to note that if your system is already heavily loaded that the gains expected by parallel query will be reduced. Ensure that your server has the available cycles for implementing a large information intensive database. Also verify that you have optimized your current CPU usage, disks and disk controllers before embarking on a parallelism initiative.
The SQL statements and database functions that benefit from parallelization with parallel query are the following:
The above statements will take advantage of parallel query through the proper use of these statements. Together with the configuration of the database, it will all provide results in a more timely manner.
Oracle will parallelize operations in the following ways:
The database parallelizes the SQL statement at execution time. At this time it divides the table or index into ranges of database blocks and then executes on these ranges in parallel. This processing on the ranges are performed by ROWID and each statement will access the information in parallel for a high and low range of ROWIDs. In the case of partitioned data, the information is not accessed by this range, but by the defined partitions. The information is queried by a set of ROWIDs from within each partition and no scan can overlap two partitions.
Partitions are excellent candidates for parallel execution as these data sets can be easily divided into smaller working groups to allow for efficient information interrogation. Basically each partition becomes a candidate for assignment to an individual parallel query server process. In some cases, the number of parallel processes may be less than the number of partitions; this is due to system limits or table attributes. This is not a problem since a parallel query server process can access multiple partitions. If you access only a single partition during the statement execution, the optimizer will understand not to perform the statement in parallel and will perform the statement serially. Inserts are parallelized during execution, as they will be divided among the parallel query server processes.
Look and Feel of Parallel
Query
The parallel query feature has a number of different
server processes that manage and execute the processing of SQL
statements. The first is the query coordinator (P000 process). This
process decides how to distribute the SQL statement among one or many
parallel query server processes. The other processes can be identified,
as they will appear as Pxxx in the process list. The values of 000 will
be the numerical identifier of the parallel query server process,
starting with 1 and continuing to the maximum number set by the
database configuration. Thus, a configuration with five parallel query
server processes will show processes P000, P001, P002, P003, and P004.
Configuration and installation of the parallel query feature requires
no intervention, since it is part of the base product. By setting the
appropriate parameters in your initialization parameter file, you can
utilize the parallel features of Oracle8.
The Initialization Parameter File
Parallel Style
The configuration of parallel query is a balance of
optimizing SQL and configuring the database so that it maximizes the
effectiveness of its parallel query server processes. In order to
optimize the database configuration, we will investigate some
initialization parameter file entries, and suggest how best to set
these values to optimize parallel SQL execution.
When the database starts, and the appropriate entries are set in the parameter file, Oracle will create a number of parallel query server processes that may be addressed by the query coordinator. These parallel processes become available for use in order to perform parallel operations. These processes, once assigned to an operation will be retained by the operation until its completion. Once completed, the operation will release the parallel query server process to be available to the next operation. In order to maximize these processes we must look at preparing the database. Let’s look at these parameters.
PARALLEL_MIN_SERVERS
This parameter specifies the minimum number of parallel
query server processes that will be initiated by the database at the
time of instance startup. To optimize the parallel query server
processes for normal database operations, the DBA should consider
setting the number of PARALLEL_MIN_SERVERS to the formula shown in the
next listing.
PARALLEL_MIN_SERVERS = the likely number of simultaneous parallel operations
By reviewing the information contained in the V$PQ_SYSSTAT data dictionary view, you can identify if the value you have set is too low or too high. The data you are interested in sits in the STATISTIC column with values shown in the next listing.
STATISTIC VALUE
------------------------------ ----------
Servers Busy 0
Servers Idle 0
You are looking for these values to indicate if we
have over-committed or under-committed our parallel query server
processes.
PARALLEL_MAX_SERVERS
This parameter specifies the maximum number of parallel
query server processes that will be spawned when required. At times,
when the volume of concurrent operations exceeds the number of current
parallel query processes currently running, the query coordinator will
start other parallel query server processes up to the number specified
in this parameter.
To optimize the parallel query server processes for normal database operations, you should consider setting the number of PARALLEL_MAX_SERVERS to the formula shown in the next listing. The formula is expanded to show the value for a two CPU machine with 30 concurrent users.
PARALLEL_MAX_SERVERS = 2 * # of CPUs * # of concurrent users = 2 * 4 * 30 = 240
When all parallel query server processes are in use and the maximum number of processes have been reached. The parallel query coordinator will react to the request for processes in excess of PARALLEL_MAX_SERVERS be either switching to serial processing or return an error if involved in a replication role.
OPTIMIZER_PERCENT_PARALLEL
This parameter determines how aggressively the
cost-based optimizer (CBO) will try to parallelize an operation.
By default the value is set to 0, so the optimizer will not consider
parallelization when determining the best execution plan. You will need
to decide how aggressive you want the optimizer to be when it comes to
determining the best balance between the execution and parallelization
of an operation. The higher that this value is set, up to a maximum of
100, the harder CBO will work to optimize parallel execution. This will
determine a plan to minimize execution time based on parallel
processing. The optimal setting for this value is shown in the next
listing.
OPTIMIZER_PERCENT_PARALLEL = 100/number of concurrent users
When determining if parallel execution is being performed, ensure that the value is set to 100, this will force the operation into a parallel plan unless a serial plan is faster. Remember that the lower the value is set, the optimizer will favor indexes, when the value is set higher the optimizer will favor full table scans.
PARALLEL_AUTOMATIC_TUNING
You can set this parameter to true. Oracle will then determine the
default values for parameters that control parallel execution. In
addition to setting this parameter, you must specify the PARALLEL
clause for the target tables in the system. Oracle then tunes all
subsequent parallel operations automatically.
SHARED_POOL_SIZE
The shared pool size must be reviewed. The parallel
processors to send messages back and forth to each other use the shared
pool. The reason for the increase in the pool is because parallel
operations require execution plans that require twice as much space as
serial plans. To optimize your shared pool we recommend that you
consider the following formula when determining your shared pool size;
our example uses a buffer size of 2k on a machine with a current shared
pool entry of 20,000,000, PARALLEL_MAX_SERVERS of 16, and 8 CPUs.
SHARED_POOL_SIZE =
current value + ((3 * message buffer size ) * (CPUs
+ 2) * PARALLEL_MAX_SERVERS)
= 20000000 + ((3 * 2048) * (8 + 2) * 16)
20000000 + 6144 * 10 * 16 = 20000000 + 983040 =
20983040
ALWAYS_ANTI_JOIN
This parameter is important when the SQL operation uses
the not in operator. By setting this parameter you can tell
your database to use parallel functionality when performing anti-joins.
By default not in is evaluated as a (sequential) correlated
subquery, when the parameter is set to NESTED_LOOPS. Instead Oracle
will perform a hash-join that will execute in parallel. To tell the
database to do parallel hashing set the parameter as follows:
ALWAYS_ANTI_JOIN = hash
The next SQL statement is an example of the familiar not in construct that the CBO with Oracle8 will react to with the ALWAYS_ANTI_JOIN entry set to HASH. Note the HASH_AJ hint.
select *
from individual
where lastname like 'KERZNER%'
and office_id is not null
and office_id not in
(select /*+ hash_aj */ office
from national_office
where office_id is not null
and role = 'SENATORS');
ROW_LOCKING
This parameter tells the database whether to acquire row
level locks during an update operation. The parameter should be set as
follows.
ROW_LOCKING = ALWAYS
By selecting ALWAYS or DEFAULT (they are the same), you tell then database to only to get row locks when the table is being updated. If your database is set to INTENT, then locks are acquired when you perform a select for update. This may appear to fine, but by setting the parameter to INTENT, all insert, updates and deletes will performed serially.
COMPATIBLE
This parameter tells the database to use the latest
features available within Oracle8. We mention here to remind you to set
the value to get the all the parallel features available. The parameter
should be set as shown in the next listing.
COMPATIBLE = 8.0.0
Parallel Execution
The execution of all SQL statements goes through the
same process, when the database has been configured with the parallel
query.
What determines the amount of parallelization is the existence of at least one full table scan or index scan in the execution plan. Oracle8 has now allowed us to utilize parallelism during operations that perform index range scans of a partitioned index and full index scans. To find out if your SQL is a candidate for parallelism, you may evaluate the SQL statement with explain plan.
CBO many times selects a full table scan for many SQL statements; full table scans are ideal candidates as they can greatly benefit from parallelization. Even when indexes are used, one can improve the performance by telling the coordinator to use parallel index processing. To tell the query coordinator to initiate parallel processing, you use hints as discussed in the next few sections.
PARALLEL Hint
The parallel hint (make sure you spell it
right!) allows you to tell the query coordinator how many concurrent
parallel query server processes should be used for the parallel
operation. The four main operations, select, insert, update,
and delete, can all take advantage of this hint. Remember that
you must identify the table to be parallelized at a minimum. If you do
not identify the degree of parallelism or the server instances, the
optimizer will set them to the default in the database. If any values
exceed those set in the database, the hint will be ignored. The next
listing is an example of how this hint is used.
select /*+ parallel (product,4) */ date_range,
product, unit_price
from product;
PARALLEL_INDEX Hint
To access information in a table via an index range
scans of a partitioned index, the parallel_index hint should be
used. This hint tells the query coordinator to spread the index scan
across multiple parallel query server processes. The next listing show
the format of this hint (for a table being read), where DOP stands for
degree of parallelism.
/*+ parallel_index (tablename, DOP, parallel server
instance split) */
select /*+ parallel_index (product,4) */ date_range,
product, unit_price
from product;
The format of the parallel_index hint is (for an index being scanned) is shown next.
/*+ parallel_index (indexname, DOP, parallel server
instance split) */
select /*+ parallel_index (date_range_index,4) */
date_range,
product, unit_price
from product;
NOPARALLEL Hint
When you do not want to use the parallel processing of a
select statement, the hint noparallel is available. This
hint disables any default parallel processing the query coordinator may
attempt to initiate. This hint is the same as saying: /*+ parallel
(tablename, 1,1) */. The format of this hint is shown in the next
listing.
/*+ noparallel (tablename) */
select /*+ noparallel */ date_range, product,
unit_price
from product;
Parallel SQL statements
The performance of information manipulation and
retrieval has always been and will continue to be an issue to everyone
who has had their computer tied up doing the endless query. You must
remember the update that started Monday and then ended when the
computer crashed three days later. Oracle7 and Oracle8 have provided us
with new strategies to improve the performance of our SQL statements.
Remember that SQL that is poorly formatted and configured will still
run poorly, but if the code is written efficiently then parallelism
will improve performance.
Manual parallel processing is a derivative of the parallel query feature. Looking back into our years of experience, well maybe it was last week at a major bank’s data warehouse project, parallelism helped achieve a goal. Start of "dream" sequence. We created six separate processes that addressed a key value range in a 10,000,000-row table, and started these six wonderfully written programs. This program then allowed the Oracle Server to distribute the processing among the multiple parallel query processes on the server. The moral of this dream—anyone can perform manual parallel processing. Was this the best solution? By spreading out the processes among the many processors, we were able to achieve the overall performance required for the task. The question then becomes what was the trade-off versus serial processing of the same task.
The greatest advantage to us was that we could get the transactions per second required to complete this task in our lifetime. Even with parallel hints, the programs would still not provide the performance required.
For every battle won there is a cost; based on the perceived and estimated cost, decisions need to be made. Let’s look at what influence decisions.
This leads to the question of what work is required to separate the information to a level where it can be easily manipulated. Oracle8 has almost removed this question from consideration. If you have been diligent and partitioned your data already, this becomes a non-issue. People out there who want to take advantage of partitions and the parallel nature of the information, ask themselves if the change worth the cost. The best answer we can give to you is that using the features that are now available in your database toolkit must always be considered. So should we implement this feature? If your machine supports parallelism, then this option must be considered.
What Is the Degree of
Parallelism?
When the database parallelizes an SQL statement, it
parallelizes this statement over a number of parallel query server
processes. The number of parallel query server processes used by an
operation is known as the degree of parallelism. Without the
degree of parallelism, the operation will be performed in serial. If
you had wanted that, you would not have purchased 12 new processors,
more memory and told everyone it was going to help. By understanding
the definition of the degree of parallelism, you can now empower
everyone in your organization to use all that new hardware and the new
and improved database schema when issuing SQL statements.
The degree of parallelism is defined at the statement level. This is done through the use of imbedded hints within the statement or by default within the database. At the time of table or index creation you can specify a default degree of parallelism for the object. By default this may either be the number of disks or CPUs.
There are two types of parallel operations that we
must define at this point as they will affect the degree of parallelism
that any operation can perform—intra-operations and inter-operations.
These two operations can be performed at the same time during statement
execution.
The Degree of Parallelism within Operations
The balance between too much parallelism and too little
is always a concern to us. Luckily we have a database that is smarter
than your average bear. The Oracle query coordinator is that bear. The
coordinator determines the degree of parallelism at runtime, based on
number of factors. First, the coordinator looks to see if they are
dealing with a smarter than normal programmer who has added hints to
their SQL statement. In the case that this has not been done, it will
check to see if your DBA was awake when the table or index was created.
The coordinator will then look for the object’s default degree of
parallelism. Once determined, this degree of parallelism will be used
by the operation. If you create a table remember to follow the
following format, with the parallel-specific section bolded.
SQL create table individual (
2 firstname varchar2(20) not null,
3 lastname varchar2(30) not null,
4 birthdate date not null,
5 gender varchar2(1) not null,
6 initial varchar2(1),
7 favorite_beatle varchar2(6))
8 parallel (degree 4);
Table created.
SQL
The coordinator will only request the number of parallel query server processes defined by the degree of parallelism. The physical number of processes that the coordinator will be able to get will depend upon how many are available at the time of operation execution.
So now we know how and where the database gets its degree of parallelism, but at some point we will need to tell what degree of parallelism to use. Therefore it is necessary to understand how best to tell Oracle what the degree of parallelism is defined for a table, index or hint. By following these rules and selecting the ones that relate to your specific operation, you can define your degree of parallelism, just like the pros.
Suppose you have 8 CPUs, but your information is stored on 12 disks, the limiting factor here is CPUs. The degree of parallelism that we should use would be 8.
Although you may request eight parallel query server processes to process your statement, you may is not receive this many. Due to limits placed on the database, there may not be a sufficient number of parallel query servers available. If you have defined 25 PARALLEL_MAX_SERVERS, then you can only have 25 parallel processes running. If you exceed this number Oracle will not spawn additional processes, resulting in some statements being executed in serial mode by the query coordinator. In the case when you have specified the PARALLEL_MIN_PERCENT and the minimum fraction of parallelism is not available, you will receive an error during execution.
By defining the degree of parallelism for an SQL operation you can profoundly affect the processing of your information. Specify it too high and you may not get processes that you require when you want them. If the value is too low, you may find that your parallel query server processes are not being maximized and result in lost performance.
When considering your degree of parallelism, you should by default define it such that it optimizes a majority of operations to maximize the utilization of your I/O and CPU. In some cases this may not be the best approach, and you may want to override the default degree of parallelism. For example if your operation requires large amounts of I/O you may want to increase the degree of parallelism for the operation. If your operation requires a large amount of memory, then you may want to decrease the degree of parallelism. To override the default degree of parallelism, you may decide to incorporate hints into your SQL.
Now that we can define our degree of parallelism,
we need to be able to use our knowledge, but where do we use these
facts. By using parallelism in your SQL statements, you can increase
the performance of many SQL statements. Let’s move on to looking at
some familiar SQL statements, and how they are parallelized.
Summary
To summarize parallelism, its features, how to leverage
its power, and using it in real-life situations, inspect the following
points as a summary of what we have covered in this presentation.
For a more complete discussion of parallelism,
please read Oracle8 Tuning by Abbey, Corey, Dechichio and Abramson,
printed by the Oracle Press 1997. Or the Followinf ML Notes:
237287.1 How To Verify Parallel Execution is running