Welcome to the Parallel Query Dimension of Oracle

Paper #200
Ian Abramson
Ian Abramson Systems Inc
Toronto, ON, Canada

Parallelism, symmetry, equivalence: these words and concepts all have something in common. They demonstrate how the whole can exceed the sum of its parts. A goalie may be able to stop every shot aimed at the net, but without the rest of the team a goalie cannot win without a great deal of luck. In the world of computers, the same idea applies to parallelism. Parallel processing allows a computer with a whole team of CPUs to work together and deliver greater computational power than any single CPU could. Parallel processing has existed for many years, but Oracle8 takes advantage of multiple processors and servers to increase application and server productivity.

The question is: what do I need in order to take advantage of the parallelism features incorporated within Oracle8? First you need a computer, but will any computer do? Possibly. As long as the computer can support multiple CPUs, and you have bought, installed and configured these CPUs, you have the opportunity to enter the parallel query world. In today’s world this means a machine with two or more processors and an operating system that can address these processors and share processing amongst them. These machines share memory and disks, but spread their processing amongst the numerous CPUs. Parallel processing provides such a performance boost because we often run into processing bottlenecks, not I/O bottlenecks. By implementing solutions that use the Parallel Query feature, however, you will start to experience new problems. You may find that I/O becomes the bottleneck, since the CPUs will now be hungry for information at a rate that the system never experienced during serial execution. In this presentation we will discuss how best to use the parallel query feature, how best to implement it, and how to ensure that you are using it at an optimal level.

What Is the Parallel Query Option?
The Parallel Query Option is a mechanism that allows a large query to be split up (transparently to the end-user) into a number of smaller queries that can all be run simultaneously. The Parallel Query Option (PQO) should not be confused with the Oracle Parallel Server (OPS), which allows multiple instances of Oracle (usually running on a number of separate machines) to address a single set of database files and uses a high-speed ‘distributed lock manager’ to synchronize the ownership of blocks and locks across the separate SGAs.

Init.ora Parameters
To use PQO, you need to have a number of ‘floating’ processes available in the system that can be called by your normal shadow process (dedicated server). These processes, called the ‘parallel query slaves’, will have names like ora_p000_{SID}, and you can have up to 256 of them.

The parameter parallel_min_servers specifies the number of such processes that you want permanently running and waiting for calls. Since such processes take up machine resources, you may want to keep this number low for normal running; if you also set parallel_max_servers, Oracle will spawn further processes on demand up to this limit. One odd feature of dynamically spawned processes, however, is that they are owned at the O/S level by the user running the session’s shadow process, and not by any special ‘Oracle id’.

Once you have the capability of running parallel queries, there are two other parameters that you should consider.  These relate to the way the system performance will degrade as the load increases.

Consider the problem of a query that takes 3 minutes to complete when running with a degree of parallelism of 15. What happens if the database is so busy that no parallel slaves are free when someone starts the query? Is it acceptable for the query to run serially and take 45 minutes, or should it simply die with a warning message?

Set parallel_min_percent to a number between 0 and 100, and if Oracle is unable to get that percentage of the demanded degree of parallelism it returns an error (ORA-12827 insufficient parallel query slaves available). For example, suppose you want a tablescan at degree 6 and you have parallel_min_percent set to 50. If the query can run at degree 3 or higher with the slaves currently available, it will run; if there are insufficient slaves available for running at degree 3, the error is raised.
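The degree 6 example can be sketched as follows. This is only an illustration: it assumes a release in which parallel_min_percent can be changed with alter session (otherwise set it in the init.ora file and restart), and px is a hypothetical table name.

```sql
-- Assumption: parallel_min_percent is session-modifiable in your release;
-- if it is not, set it in init.ora instead.
alter session set parallel_min_percent = 50;

-- Request degree 6. With the setting above, the statement needs at least
-- 50% of 6 = degree 3; below that it fails with ORA-12827 instead of
-- quietly running at a lower degree or serially.
select /*+ parallel (px, 6) */ count(*)
from px;
```

With parallel_min_percent left at 0, the same statement would run at whatever degree is available, including serially.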

Associated with this parameter is optimizer_percent_parallel which affects the assumptions that the optimizer makes about the degree of parallelism at which a query is expected to run. 

Imagine a query that works best at a high degree of parallelism (20 say) by using hash-joins, but gives an acceptable performance at a lower level of parallelism (5 to 15 say) by switching to nested loops.  If the optimizer decides that the default degree of the query is 20, then it will always generate the hashing path, even when the system is busy and there are not enough free slaves around to run the query at degree 20.  Setting this parameter to a number between 0 and 100, however, tells the optimizer to reduce the assumed degree of parallelism - setting it to 50 in our example would make the optimizer choose the path that was appropriate to degree 10, hence taking the nested loop path. 
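Before changing any of these four parameters it helps to see what the instance is currently running with. A minimal sketch, assuming you have select access to the v$parameter view:

```sql
-- v$parameter lists the current instance settings by (lowercase) name.
select name, value
from v$parameter
where name in ('parallel_min_servers',
               'parallel_max_servers',
               'parallel_min_percent',
               'optimizer_percent_parallel');
```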

Invoking PQO
After setting your parameters and restarting your database, what makes it happen?
The first possibility is to give a table an explicit (or implicit) degree of parallelism:

alter table PX parallel (degree 4);
alter table PX parallel (degree default);

Whenever a query involves this table, the optimizer will try to find a path that uses parallel techniques on the table. 
Two points that are specific to 7.3.2:

First, the Cost Based Optimizer will always take over in queries where there are parallel tables available even if the current optimizer goal is Rule.  Second, the default keyword translates into ‘the smaller of the number of CPUs and the number of devices across which the table appears to be spread’. 

The second possibility for introducing parallelism into a query is to put an explicit hint into it:
select /*+ parallel (px, 4)*/
count(*) from PX;

select /*+ parallel (px, default) */
count(*) from PX;

The third option is to define views that contain hints, so that application code does not have to worry about degrees of parallelism:
create or replace view big_px as
select /*+ parallel (px1, 4) parallel (px2, 6) */
px1.id, px2.x_id  -- select list missing in the original; these two columns are illustrative
from px1, px2
where px1.id = px2.x_id;

An interesting, and possibly irritating, feature of the view definition above is the question of what happens when there are several parallel tables in a query. In general it is not possible to predict the degree of parallelism that such a query will use, as it depends on the usage, the driving table, and the join paths used.  In the view above I could only say that, if no other parallel tables are introduced, the final query will run with degree 1, 4 or 6.
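Since the degree cannot be predicted, one option is to run the statement and then ask Oracle what it actually did. The sketch below assumes the px1 and px2 tables from the view definition and select access to the v$pq_sesstat view; the statistic names mentioned in the comment may differ between releases.

```sql
-- Run the two-table join with the same hints as the view.
select /*+ parallel (px1, 4) parallel (px2, 6) */ count(*)
from px1, px2
where px1.id = px2.x_id;

-- Then, in the same session, look at what the last query actually used.
-- Rows such as 'Queries Parallelized' and 'Server Threads' are the ones
-- of interest; exact statistic names may vary by release.
select statistic, last_query, session_total
from v$pq_sesstat;
```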

When to use it:
Typical queries may need to retrieve and crunch a large number of rows to return a small result set. The Parallel Query Option can be very effective in reducing the time for such queries. On the plus side, you can increase parallel throughput on disks and break up the sort requirements to avoid sorts to disk. On the minus side, you need to consider the risk of unsuitable data distributions that result in excessive communication and sort costs.

Small Parent Scan to Large Child:
The obvious use of this option is with the recently introduced HASH JOIN facility, where two tables are scanned and the smaller is hashed to be used as a target for the larger; most of the benefit comes from the ease of separating the I/O on tablescanning. However, another option (particularly of use in conjunction with partition views) is to scan a small table and use indexed accesses into a large table where the physical data distribution is suitably packed.

Object Creation:
When running in parallel, particularly with the unrecoverable option, you may find that the time to create an object (table or index) is significantly reduced.  But there are a couple of side-effects (see further down) to worry about when chasing this benefit.
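As a rough sketch of parallel object creation, the statements below build an index and a copy of a table. The object names (px, px_id_idx, px_copy) are hypothetical, and the position of the unrecoverable and parallel clauses is an assumption that should be checked against the SQL reference for your release.

```sql
-- Hypothetical index on the px table; clause order may need adjusting
-- for your release.
create index px_id_idx on px (id)
parallel (degree 4)
unrecoverable;

-- Hypothetical copy of px built with a parallel create table as select.
create table px_copy
unrecoverable
parallel (degree 4)
as select * from px;
```

Because no redo is generated for an unrecoverable operation, the new object cannot be recovered from the logs; a backup after creation is the usual precaution.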


Parallel Query Processing

The processing of information using multiple processors requires the Oracle8 Parallel Query feature. Parallel Query allows SQL statements to share processing across these processors simultaneously. When processing SQL on a machine that has only a single processor, all SQL will execute within a single process. Parallel Query allows statements to be divided and to utilize multiple processes, resulting in quicker completion of the statement.

This discussion is limited to the Parallel Query and it does not include the use and configuration of the Parallel Server Option. The Parallel Server Option relates to the linking of multiple database servers together, whereas the Parallel Query option can be implemented on a single server that contains multiple processors.

Parallel Query provides significant performance gains for databases that contain large amounts of data. The types of systems that gain the most from parallel operations include:

Generally, the parallel query feature is most useful when the queries require a great deal of time to complete, and when a large number of rows are processed. These database applications share a number of common characteristics. The most common characteristic is size. These applications are information intensive, and the amount of storage required is very large. To maximize the retrieval of the information, the parallel query feature spreads the SQL among the active CPUs. This load sharing is performed by efficiently splitting up a request over the many processors running on the system. As noted previously, by working as a team the goal of completing an operation can be reached sooner and with less effort.

At execution time Oracle, along with the server’s multiple processors, works to distribute the database operation. Interestingly enough, the splitting of work by the parallel query engine is dynamic; if there are any changes to the server’s configuration, Oracle will adapt to them at the time of the statement request.

It is important to note that if your system is already heavily loaded, the gains expected from parallel query will be reduced. Ensure that your server has the available cycles for implementing a large information intensive database. Also verify that you have optimized your current CPU usage, disks and disk controllers before embarking on a parallelism initiative.

The SQL statements and database functions that benefit from parallelization with parallel query are the following:

These statements take advantage of parallel query when they are used properly. Together with the configuration of the database, this provides results in a more timely manner.

Oracle will parallelize operations in the following ways:

The database parallelizes the SQL statement at execution time. At this time it divides the table or index into ranges of database blocks and then executes on these ranges in parallel. The processing of the ranges is performed by ROWID, with each parallel piece of the statement accessing the information between a low and a high ROWID. In the case of partitioned data, the information is not accessed by this range, but by the defined partitions. The information is queried by a set of ROWIDs from within each partition and no scan can overlap two partitions.

Partitions are excellent candidates for parallel execution as these data sets can be easily divided into smaller working groups to allow for efficient information interrogation. Basically each partition becomes a candidate for assignment to an individual parallel query server process. In some cases, the number of parallel processes may be less than the number of partitions; this is due to system limits or table attributes. This is not a problem since a parallel query server process can access multiple partitions. If you access only a single partition during the statement execution, the optimizer will understand not to perform the statement in parallel and will perform the statement serially. Inserts are parallelized during execution, as they will be divided among the parallel query server processes.
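To make the partition behaviour concrete, here is a small sketch using Oracle8 range partitioning. The table, column and partition names are hypothetical, and whether the second query really runs serially should be confirmed with explain plan on your own system.

```sql
-- Hypothetical range-partitioned table with a default degree of 4.
create table sales_history (
    sale_id   number not null,
    sale_date date   not null,
    amount    number(10,2))
partition by range (sale_date) (
    partition sales_1996 values less than (to_date('01-01-1997', 'DD-MM-YYYY')),
    partition sales_1997 values less than (to_date('01-01-1998', 'DD-MM-YYYY')),
    partition sales_1998 values less than (to_date('01-01-1999', 'DD-MM-YYYY')))
parallel (degree 4);

-- Touches every partition, so each one can be handed to a query server.
select count(*), sum(amount)
from sales_history;

-- Restricted to a single partition; per the text, expect a serial plan.
select count(*), sum(amount)
from sales_history
where sale_date >= to_date('01-01-1997', 'DD-MM-YYYY')
  and sale_date <  to_date('01-01-1998', 'DD-MM-YYYY');
```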

Look and Feel of Parallel Query
The parallel query feature has a number of different server processes that manage and execute the processing of SQL statements. The first is the query coordinator, which is the server process of the session that issued the statement. This process decides how to distribute the SQL statement among one or many parallel query server processes. The other processes can be identified, as they will appear as Pxxx in the process list. The value xxx is the numerical identifier of the parallel query server process, starting with 000 and continuing up to the maximum number set by the database configuration. Thus, a configuration with five parallel query server processes will show processes P000, P001, P002, P003, and P004. Configuration and installation of the parallel query feature requires no intervention, since it is part of the base product. By setting the appropriate parameters in your initialization parameter file, you can utilize the parallel features of Oracle8.

The Initialization Parameter File Parallel Style
The configuration of parallel query is a balance of optimizing SQL and configuring the database so that it maximizes the effectiveness of its parallel query server processes. In order to optimize the database configuration, we will investigate some initialization parameter file entries, and suggest how best to set these values to optimize parallel SQL execution.

When the database starts, and the appropriate entries are set in the parameter file, Oracle will create a number of parallel query server processes that may be addressed by the query coordinator. These parallel processes become available for use in order to perform parallel operations. These processes, once assigned to an operation will be retained by the operation until its completion. Once completed, the operation will release the parallel query server process to be available to the next operation. In order to maximize these processes we must look at preparing the database. Let’s look at these parameters.

PARALLEL_MIN_SERVERS
This parameter specifies the minimum number of parallel query server processes that will be initiated by the database at the time of instance startup. To optimize the parallel query server processes for normal database operations, the DBA should consider setting the number of PARALLEL_MIN_SERVERS to the formula shown in the next listing.

PARALLEL_MIN_SERVERS = the likely number of simultaneous parallel operations

By reviewing the information contained in the V$PQ_SYSSTAT data dictionary view, you can identify if the value you have set is too low or too high. The data you are interested in sits in the STATISTIC column with values shown in the next listing.

STATISTIC                           VALUE
------------------------------ ----------
Servers Busy                            0
Servers Idle                            0

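These values indicate whether you have over-committed or under-committed your parallel query server processes: servers that are constantly busy suggest the setting is too low, while servers that sit idle suggest it is too high.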
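A minimal query to pull these rows is sketched below. It uses like instead of an equality test on the assumption that the STATISTIC values may be blank-padded in some releases.

```sql
-- Show the busy/idle counters (and any other 'Servers ...' rows,
-- such as the highwater mark) for the instance.
select statistic, value
from v$pq_sysstat
where statistic like 'Servers%';
```

Sampling this a few times during a normal working day gives a better picture than a single reading.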
 

PARALLEL_MAX_SERVERS
This parameter specifies the maximum number of parallel query server processes that will be spawned when required. At times, when the volume of concurrent operations exceeds the number of parallel query server processes currently running, the query coordinator will start additional parallel query server processes up to the number specified in this parameter.

To optimize the parallel query server processes for normal database operations, you should consider setting the number of PARALLEL_MAX_SERVERS to the formula shown in the next listing. The formula is expanded to show the value for a four-CPU machine with 30 concurrent users.

PARALLEL_MAX_SERVERS = 2 * # of CPUs * # of concurrent users = 2 * 4 * 30 = 240

When all parallel query server processes are in use and the maximum number of processes has been reached, the parallel query coordinator will react to requests for processes in excess of PARALLEL_MAX_SERVERS by either switching to serial processing or returning an error if involved in a replication role.


OPTIMIZER_PERCENT_PARALLEL
This parameter determines how aggressively the cost-based optimizer (CBO) will try to parallelize an operation. By default the value is set to 0, so the optimizer will not consider parallelization when determining the best execution plan. You will need to decide how aggressive you want the optimizer to be when it comes to determining the best balance between the execution and parallelization of an operation. The higher that this value is set, up to a maximum of 100, the harder CBO will work to optimize parallel execution. This will determine a plan to minimize execution time based on parallel processing. The optimal setting for this value is shown in the next listing.

OPTIMIZER_PERCENT_PARALLEL = 100/number of concurrent users

To check whether parallel execution is being performed, set the value to 100; this forces the operation into a parallel plan unless a serial plan is faster. Remember that the lower the value, the more the optimizer will favor indexes; the higher the value, the more it will favor full table scans.

PARALLEL_AUTOMATIC_TUNING
You can set this parameter to true. Oracle will then determine the default values for parameters that control parallel execution. In addition to setting this parameter, you must specify the PARALLEL clause for the target tables in the system. Oracle then tunes all subsequent parallel operations automatically.
 

SHARED_POOL_SIZE
The shared pool size must be reviewed. The parallel query server processes use the shared pool to send messages back and forth to each other. The pool also needs to grow because parallel operations require execution plans that take twice as much space as serial plans. To optimize your shared pool we recommend that you consider the following formula when determining your shared pool size; our example uses a buffer size of 2k on a machine with a current shared pool entry of 20,000,000, PARALLEL_MAX_SERVERS of 16, and 8 CPUs.

ALWAYS_ANTI_JOIN
This parameter is important when the SQL operation uses the not in operator. By setting this parameter you can tell your database to use parallel functionality when performing anti-joins. By default, when the parameter is set to NESTED_LOOPS, not in is evaluated as a (sequential) correlated subquery. When it is set to HASH, Oracle will instead perform a hash anti-join that can execute in parallel. To tell the database to do parallel hashing, set the parameter as follows:

ALWAYS_ANTI_JOIN = hash

The next SQL statement is an example of the familiar not in construct that the CBO with Oracle8 will react to with the ALWAYS_ANTI_JOIN entry set to HASH. Note the HASH_AJ hint.

select *
from individual
where lastname like 'KERZNER%'
and office_id is not null
and office_id not in
(select /*+ hash_aj */ office_id
from national_office
where office_id is not null
and role = 'SENATORS');

ROW_LOCKING
This parameter tells the database whether to acquire row level locks during an update operation. The parameter should be set as follows.

ROW_LOCKING = ALWAYS

By selecting ALWAYS or DEFAULT (they are the same), you tell the database to get row locks only when the table is being updated. If your database is set to INTENT, then locks are acquired when you perform a select for update. This may appear to be fine, but with the parameter set to INTENT, all inserts, updates and deletes will be performed serially.

COMPATIBLE
This parameter tells the database to use the latest features available within Oracle8. We mention it here to remind you to set the value so that you get all the parallel features available. The parameter should be set as shown in the next listing.

COMPATIBLE = 8.0.0

Parallel Execution
When the database has been configured with parallel query, the execution of every SQL statement goes through the same process.

  1. The statement is parsed.
  2. If the statement is a candidate for parallelization, the optimizer determines the execution plan of the statement.
  3. The query coordinator determines the method of parallelization for the operation.

What determines the amount of parallelization is the existence of at least one full table scan or index scan in the execution plan. Oracle8 has now allowed us to utilize parallelism during operations that perform index range scans of a partitioned index and full index scans. To find out if your SQL is a candidate for parallelism, you may evaluate the SQL statement with explain plan.
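A minimal explain plan sketch follows. It assumes a PLAN_TABLE already exists in your schema (normally created with the utlxplan.sql script), that the table is called product, and that the statement id 'pq_check' is free to use; all three are illustrative choices.

```sql
-- Record the plan for a hinted full scan of the (assumed) product table.
explain plan set statement_id = 'pq_check' for
select /*+ parallel (product, 4) */ date_range, product, unit_price
from product;

-- OTHER_TAG shows how each step takes part in parallel execution
-- (for example PARALLEL_TO_SERIAL); it is empty for serial steps.
select id, operation, options, object_name, other_tag
from plan_table
where statement_id = 'pq_check'
order by id;
```

A TABLE ACCESS step with the FULL option and a non-empty OTHER_TAG is the usual sign that the statement is a candidate for parallel execution.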

The CBO often selects a full table scan; full table scans are ideal candidates, as they can greatly benefit from parallelization. Even when indexes are used, one can improve the performance by telling the coordinator to use parallel index processing. To tell the query coordinator to initiate parallel processing, you use hints as discussed in the next few sections.

PARALLEL Hint
The parallel hint (make sure you spell it right!) allows you to tell the query coordinator how many concurrent parallel query server processes should be used for the parallel operation. The four main operations, select, insert, update, and delete, can all take advantage of this hint. Remember that you must identify the table to be parallelized at a minimum. If you do not identify the degree of parallelism or the server instances, the optimizer will set them to the default in the database. If any values exceed those set in the database, the hint will be ignored. The next listing is an example of how this hint is used.

select /*+ parallel (product,4) */ date_range, product, unit_price
from product;

PARALLEL_INDEX Hint
To access information in a table via an index range scan of a partitioned index, the parallel_index hint should be used. This hint tells the query coordinator to spread the index scan across multiple parallel query server processes. The next listing shows the format of this hint (for a table being read), where DOP stands for degree of parallelism.

/*+ parallel_index (tablename, DOP, parallel server instance split) */
select /*+ parallel_index (product,4) */ date_range, product, unit_price
from product;

To name the specific index to be scanned, add the index name after the table name. The format is shown next.

/*+ parallel_index (tablename, indexname, DOP, parallel server instance split) */
select /*+ parallel_index (product, date_range_index, 4) */ date_range,
product, unit_price
from product;

NOPARALLEL Hint
When you do not want to use the parallel processing of a select statement, the hint noparallel is available. This hint disables any default parallel processing the query coordinator may attempt to initiate. This hint is the same as saying: /*+ parallel (tablename, 1,1) */. The format of this hint is shown in the next listing.

/*+ noparallel (tablename) */

select /*+ noparallel (product) */ date_range, product, unit_price
from product;

Parallel SQL statements
The performance of information manipulation and retrieval has always been and will continue to be an issue to everyone who has had their computer tied up doing the endless query. You must remember the update that started Monday and then ended when the computer crashed three days later. Oracle7 and Oracle8 have provided us with new strategies to improve the performance of our SQL statements. Remember that SQL that is poorly formatted and configured will still run poorly, but if the code is written efficiently then parallelism will improve performance.

Manual parallel processing is a derivative of the parallel query feature. Looking back into our years of experience (well, maybe it was last week at a major bank’s data warehouse project), parallelism helped achieve a goal. Start of "dream" sequence. We created six separate processes, each addressing a key value range in a 10,000,000-row table, and started these six wonderfully written programs. These programs then allowed the Oracle Server to distribute the processing among the multiple parallel query processes on the server. The moral of this dream: anyone can perform manual parallel processing. Was this the best solution? By spreading out the processes among the many processors, we were able to achieve the overall performance required for the task. The question then becomes what the trade-off was versus serial processing of the same task.

The greatest advantage to us was that we could get the transactions per second required to complete this task in our lifetime. Even with parallel hints, the programs would still not provide the performance required.

For every battle won there is a cost; based on the perceived and estimated cost, decisions need to be made. Let’s look at what influences these decisions.

This leads to the question of what work is required to separate the information to a level where it can be easily manipulated. Oracle8 has almost removed this question from consideration. If you have been diligent and partitioned your data already, this becomes a non-issue. People who want to take advantage of partitions and the parallel nature of the information should ask themselves whether the change is worth the cost. The best answer we can give you is that the features now available in your database toolkit must always be considered. So should we implement this feature? If your machine supports parallelism, then this option must be considered.

What Is the Degree of Parallelism?
When the database parallelizes an SQL statement, it parallelizes this statement over a number of parallel query server processes. The number of parallel query server processes used by an operation is known as the degree of parallelism. Without the degree of parallelism, the operation will be performed in serial. If you had wanted that, you would not have purchased 12 new processors, more memory and told everyone it was going to help. By understanding the definition of the degree of parallelism, you can now empower everyone in your organization to use all that new hardware and the new and improved database schema when issuing SQL statements.

The degree of parallelism is defined at the statement level. This is done through the use of embedded hints within the statement or by default within the database. At the time of table or index creation you can specify a default degree of parallelism for the object. By default this may be either the number of disks or the number of CPUs.

There are two types of parallel operations that we must define at this point as they will affect the degree of parallelism that any operation can perform—intra-operations and inter-operations. These two operations can be performed at the same time during statement execution.
 

The Degree of Parallelism within Operations
The balance between too much parallelism and too little is always a concern to us. Luckily we have a database that is smarter than your average bear. The Oracle query coordinator is that bear. The coordinator determines the degree of parallelism at runtime, based on a number of factors. First, the coordinator looks to see if it is dealing with a smarter than normal programmer who has added hints to the SQL statement. If not, it will check to see if your DBA was awake when the table or index was created, by looking for the object’s default degree of parallelism. Once determined, this degree of parallelism will be used by the operation. If you create a table, remember to follow the format below; the parallel-specific section is the final parallel clause.

create table individual (
    firstname       varchar2(20) not null,
    lastname        varchar2(30) not null,
    birthdate       date         not null,
    gender          varchar2(1)  not null,
    initial         varchar2(1),
    favorite_beatle varchar2(6))
parallel (degree 4);

The coordinator will only request the number of parallel query server processes defined by the degree of parallelism. The physical number of processes that the coordinator will be able to get will depend upon how many are available at the time of operation execution.

So now we know how and where the database gets its degree of parallelism, but at some point we will need to tell it what degree of parallelism to use. It is therefore necessary to understand how best to define the degree of parallelism for a table, index or hint. By following these rules and selecting the ones that relate to your specific operation, you can define your degree of parallelism, just like the pros.

Suppose you have 8 CPUs, but your information is stored on 12 disks. The limiting factor here is the number of CPUs, so the degree of parallelism that we should use is 8.
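Applied to the individual table from the earlier listing, the 8 CPU case could be set and then checked as sketched here. The choice of table is only an illustration; the degree column of user_tables reports the stored default.

```sql
-- Set the table's default degree to the limiting factor (8 CPUs).
alter table individual parallel (degree 8);

-- Confirm the default degree now recorded for the table.
select table_name, degree
from user_tables
where table_name = 'INDIVIDUAL';
```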

Although you may request eight parallel query server processes to process your statement, you may not receive this many. Due to limits placed on the database, there may not be a sufficient number of parallel query servers available. If you have set PARALLEL_MAX_SERVERS to 25, then you can only have 25 parallel processes running. If demand exceeds this number, Oracle will not spawn additional processes, resulting in some statements being executed in serial mode by the query coordinator. If you have specified PARALLEL_MIN_PERCENT and the minimum fraction of parallelism is not available, you will receive an error during execution.

By defining the degree of parallelism for an SQL operation you can profoundly affect the processing of your information. Specify it too high and you may not get processes that you require when you want them. If the value is too low, you may find that your parallel query server processes are not being maximized and result in lost performance.

When considering your degree of parallelism, you should by default define it such that it optimizes a majority of operations to maximize the utilization of your I/O and CPU. In some cases this may not be the best approach, and you may want to override the default degree of parallelism. For example if your operation requires large amounts of I/O you may want to increase the degree of parallelism for the operation. If your operation requires a large amount of memory, then you may want to decrease the degree of parallelism. To override the default degree of parallelism, you may decide to incorporate hints into your SQL.

Now that we can define our degree of parallelism, we need to be able to use our knowledge, but where do we use these facts? By using parallelism in your SQL statements, you can increase the performance of many SQL statements. Let’s move on to looking at some familiar SQL statements, and how they are parallelized.
 
 

Summary
To summarize parallelism, its features, how to leverage its power, and using it in real-life situations, inspect the following points as a summary of what we have covered in this presentation.

For a more complete discussion of parallelism, please read Oracle8 Tuning by Abbey, Corey, Dechichio and Abramson, published by Oracle Press, 1997, or the following ML notes:
237287.1 How To Verify Parallel Execution is running