All about Statistics

Intruduction on Statistics
Important Initialization Parameters
Gathering Statistics for Tables
Gathering Statistics for a Schema, a Database and System Stats
Important Notes
Verifying Table Statistics
Verifying Index Statistics
Verifying Column Statistics
Exporting and Importing Statistics
Using Histograms
Tuning by Statistics


Intruduction on Statistics
The CBO makes its explain-plan decisions based on statistics.
Statistics provide critical input in order for CBO to work properly; these includes information such as the number of rows in a table, distinct values in a column, number of leaf blocks in an index, etc.

Statistics may be exact or estimated:
Statistics are stored in a data dictionary tables owned by SYS user. The following views display the statistics collected for tables, columns and indexes.
For Tables: DBA_TABLES
NUM_ROWS - Number of rows
BLOCKS - Number of blocks below HW (used or not)
EMPTY_BLOCKS - Number of empty blocks
AVG_SPACE - Average free space (in bytes) in blocks allocated to the table. All empty and free blocks are considered for this
CHAIN_CNT - Number of chained rows (used by ANALYZE command only)
AVG_ROW_LEN - Average row length in bytes
LAST_ANALYZED - Date when the table was last analyzed
SAMPLE_SIZE - Sample size provided for ESTIMATE statistics. Equal to NUM_ROWS if COMPUTE
GLOBAL_STATS - For partitioned tables, YES - statistics collected as a whole, NO - statistics are estimated from statistics
USER_STATS - Set to YES if user has explicitly set the statistics for the table
Statistics for individual partitions of a table can be seen from DBA_TAB_PARTITIONS. Cluster statistics is available from DBA_CLUSTERS.

For Indexes DBA_INDEXES
BLEVEL - Depth of the index, from root to leaf
LEAF_BLOCKS - Number of leaf blocks
DISTINCT KEYS - Number of distinct index values
AVG_LEAF_BLOCKS_PER_KEY - Average number of leaf blocks in which each distinct key appears, should be 1 for unique indexes
AVG_DATA_BLOCKS_PER_KEY - Average number of blocks in the table that are pointed to by a distinct key
CLUSTERING_FACTOR - A count that determines the ordering of the index. Index is ordered if count is closer to the number of blocks, i.e entries in single leaf tend to point to rows in same blocks in the table. Index is randomly ordered if closer to the number of rows, i.e., entries in single leaf are pointing to rows spread across multiple blocks
NUM_ROWS - Number of rows indexed
SAMPLE_SIZE - Sample size provided for ESTIMATE statistics. Equal to NUM_ROWS if COMPUTE
LAST_ANALYZED - Date when the table was last analyzed
GLOBAL_STATS - For partitioned indexes, YES - statistics collected as a whole, NO - statistics are estimated from statistics
USER_STATS - Set to YES if user has explicitly set the statistics for the index
PCT_DIRECT_ACCESS - For secondary indexes on IOTs, percentage of rows with valid guess
Statistics for individual partitions of indexes can be seen from DBA_IND_PARTITIONS and DBA_IND_SUBPARTITIONS.

For Columns: DBA_TAB_COLUMNS
NUM_DISTINCT - Number of distinct values
LOW_VALUE - Lowest value
HIGH_VALUE - Highest value
DENSITY - Density of the column
NUM_NULLS - Number of records with null value for the concerned column
NUM_BUCKETS - Number of buckets in histograms. Refer Histograms section
SAMPLE_SIZE - Sample size provided for ESTIMATE statistics. Equal to total rows if COMPUTE
LAST_ANALYZED - Date when the table was last analyzed
DBA_TAB_COL_STATISTICS shows similar data. Partitioned Table column statistics can be viewed from DBA_PART_COL_STATISTICS and DBA_SUBPART_COL_STATISTICS


Important Initialization parameters that affect the CBO
Apart from generating statistics, the setup of the below mentioned parameters play a very crucial role in dictating how your system will work. Setting these will depend a lot on what kind of environment you want to create, Online, Batch processing, Data warehousing or a combination of more than one.

1) OPTIMIZER_MODE
This will decide the mode in which the optimizer engine should run in. Valid values are RULE, CHOOSE, ALL_ROWS, FIRST_ROWS (_n).
Optionally, the OPTIMIZER_MODE can be set to CHOOSE. This is kind of an intermediate option between RBO and CBO. In fact, it tries to run the query in either CBO or RBO depending on the availability or unavailability of statistics. Therefore, if the tables present in the query have statistics generated on them, CBO (ALL_ROWS only) is preferred or RBO is taken up.
e.g.: optimizer_mode = first_rows
For an OLTP system, FIRST_ROWS would be the ideal option for fast response time. ALL_ROWS is meant for batch-oriented applications. Note, a plan that produces the first n rows with the fastest response time might not be an optimal plan if requirement is to obtain the entire result, so decide as per the need of the application

2) OPTIMIZER_INDEX_COST_ADJ
Optimizer_Index_Cost_Ad takes a value between 1 and 10000 (default 100). Setting this value makes the index more or less friendly to the optimizer. For example setting it to 50 tells the optimizer that an execution plan that uses index(es) is half as expensive as other execution plans that are not using them. The lower the value (less than 100), the less full table scan executions will take place in the system.
Use of indexes in OLTP is a crucial factor for deciding the response time, but sometimes, full table scans provide better throughput. Nested loop joins are influenced by this parameter value. Set this parameter to a value that is optimal for your application. I recommend set this parameter between 10 - 20 for OLTP and 50 for DSS Systems..
If you are having slow performance because the CBO first_rows optimizer mode is favoring too many full-table scans, you can reset the optimizer_index_cost_adj parameter to immediately tune all of the SQL in your database to favor index scans over full-table scans. This is a "silver bullet" that can improve the performance of an entire database in cases where the database is OTLP and you have verified that the full-table scan costing is too low.

3) OPTIMIZER_INDEX_CACHING
This tells optimizer to favor nested loops and IN-list iterators over sort-merge and hash joins. The default value is 100 and makes nested loops and IN-list operator look less expensive and more appealing to Optimizer. The value basically indicates the percentage of index blocks that the optimizer should assume are present in cache.
I recommend set this parameter to 85.

4) DB_FILE_MULTIBLOCK_READ_COUNT
This parameter determines the number of database blocks read in one input/output operation during a full table scan. The value set is used in computing the cost of full table scans. OLTP systems will not benefit by setting a high value for this parameter as the computed cost for full table scan execution plans would reduced. The maximum size is Operating system dependent.
e.g.: db_file_multiblock_read_count = 1 (may be set to a higher value)

5) SORT_AREA_SIZE
This parameter defines the maximum memory space allocated per user process that requires sorting of data and for insert and updates to bitmap indexes. Optimizer uses the set value for determining the cost of sorting in a query. This defaults to 64K. Normally a size of 64K to 1M is appropriate for OLTP systems.
Sort-merge joins are influenced by this parameter value. The bigger the size the more appealing will be sort-merge joins over nested-loop and hash joins. The lower the size the more sorting will take place in temporary segments. So assign a value that has been evaluated for your setup.
Oracle recommends the use of PGA_AGGREGATE_TARGET instead of this parameter from Oracle 9i.

6) SORT_MULTIBLOCK_READ_COUNT
This specifies the number of database blocks to be read each time a sort performs a read from a temporary segment. The default value is 2. General recommendation is to keep the default value. However, if you intend to hike up the SORT_AREA_SIZE considerably, you may evaluate the improvement by changing this parameter.
e.g.: sort_multiblock_read_count = 2

7) HASH_JOIN_ENABLED
Hash joins are available only in CBO. In hash joins, a hash table is created on the join key of the smallest sized table. It then joins the other tables to find the match. Hash joins may prove to be faster than other type of joins in some conditions, especially when the index is missing or search criteria is not very selective. Hash joins require a large amount of memory as the hash tables are retained; this may sometimes result in memory swapping.
Nested-loop joins return the first row faster than sort-merge and hash joins and are preferred for OLTP, but other types of joins cannot be ignored for running other aspects of the applications.
e.g.: hash_join_enabled = true

8) HASH_AREA_SIZE
This specifies the maximum amount of memory in bytes to be used for a hash join per process. It is defaulted to 2 times SORT_AREA_SIZE.
Oracle recommends the use of PGA_AGGREGATE_TARGET instead of this parameter from Oracle 9i.
e.g.: hash_area_size = 2097152
Setting this to a very low number may sometimes result in the following error.
ORA-6580: Hash Join ran out of memory while keeping large rows in memory.

9) HASH_MULTIBLOCK_IO_COUNT
This specifies how many sequential blocks a hash join reads and writes in one Input/Output activity. Oracle recommends not changing or assigning a value to this parameter; this will let oracle decide on the appropriate value for each individual query. In such casees, the value of the parameter will appear as 0 in the V$PARAMETER view.
This parameter is renamed to _HASH_MULTIBLOCK_IO_COUNT as of Oracle 9i.
e.g.: hash_multi_block_io_count = 0

10) BITMAP_MERGE_AREA_SIZE
This parameter is relevant for systems using bitmap indexes. It specifies the amount of memory Oracle uses to merge bitmaps retrieved from a range scan of a bitmap index. The default value is 1 MB, which is considered sufficient for most setups.
Oracle recommends use of PGA_AGGREGATE_TARGET instead of this parameter from Oracle 9i.
e.g.: bitmap_merge_area_size = 1048576

11) QUERY_REWRITE_ENABLED
This parameter is relevant for systems using Materialized views, Function based indexes and stored outlines. Setting this parameter enables query rewrite for the database. The materialized views should also be query rewrite enabled to allow the optimizer to redirect a query to it rather than the actual tables listed in the FROM clause. Query rewriting is internally done by the optimizer based on what results are expected and whether these are already present in materialized view form.
e.g.: query_rewrite_enabled = true

12) QUERY_REWRITE_INTEGRITY
This parameter determines the extent to which the optimizer must enforce query rewriting to Materialized views. This determines the accuracy of the query rewrite. It can be set to ENFORCED, TRUSTED or STALE_TOLERATED. ENFORCED option uses Materialized views if they contain fresh data, thus guarantying the integrity and consistency of data. TRUSTED uses relationships that have been defined in the system for rewriting the query. STALE_TOLERATED allows the use of relationships that are not enforced and may use stale data. For OLTP systems, ENFORCED would be the ideal setup, as reports need to be up-to-date.
e.g.: query_rewrite_integrity = enforced

13) ALWAYS_ANTI_JOIN
This parameter specifies the join method for anti-joins, for example when a NOT IN operator is present in your query. It can be set to NESTED_LOOPS, MERGE or HASH. It is defaulted to NESTED_LOOPS in Oracle 8i and CHOOSE in Oracle 9i.
This parameter is renamed to _ALWAYS_ANTI_JOIN as of Oracle 9i.
e.g.: always_anti_join = nested_loops

14) ALWAYS_SEMI_JOIN
This parameter specifies the join method for semi-joins. These types of joins are carried out by Optimizer after transforming a query. In such joins, duplicate values from the inner table are removed and then the type of join specified in the parameter is used to perform a semi-join. It can be set to NESTED_LOOPS, MERGE or HASH. In Oracle 8i, it is defaulted to STANDARD and in Oracle 9i it is defaulted to CHOOSE, to pick up an appropriate join.
This parameter is renamed to _ALWAYS_SEMI_JOIN as of Oracle 9i.
e.g.: always_semi_join = nested_loops

15) STAR_TRANSFORMATION_ENABLED
This specifies whether query transformation will be applied to star queries. It can be set to TRUE, FALSE or TEMP_DISABLE (transformation will take place but will not use temporary tables). I presently set it to FALSE due to some known issues of sub-optimal queries being generated. If you intend to use this, please upgrade your version to 8.1.7.4 and above.
e.g.: star_transformation_enabled = false

16) PARALLEL_BROADCAST_ENABLED
This parameter refers to parallel executions in cluster databases. It is meant for improving hash and sort-merge join operations where a very large result set is joined with a very small result set. When this option is enabled, the optimizer broadcasts a copy of all rows in the smaller result set to all cluster databases that are processing some rows of the larger result set. 
It is obsolete in release 9.2.0.
e.g.: parallel_broadcast_enabled = false

17) OPTIMIZER_DYNAMIC_SAMPLING
This parameter is introduced in release 9i. It is meant for situations where tables are not analyzed. As CBO depends heavily on statistics, the parameter tells the optimizer to sample the unanalyzed tables that are being used in a query. A level of 0 to 10 can be specified, the higher the value the more time optimizer spends in sampling.
e.g.: optimizer_dynamic_sampling = 1

18) PARTITION_VIEW_ENABLED
This parameter is meant for backward compatibility to support partition views. Oracle recommends use of partition tables rather than partition views. If you are migrating to CBO, chances are that you may not be using partition views.
e.g.: partition_view_enabled = false

19) CURSOR_SHARING
This parameter determines what kind of SQL statements can share the same cursor. It can be set to FORCE, SIMILAR or EXACT. FORCE will try to squeeze statements that may differ in some literals to share the same cursor. SIMILAR is somewhat the same but will try to maintain the plan optimization for identical statements. EXACT allows statements with exact identical text to share a cursor.
Using FORCE may sometimes result in unexpected results.
e.g.: cursor_sharing = exact

20) PGA_AGGREGATE_TARGET
Introduced in Oracle 9i, this parameter specifies the aggregate PGA memory available to all server processes attached to an instance. This parameter can be set for automatic sizing of SQL working areas. It replaces other existing parameters like SORT_AREA_SIZE, BITMAP_MERGE_AREA_SIZE and HASH_AREA_SIZE.
It can be set to a value between 10 MB to 4000 GB, depending on the setup requirement.
1- Make a first estimate for PGA_AGGREGATE_TARGET  based on the following rule:
    PGA_AGGREGATE_TARGET  = (Total Physical Memory * 0.16)    - For OLTP systems  (16% of Server Physical Mem)
    PGA_AGGREGATE_TARGET  = (Total Physical Memory * 0 .4  )   - For DSS systems   (40% of Server Physical Mem)
More Information HERE



Gatering Statistics for Tables
Oracle strongly recommends that you use the DBMS_STATS package rather than ANALYZE to collect optimizer statistics. That package lets you collect statistics in parallel, collect global statistics for partitioned objects, and fine tune your statistics collection in other ways.
However, you must use the ANALYZE statement rather than DBMS_STATS for statistics collection not related to the cost-based optimizer, such as:
    * To use the VALIDATE or LIST CHAINED ROWS clauses
    * To collect information on freelist blocks

In order to use the DBMS_STATS Package, the user needs to get the execute privilege. Run the following as SYS:
    grant execute on DBMS_STATS to user_xyz;

Check table statistics date:
SQL> select table_name, last_analyzed from user_tables where table_name='T1';

SQL> Begin DBMS_STATS.GATHER_TABLE_STATS (
ownname => 'SCOTT',
tabname => 'EMP',
degree => 2,
cascade => TRUE,
METHOD_OPT => 'FOR COLUMNS SIZE AUTO',
estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE);
     END;
     /

DEGREE => 2 :Degree of parallelism.
CASCADE => TRUE
:Gather statistics on the indexes as well. If not used Oracle will determine whether to collect it or not.
options =>'GATHER' :Gathers statistics on all objects in the schema.
                    =>'GATHER AUTO' :Oracle determines which objects need new statistics, and determines how to gather those statistics.
                    =>'GATHER STALE':Gathers statistics on stale objects. will return a list of stale objects.
                    =>'GATHER EMPTY':Gathers statistics on objects have no statistics.will return a list of no stats objects.
                    =>'LIST AUTO' : Returns a list of objects to be processed with GATHER AUTO.
                    =>'LIST STALE': Returns a list of stale objects as determined by looking at the *_tab_modifications views.
                    =>'LIST EMPTY': Returns a list of objects which currently have no statistics.
METHOD_OPT=>  : For gathering Histograms:
     FOR COLUMNS SIZE AUTO : you can specify one column between "" instead of all columns.
     FOR ALL COLUMNS SIZE REPEAT : Prevent deletion of histograms and collect it only for columns already have histograms.
     FOR ALL COLUMNS  : collect histograms on all columns.
     FOR ALL COLUMNS SIZE SKEWONLY : collect histo for columns have skewed value should test skewness first>.
     FOR ALL INDEXED COLUMNS : collect histograms for columns have indexses only.
ESTIMATE_PERCENT => DBMS_STATS.AUTO_SAMPLE_SIZE :(DEFAULT) Auto set the sample size % for skew(distinct) values (accurate and faster than setting a manual sample size).

Some Examples

Collect statistics for a table and its indexes:
EXEC DBMS_STATS.gather_table_stats(USER,  'LOOKUP', cascade => TRUE);

Collect stats for a table, estimating 20%, with parallelism
execute dbms_stats.gather_table_stats(ownname => 'SCOTT', tabname => 'DEPT', estimate_percent => 10, degree => 4 ,cascade => true );

Estimate statistics for columns in a table, this will also generate statistics for tables.
exec dbms_stats.gather_table_stats(ownname => 'SCOTT', tabname => 'DEPT', estimate_percent => 5, method_opt => 'FOR ALL COLUMNS');

SQL Source - Dynamic Method for TABLES
DECLARE
sql_stmt    VARCHAR2(1024);
BEGIN
  FOR tab_rec IN (SELECT owner,table_name
                    FROM all_tables WHERE owner like UPPER('&1') ) LOOP
        sql_stmt := 'BEGIN dbms_stats.gather_table_stats(ownname => :1, tabname  => :2, partname => null, estimate_percent => 10, degree => 3 ,cascade => true);  END;'  ;
      EXECUTE IMMEDIATE sql_stmt USING tab_rec.owner, tab_rec.table_name ;
  END LOOP;
END;
/

Collecting Stats in a Procedure
CREATE OR REPLACE PROCEDURE analyze_any_schema ( p_inOwner IN all_users.username%TYPE)
IS
BEGIN
    FOR v_tabs  IN  (SELECT owner, table_name
                        FROM all_tables
                        WHERE owner       =   p_inOwner
                          AND temporary   <>  'Y')
    LOOP
        DBMS_OUTPUT.put_line ('EXEC  DBMS_STATS.gather_table_stats('''||v_tabs.owner||
                               ''','''||v_tabs.table_name||''',NULL,1);' );
        BEGIN
           DBMS_STATS.gather_table_stats(v_tabs.owner,v_tabs.table_name,NULL,1);
           DBMS_OUTPUT.put_line ('Analyzed '||v_tabs.owner||'.'||table_name||'... ');
        EXCEPTION
            WHEN OTHERS THEN
                DBMS_OUTPUT.put_line ('Exception on analysis of '||v_tabs.table_name||'!');
                DBMS_OUTPUT.put_line (SUBSTR(SQLERRM,1,255));
        END;
    END LOOP;
END;
/
CREATE OR REPLACE Procedure DB_Maintenance_Weekly  is
  sql_stmt    varchar2(1024);
  v_sess_user varchar2(30);
BEGIN
    select sys_context('USERENV','SESSION_USER') into v_sess_user
      from dual ;
    --Analyze all Tables
    FOR tab_rec IN (SELECT table_name
                       FROM all_tables
                       WHERE owner = v_sess_user
                         and table_name not like 'TEMP_%') LOOP
        sql_stmt := 'BEGIN dbms_stats.gather_table_stats
                     (ownname  => :1,
                      tabname    => :2,
                      partname   => null,
                      estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
                      degree => 3 ,
                      cascade => true);  END;'  ;
      EXECUTE IMMEDIATE sql_stmt USING v_sess_user, tab_rec.table_name ;
   END LOOP;
  EXCEPTION
    WHEN NO_DATA_FOUND THEN
        NULL ;
end;
/

Force Statistics to a Table
You can use the following sentence to force statistics to a Table:
    exec dbms_stats.set_table_stats( user, 'EMP', numrows => 1000000, numblks => 300000 );

No statistics are collected for Global Temporary Tables; handle these with care. Make use of hints to drive queries on these. Provide explicit statistics by using DBMS_STATS.SET_TABLE_STATS if your temporary table is used for processing huge amounts of data.

How often to collect stats?
When there is a 10-20% change in data, the general convention is to generate fresh statistics. You can start of with a general rule of estimating statistics on a weekly basis. If the tables are giving real bad hits because of heavy activity, you may consider using the DML Monitoring option to update statistics every few hours for such tables. Statistics are not incremental and are regenerated every time. If there is no considerable change in data, there is no advantage in generating statistics too frequently.

DML Monitoring
Used by dbms_stats to identify objects with "stale" statistics
- On by default in 10g
    alter table <table_name> monitoring;
- For the Full Database:
    dbms_stats.alter_database_tab_monitoring(monitoring=>true, sysobjs=>false);

- Tracked in [DBA|ALL|USER]_TAB_MODIFICATIONS
- 9i and 10g use 10% change as the threshold to gather stats
In Oracle 10g, Oracle automatically gathers index statistics whenever the index is created or rebuilt.


Gathering statistics for a Schema, a Database and System Stats



Gather SCHEMA Stats:
=====================
Collect statistics for my Schema (All the tables and Indexes), estimating 10%
SQL> Exec DBMS_STATS.GATHER_SCHEMA_STATS (

       ownname =>'SCOTT',
       estimate_percent=>10,
       degree=>1,
       cascade=>TRUE,
       options=>'GATHER STALE');


Gather Database Stats:
====================
SQL> EXEC DBMS_STATS.GATHER_DATABASE_STATS(
ESTIMATE_PERCENT=>100,
METHOD_OPT=>'FOR ALL
COLUMNS SIZE SKEWONLY',
CASCADE => TRUE,
degree => 4,
OPTIONS => 'GATHER STALE',
GATHER_SYS => TRUE
STATTAB => PROD_STATS);

ESTIMATE_PERCENT => Det the sample size % for skew(distinct) values (accurate and faster than setting a manual sample size). A good option to use: DBMS_STATS.AUTO_SAMPLE_SIZE
METHOD_OPT=>  : For gathering Histograms:
     FOR COLUMNS SIZE AUTO : you can specify one column between "" instead of all columns.
     FOR ALL COLUMNS SIZE REPEAT : Prevent deletion of histograms and collect it only for columns already have histograms.
     FOR ALL COLUMNS  : collect histograms on all columns.
     FOR ALL COLUMNS SIZE SKEWONLY : collect histo for columns have skewed value should test skewness first>.
     FOR ALL INDEXED COLUMNS : collect histograms for columns have indexses only.
CASCADE => TRUE
:Gather statistics on the indexes as well. If not used Oracle will determine whether to collected or not.
DEGREE => 4 :Degree of parallelism.
OPTIONS =>'GATHER' :Gathers statistics on all objects in the schema.
                    =>'GATHER AUTO' :Oracle determines which objects need new statistics, and determines how to gather those statistics.
                    =>'GATHER STALE':Gathers statistics on stale objects. will return a list of stale objects.
                    =>'GATHER EMPTY':Gathers statistics on objects have no statistics.will return a list of no stats objects.
                    =>'LIST AUTO' : Returns a list of objects to be processed with GATHER AUTO.
                    =>'LIST STALE': Returns a list of stale objects as determined by looking at the *_tab_modifications views.
                    =>'LIST EMPTY': Returns a list of objects which currently have no statistics.
GATHER_SYS => TRUE :Gathers statistics on the objects owned by the 'SYS' user.
STATTAB => PROD_STATS :Table will save the current statistics. see SAVE & IMPORT STATISTICS section -last third in this post-.


Note: All parameters above are valid for all stats kind (schema,table,..) except Gather_SYS.

For faster execution:
------------------
SQL> EXEC DBMS_STATS.GATHER_DATABASE_STATS(ESTIMATE_PERCENT=>DBMS_STATS.AUTO_SAMPLE_SIZE,degree => 8);

What's new?
ESTIMATE_PERCENT=>DBMS_STATS.AUTO_SAMPLE_SIZE => Let Oracle estimate skewed values always gives excellent results.(DEFAULT).
Removed "METHOD_OPT=>'FOR ALL COLUMNS SIZE SKEWONLY'" => As histograms is not recommended to be gathered on all columns.
Removed "cascade => TRUE" To let Oracle determine whether index statistics to be collected or not.
Doubled "degree => 8" but this depends on the number of CPUs on the machine and accepted CPU overhead during gathering DB statistics.

This task became an automated task starting from 10g, To check the status of that task:
SQL> select * from dba_autotask_client where client_name = "auto optimizer stats collection" ;



###################
Fixed OBJECTS Statistics
###################

What are Fixed objects:
-Fixed objects are the x$ tables (been loaded in SGA during startup) on which V$ views are built (V$SQL etc.).
-If the statistics are not gathered on fixed objects, the Optimizer will use predefined default values for the statistics. These defaults may lead to inaccurate execution plans.
-Statistics on fixed objects are not being gathered automatically nor within gathering DB stats.

How frequent to gather stats on fixed objects?
Only one time for a representative workload unless you've one of these cases:

-After a major database or application upgrade.
-After implementing a new module.
-After changing the database configuration. e.g. changing the size of memory pools (sga,pga,..).
-Poor performance/Hang encountered while querying dynamic views e.g. V$ views.

Note:
-It's recommended to Gather the fixed object stats during peak hours (system is busy) or after the peak hours but the sessions are still connected (even if they idle), to guarantee that the fixed object tables been populated and the statistics well represent the DB activity.
-also note that performance degradation may be experienced while the statistics are gathering.
-Having no statistics is better than having a non representative statistics.

How to gather stats on fixed objects:
1- Firstly Check the last analyzed date:
select OWNER, TABLE_NAME, LAST_ANALYZED from dba_tab_statistics where table_name='X$KGLDP';

2- Secondly Export the current fixed stats in a table: (in case you need to revert back)
EXEC DBMS_STATS.CREATE_STAT_TABLE('OWNER','STATS_TABLE_NAME','TABLESPACE_NAME');
EXEC dbms_stats.export_fixed_objects_stats(stattab=>'STATS_TABLE_NAME',statown=>'OWNER');

3- Thirdly Gather the fixed objects stats:
exec dbms_stats.gather_fixed_objects_stats;

Note In case of reverting back to the old stats:
In case you experienced a bad performance on fixed tables after gathering the new statistics:

exec dbms_stats.delete_fixed_objects_stats(); 
exec DBMS_STATS.import_fixed_objects_stats(stattab =>'STATS_TABLE_NAME',STATOWN =>'OWNER');


#################
SYSTEM STATISTICS
#################

What is system statistics:
System statistics are statistics about CPU speed and IO performance, it enables the CBO to effectively cost each operation in an execution plan. Introduced in Oracle 9i.

Why gathering system statistics:
------------------------------
Oracle highly recommends gathering system statistics during a representative workload, ideally at peak workload time, in order to provide more accurate CPU/IO cost estimates to the optimizer.
You only have to gather system statistics once.
There are two types of system statistics (NOWORKLOAD statistics & WORKLOAD statistics):

NOWORKLOAD statistics:
This will simulates a workload -not the real one but a simulation- and will not collect full statistics, it's less accurate than "WORKLOAD statistics" but if you can't capture the statistics during a typical workload you can use noworkload statistics.

To gather noworkload statistics:

SQL> execute dbms_stats.gather_system_stats();

WORKLOAD statistics:
This will gather statistics during the current workload -which supposed to be representative of actual system I/O and CPU workload on the DB-.

To gather WORKLOAD statistics:
SQL> execute dbms_stats.gather_system_stats('start');

Once the workload window ends after 1,2,3.. hours or whatever, stop the system statistics gathering:
SQL> execute dbms_stats.gather_system_stats('stop');

You can use time interval (minutes) instead of issuing start/stop command manually:
SQL> execute dbms_stats.gather_system_stats('interval',60);

Check the system values collected:
col pname format a20
col pval2 format a40
select * from sys.aux_stats$;

cpuspeedNW: Shows the noworkload CPU speed, in terms of the average number of CPU cycles per second.
ioseektim:  The sum of seek time, latency time, and OS overhead time.
iotfrspeed: I/O transfer speed,tells the optimizer how fast the database can read data in a single read request.
cpuspeed:   Stands for CPU speed during a workload statistics collection.
maxthr:        The maximum I/O throughput.
slavethr:   Average parallel slave I/O throughput.
sreadtim:   The Single Block Read Time statistic shows the average time for a random single block read.
mreadtim:   The average time (seconds) for a sequential multiblock read.
mbrc:       The average multiblock read count in blocks.

According to Oracle, collecting workload statistics doesn't impose an additional overhead on your system.

Delete system statistics:
SQL> execute dbms_stats.delete_system_stats();





####################
Data Dictionary Statistics
####################

Facts:
>Dictionary tables are the tables owned by SYS and residing in the system tablespace.
>Normally data dictionary statistics in 9i is not required unless performance issues are detected.
>In 10g Statistics on the dictionary tables will be maintained via the automatic statistics gathering job run during the nightly maintenance window.

If you choose to switch off that job for application schema consider leaving it on for the dictionary tables. You can do this by changing the value of AUTOSTATS_TARGET from AUTO to ORACLE using the procedure:

SQL> Exec DBMS_STATS.SET_PARAM(AUTOSTATS_TARGET,'ORACLE'); 

When to gather Dictionary statistics:
-After DB upgrades.
-After creation of a new big schema.
-Before and after big datapump operations.

Check last Dictionary statistics date:
SQL> select table_name, last_analyzed from dba_tables where owner='SYS' and table_name like '%$' order by 2;

Gather Dictionary Statistics:  
SQL> EXEC DBMS_STATS.GATHER_DICTIONARY_STATS; ->Will gather stats on 20% of SYS schema tables.
or...
SQL> EXEC DBMS_STATS.GATHER_SCHEMA_STATS ('SYS'); ->Will gather stats on 100% of SYS schema tables.
or...
SQL> EXEC DBMS_STATS.GATHER_DATABASE_STATS (gather_sys=>TRUE); ->Will gather stats on the whole DB+SYS schema.




#############
Extended Statistics "11g onwards"
#############

Extended statistics can be gathered on columns based on functions or column groups.

Gather extended stats on column function:
===================================
If you run a query having in the WHERE statement a function like upper/lower the optimizer will be off and index on that column will not be used:

SQL> select count(*) from EMP where lower(ename) = 'scott';

In order to make optimizer work with function based terms you need to gather extended stats:

1-Create extended stats:
>>>>>>>>>>>>>>>>>>>
SQL> select dbms_stats.create_extended_stats('SCOTT','EMP','(lower(ENAME))') from dual;

2-Gather histograms:
>>>>>>>>>>>>>>>
SQL> exec dbms_stats.gather_table_stats ('SCOTT','EMP', method_opt=> 'for all columns size skewonly');

OR
--
*You can do it also in one Step:
>>>>>>>>>>>>>>>>>>>>>>>>>

SQL> Begin
     dbms_stats.gather_table_stats (
     ownname    => 'SCOTT',
     tabname    => 'EMP',
     method_opt => 'for all columns size skewonly for columns (lower(ENAME))'
     );
     end;

To check the Existance of extended statistics on a table:
---------------------------------------------------
SQL> select extension_name,extension from dba_stat_extensions where owner='SCOTT'and table_name = 'EMP';

SYS_STU2JLSDWQAFJHQST7$QK81_YB (LOWER("ENAME"))

Drop extended stats on column function:
-------------------------------------
SQL> exec dbms_stats.drop_extended_stats('SCOTT','EMP','(LOWER("ENAME"))');

Gather extended stats on column group: -related columns-
=================================
Certain columns in a table that are part of a join condition (where statement  are correlated e.g.(country,state). You want to make the optimizer aware of this relationship between two columns and more instead of using separate statistics for each columns. By creating extended statistics on a group of columns, the Optimizer can determine a more accurate the relation between the columns are used together in a where clause of a SQL statement. e.g. columns like country_id and state_name the have a relationship, state like Texas can only be found in USA so the value of state_name are always influenced by country_id.
If there are extra columns are referenced in the "WHERE statement  with the column group the optimizer will make use of column group statistics.

1- create a column group:
>>>>>>>>>>>>>>>>>>>>>
SQL> select dbms_stats.create_extended_stats('SH','CUSTOMERS', '(country_id,cust_state_province)') from dual;

2- Re-gather stats|histograms for table so optimizer can use the newly generated extended statistics:
>>>>>>>>>>>>>>>>>>>>>>>
SQL> exec dbms_stats.gather_table_stats ('SH','customers', method_opt=> 'for all columns size skewonly');

OR
--
*You can do it also in one Step:
>>>>>>>>>>>>>>>>>>>>>>>>>

SQL> Begin
     dbms_stats.gather_table_stats (
     ownname    => 'SH',
     tabname    => 'CUSTOMERS',
     method_opt => 'for all columns size skewonly for columns (country_id,cust_state_province)'
     );
     end;

Drop extended stats on column group:
--------------------------------------
SQL> exec dbms_stats.drop_extended_stats('SH','CUSTOMERS', '(country_id,cust_state_province)');



########
Histograms
########

What are Histograms?
------------------
>Holds data about values within a column in a table for number of occurrences for a specific value/range.
>Used by CBO to optimize a query to use whatever index Fast Full scan or table full scan.
>Usually being used against columns have data being repeated frequently like country or city column.
>gathering histograms on a column having distinct values (PK) is useless because values are not repeated.
>Two types of Histograms can be gathered:
 >Frequency histograms: is when distinct values (buckets) in the column is less than 255 (# countries is always less than 254.

Height balanced histograms: are similar to frequency histograms in their design, but distinct values  > 254

See Example: http://aseriesoftubes.com/articles/beauty-and-it/quick-guide-to-oracle-histograms/

- Collected by DBMS_STATS (which by default doesn't collect histograms, it deletes them if you didn't use the parameter).
- Mainly being gathered on foreign key columns/columns in WHERE statement.
- Helps in SQL multi-table joins.
- Column histograms like statistics are being stored in data dictionary.
- If application exclusively uses bind variables, Oracle recommends deleting any existing Oracle histograms and disabling Oracle histograms generation.


Caution:
– Do not create them on Columns that are not queried.
– Do not create them on every column of every table.
– Do not create them on PK of a table.

Verify the existence of histograms:
----------------------------------
SQL> select column_name,histogram from dba_tab_col_statistics where owner='SCOTT' and table_name='EMP';

Creating Histograms:
----------------------
e.g.

SQL> Exec dbms_stats.gather_table_stats(
ownname => '', 
tabname => '', 
        estimate_percent => dbms_stats.auto_sample_size, 
METHOD_OPT => 'FOR COLUMNS SIZE AUTO ');

SQL> Exec dbms_stats.gather_schema_stats(
      ownname          => 'SCOTT', 
      estimate_percent => dbms_stats.auto_sample_size, 
      method_opt       => 'for all columns size auto', 
      degree           => 7);

FOR COLUMNS SIZE AUTO  => Fastest. you can specify one column instead of all columns.
FOR ALL COLUMNS SIZE REPEAT => to prevent deletion of histograms and collect it only for columns already have histograms.
FOR ALL COLUMNS              => collect histograms on all columns .
FOR ALL COLUMNS SIZE SKEWONLY => collect histograms for columns have skewed value .
FOR ALL INDEXES COLUMNS       => collect histograms for columns have indexes.

Note: For AUTO & SKEWONLY Oracle will decide whatever to create Histograms or not.

Check existence of Histograms:

SQL> select column_name, count(*) from dba_tab_histograms where table_name='SMFILECABINET' group by column_name;

Drop Histograms: 11g
-----------------
e.g.
SQL> Exec dbms_stats.delete_column_stats(ownname=>'SH', tabname=>'SALES', colname=>'PROD_ID', col_stat_type=> HISTOGRAM);

Stop gathering Histograms: 11g
e.g.
SQL> Exec dbms_stats.set_table_prefs
     ('SH', 'SALES','METHOD_OPT', 'FOR ALL COLUMNS SIZE AUTO, FOR COLUMNS SIZE 1 PROD_ID');
>Will continue to collect histograms as usual on all columns in the SALES table except for the PROD_ID column.

Drop Histograms: 10g
----------------
e.g.
SQL> exec dbms_stats.delete_column_stats( user, 'T', 'USERNAME' );



##################################
Save/IMPORT & RESTORE STATISTICS:
##################################
===================
Export /Import Statistics:
===================
In this way statistics will be exported into table then imported later from that table.

1-Create STATS TABLE:
-  ----------------------
SQL> Exec dbms_stats.create_stat_table (ownname => 'SYSTEM', stattab => 'prod_stats', tblspace => 'USERS'); 

2-Export the statistics to the STATS table:
------------------------------------------
For Database stats:
SQL> Exec dbms_stats.export_database_stats (statown => 'SYSTEM', stattab => 'prod_stats');

For System stats:
SQL> Exec dbms_stats.export_SYSTEM_stats (statown => 'SYSTEM', stattab => 'prod_stats');

For Dictionary stats:
SQL> Exec dbms_stats.export_Dictionary_stats (statown => 'SYSTEM', stattab => 'prod_stats');

For Fixed Tables stats:
SQL> Exec dbms_stats.export_FIXED_OBJECTS_stats (statown => 'SYSTEM', stattab => 'prod_stats');

For Schema stas:
SQL> EXEC DBMS_STATS.EXPORT_SCHEMA_STATS('ORIGINAL_SCHEMA' ,'STATS_TABLE',NULL,'STATS_TABLE_OWNER');

For Table:
SQL> Conn scott/tiger
SQL> Exec dbms_stats.export_TABLE_stats (ownname => 'SCOTT',tabname => 'EMP',stattab => 'prod_stats');

For Index:
SQL> Exec dbms_stats.export_INDEX_stats (ownname => 'SCOTT',indname => 'PK_EMP',stattab => 'prod_stats');

For Column:
SQL> Exec dbms_stats.export_COLUMN_stats (ownname =>'SCOTT',tabname=>'EMP',colname=>'EMPNO',stattab=>'prod_stats');

3-Import the statistics from PROD_STATS table to the dictionary:
--------------------------------------------------------------------
For Database stats:
SQL> Exec DBMS_STATS.IMPORT_DATABASE_STATS (stattab => 'prod_stats',statown => 'SYSTEM');

For System stats:
SQL> Exec DBMS_STATS.IMPORT_SYSTEM_STATS (stattab => 'prod_stats',statown => 'SYSTEM');

For Dictionary stats:
SQL> Exec DBMS_STATS.IMPORT_Dictionary_STATS (stattab => 'prod_stats',statown => 'SYSTEM');

For Fixed Tables stats:
SQL> Exec DBMS_STATS.IMPORT_FIXED_OBJECTS_STATS (stattab => 'prod_stats',statown => 'SYSTEM');

For Schema stats:
SQL> Exec DBMS_STATS.IMPORT_SCHEMA_STATS (ownname => 'SCOTT',stattab => 'prod_stats',statown => 'SYSTEM');

For Table stats and it's indexes:
SQL> Exec dbms_stats.import_TABLE_stats ( ownname => 'SCOTT', stattab => 'prod_stats', tabname => 'EMP');

For Index:
SQL> Exec dbms_stats.import_INDEX_stats ( ownname => 'SCOTT', stattab => 'prod_stats', indname => 'PK_EMP');

For COLUMN:
SQL> Exec dbms_stats.import_COLUMN_stats (ownname =>'SCOTT',tabname=>'EMP',colname=>'EMPNO',stattab=>'prod_stats');

4-Drop Stat Table:
-------------------
SQL> Exec dbms_stats.DROP_STAT_TABLE (stattab => 'prod_stats',ownname => 'SYSTEM');


==============
Restore statistics: -From Dictionary-
==============
Old statistics are saved automatically in SYSAUX for 31 day.

Restore Dictionary stats as of timestamp:
-----------------------------------------
SQL> Exec DBMS_STATS.RESTORE_DICTIONARY_STATS(sysdate-1);

Restore Database stats as of timestamp:
--------------------------------------
SQL> Exec DBMS_STATS.RESTORE_DATABASE_STATS(sysdate-1);

Restore SYSTEM stats as of timestamp:
--------------------------------------
SQL> Exec DBMS_STATS.RESTORE_SYSTEM_STATS(sysdate-1);

Restore FIXED OBJECTS stats as of timestamp:
-----------------------------------------------
SQL> Exec DBMS_STATS.RESTORE_FIXED_OBJECTS_STATS(sysdate-1);

Restore SCHEMA stats as of timestamp:
---------------------------------------
SQL> Exec dbms_stats.restore_SCHEMA_stats(ownname=>'SYSADM',AS_OF_TIMESTAMP=>sysdate-1);
OR:
SQL> Exec dbms_stats.restore_schema_stats(ownname=>'SYSADM',AS_OF_TIMESTAMP=>'20-JUL-2008 11:15:00AM');

Restore Table stats as of timestamp:
-----------------------------------
SQL> Exec DBMS_STATS.RESTORE_TABLE_STATS(ownname=>'SYSADM', tabname=>'T01POHEAD',AS_OF_TIMESTAMP=>sysdate-1);

=====
FACTS:
=====

To Check current Stats history retention period (days):
---------------------------------------------------
SQL> select dbms_stats.get_stats_history_retention from dual;
SQL> select dbms_stats.get_stats_history_availability from dual;

To modify current Stats history retention period (days):
---------------------------------------------------
SQL> Exec dbms_stats.alter_stats_history_retention(60);

Purge statistics older than 10 days:
-------------------------------
SQL> Exec DBMS_STATS.PURGE_STATS(SYSDATE-10);


Procedure To claim space after purging statstics:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Space will not be claimed automatically when you purge stats, you must claim it manually using this procedure:

Check Stats tables size:
>>>>>>
col Mb form 9,999,999
col SEGMENT_NAME form a40
col SEGMENT_TYPE form a6
set lines 120
select sum(bytes/1024/1024) Mb, segment_name,segment_type from dba_segments
where  tablespace_name = 'SYSAUX'
and segment_name like 'WRI$_OPTSTAT%'
and segment_type='TABLE'
group by segment_name,segment_type order by 1 asc
/

Check Stats indexes size:
>>>>>>
col Mb form 9,999,999
col SEGMENT_NAME form a40
col SEGMENT_TYPE form a6
set lines 120
select sum(bytes/1024/1024) Mb, segment_name,segment_type from dba_segments
where  tablespace_name = 'SYSAUX'
and segment_name like '%OPT%'
and segment_type='INDEX'
group by segment_name,segment_type order by 1 asc
/

Move Stats tables in same tablespace:
>>>>>>
select 'alter table '||segment_name||'  move tablespace SYSAUX;' from dba_segments where tablespace_name = 'SYSAUX'
and segment_name like '%OPT%' and segment_type='TABLE'
/

Rebuild Stats indexes:
>>>>>>
select 'alter index '||segment_name||'  rebuild online;' from dba_segments where tablespace_name = 'SYSAUX'
and segment_name like '%OPT%' and segment_type='INDEX'
/

Check for un-usable indexes:
>>>>>>
select  di.index_name,di.index_type,di.status  from  dba_indexes di , dba_tables dt
where  di.tablespace_name = 'SYSAUX'
and dt.table_name = di.table_name
and di.table_name like '%OPT%'
order by 1 asc
/


Delete Statistics:
=============
For Database stats:
SQL> Exec DBMS_STATS.DELETE_DATABASE_STATS ();

For System stats:
SQL> Exec DBMS_STATS.DELETE_SYSTEM_STATS ();

For Dictionary stats:
SQL> Exec DBMS_STATS.DELETE_DICTIONARY_STATS ();

For Fixed Tables stats:
SQL> Exec DBMS_STATS.DELETE_FIXED_OBJECTS_STATS ();

For Schema stats:
SQL> Exec DBMS_STATS.DELETE_SCHEMA_STATS ('SCOTT');

For Table stats and it's indexes:
SQL> Exec dbms_stats.DELETE_TABLE_stats (ownname=>'SCOTT',tabname=>'EMP');

For Index:
SQL> Exec dbms_stats.DELETE_INDEX_stats ( ownname => 'SCOTT',indname => 'PK_EMP');

For COLUMN:
SQL> Exec dbms_stats.DELETE_COLUMN_stats (ownname =>'SCOTT',tabname=>'EMP',colname=>'EMPNO');


Note: This procedure can be rollback by restoring STATS using DBMS_STATS.RESTORE_ procedure.


Pending Statistics:  "11g onwards"
=============
Switch on pending statistics:
SQL> Exec DBMS_STATS.SET_GLOBAL_PREFS('PENDING','TRUE');

Gather 11g statistics:
SQL> Exec DBMS_STATS.GATHER_TABLE_STATS('sh','SALES');

Test your critical SQL statement with the pending stats:
SQL> Alter session set optimizer_use_pending_statistics=TRUE;

When proven, publish the pending statistics:
SQL> Exec DBMS_STATS.PUBLISH_PENDING_STATS();














Collect statistics for the Database:
execute dbms_stats.gather_database_stats(estimate_percent => 20, degree => 5, cascade => true);

Generating System Statistics
Gather statistics during the day. Gathering ends after 720 minutes and is stored in the mystats table:
exec DBMS_STATS.GATHER_SYSTEM_STATS(gathering_mode => 'interval', interval => 720, stattab => 'mystats', statid => 'OLTP');

Gather statistics during the night. Gathering ends after 720 minutes and is stored in the mystats table:
DBMS_STATS.GATHER_SYSTEM_STATS(gathering_mode => 'interval', interval => 720, stattab => 'mystats', statid => 'OLAP');

If appropriate, you can switch between the statistics gathered. It is possible to automate this process by submitting a job to update the dictionary with appropriate statistics. During the day, the following jobs import the OLTP statistics for the daytime run:
VARIABLE jobno number;
BEGIN
   DBMS_JOB.SUBMIT(:jobno, 'DBMS_STATS.IMPORT_SYSTEM_STATS(''mystats'',''OLTP'');', SYSDATE, 'SYSDATE + 1');
   COMMIT;
END;
/

During the night, the following jobs import the OLAP statistics for the night time run:
BEGIN
   DBMS_JOB.SUBMIT(:jobno, 'DBMS_STATS.IMPORT_SYSTEM_STATS(''mystats'',''OLAP'');', SYSDATE + 0.5, 'SYSDATE + 1');
   COMMIT;
END;
/

STATISTICS FOR THE DATA DICTIONARY
New in Oracle Database 10g is the ability to gather statistics for the data dictionary.  The objective is to enhance the performance of queries.  There are two basic types of dictionary base tables.
The statistics for normal base tables are gathered using GATHER_DICTIONARY STATISTICS.  They may also be gathered using GATHER_SCHEMA_STATS for the SYS schema.  Oracle recommends gathering these statistics at a similar frequency as your other database objects.
Statistics for fixed objects (the V$ views on the X$ tables) are gathered using the GATHER_FIXED_OBJECT_STATS procedure.  The initial collection of these statistics is normally sufficient.  Repeat only if workload characteristics have changed dramatically.  The SYSDBA privilege or ANALYZE ANY DICTIONARY and ANALYZE ANY privileges are required to execute the procedures for gathering data dictionary statistics.
 

The Option parameter
There are several values for the options parameter that we need to know about:
 -    gather - re-analyzes the whole schema.
-     gather empty - Only analyze tables that have no existing statistics.
-     gather stale - Only re-analyze tables with more than 10% modifications (inserts, updates, deletes). The table should be in monitor status first.
-     gather auto - This will re-analyze objects which currently have no statistics and objects with stale statistics.The table should be in monitor status first.
                            Using gather auto is like combining gather stale and gather empty.
Note that both gather stale and gather auto require monitoring. If you issue the "alter table xxx monitoring" command, Oracle tracks changed tables with the dba_tab_modifications view.
The most interesting of these options is the gather stale option. Because all statistics will become stale quickly in a robust OLTP database, we must remember the rule for gather stale is > 10% row change (based on num_rows at statistics collection time).
Hence, almost every table except read-only tables will be re-analyzed with the gather stale option. Hence, the gather stale option is best for systems that are largely read-only. For example, if only 5% of the database tables get significant updates, then only 5% of the tables will be re-analyzed with the "gather stale" option.
The CASCADE => TRUE option causes all indexes for the tables to also be analyzed.  In Oracle 10g, set CASCADE to AUTO_CASCADE to let Oracle decide whether or not new index statistics are needed.


Important Notes



Verifying Table Statistics
Syntax:
dbms_stats.get_table_stats(ownname, tabname, partname, stattab, statid, numrows, numblks, avgrlen, statown);

declare
    l_numrows  number;
    l_numblks  number;
    l_avgrlen  number;
begin
    dbms_stats.get_table_stats(ownname => 'SYS', tabname => 'AM01',  numrows => l_numrows, numblks => l_numblks, avgrlen => l_avgrlen);
    dbms_output.put_line('No. of rows: ' || l_numrows);
    dbms_output.put_line('No. of blks: ' || l_numblks);
    dbms_output.put_line('Avg row length: ' || l_avgrlen);
end;
/
No. of rows: 4106860
No. of blks: 6219
Avg row length: 3


Verifying Index Statistics

SELECT INDEX_NAME "NAME", NUM_ROWS, DISTINCT_KEYS "DISTINCT",
LEAF_BLOCKS, CLUSTERING_FACTOR "CF", BLEVEL "LEVEL",
AVG_LEAF_BLOCKS_PER_KEY "ALFBPKEY"
FROM DBA_INDEXES
WHERE OWNER = 'SH'
ORDER BY INDEX_NAME;

Typical output is:

NAME                       NUM_ROWS DISTINCT LEAF_BLOCKS      CF   LEVEL   ALFBPKEY
-------------------------- -------- -------- ----------- ------- ------- ----------
CUSTOMERS_PK 50000 50000 454 4405 2 1
PRODUCTS_PK 10000 10000 90 1552 1 1
PRODUCTS_PROD_CAT_IX 10000 4 99 4422 1 24
PRODUCTS_PROD_SUBCAT_IX 10000 37 170 6148 2 4
SALES_PROD_BIX 6287 909 1480 6287 1 1
SALES_PROMO_BIX 4727 459 570 4727 1 1

declare
l_numrows number;
l_numlblks number;
l_numdist number;
l_avglblk number;
l_avgdblk number;
l_clstfct number;
l_indlevel number;
begin
dbms_stats.get_index_stats(ownname => 'SYS', indname => 'AM01_N1',
numrows => l_numrows, numlblks => l_numlblks,
numdist => l_numdist, avglblk => l_avglblk,
avgdblk => l_avgdblk, clstfct => l_clstfct,
indlevel => l_indlevel);
dbms_output.put_line('No. of rows: ' || l_numrows);
dbms_output.put_line('No. of blks: ' || l_numlblks);
dbms_output.put_line('No. of distinct values: ' || l_numdist);
dbms_output.put_line('Avg leaf blocks for distinct keys: ' || l_avglblk);
dbms_output.put_line('Avg data blocks pointed to in the table: ' || l_avgdblk);
dbms_output.put_line('Clustering factor: ' || l_clstfct);
dbms_output.put_line('Index height: ' || l_indlevel);
end;
/
No. of rows: 3819711
No. of blks: 11092
No. of distinct values: 1
Avg leaf blocks for distinct keys: 11092
Avg data blocks pointed to in the table: 14616
Clustering factor: 14616
Index height: 2


Verifying Column Statistics
SELECT COLUMN_NAME, NUM_DISTINCT, NUM_NULLS, NUM_BUCKETS, DENSITY
FROM DBA_TAB_COL_STATISTICS
WHERE TABLE_NAME ="PA_EXPENDITURE_ITEMS_ALL"
ORDER BY COLUMN_NAME;

This returns the following data:

COLUMN_NAME                    NUM_DISTINCT  NUM_NULLS NUM_BUCKETS    DENSITY 
------------------------------ ------------ ---------- ----------- ----------
BURDEN_COST 4300 71957 1 .000232558
BURDEN_COST_RATE 675 7376401 1 .001481481
CONVERTED_FLAG 1 16793903 1 1 poor
COST_BURDEN_DISTRIBUTED_FLAG 2 15796 1 .5 poor
COST_DISTRIBUTED_FLAG 2 0 1 .5 poor
COST_IND_COMPILED_SET_ID 87 6153143 1 .011494253
EXPENDITURE_ID 1171831 0 1 8.5337E-07 GOOD
TASK_ID 8648 0 1 .000115634
TRANSFERRED_FROM_EXP_ITEM_ID 1233787 15568891 1 8.1051E-07 GOOD


declare
l_distcnt number;
l_density number;
l_nullcnt number;
l_srec dbms_stats.statrec;
l_avgclen number;
begin
dbms_stats.get_column_stats(ownname => 'SYS', tabname => 'AM01',
colname => 'COL1', distcnt => l_distcnt, density => l_density,
nullcnt => l_nullcnt, srec => l_srec, avgclen => l_avgclen);
dbms_output.put_line('No. of distinct values: ' || l_distcnt);
dbms_output.put_line('Density: ' || l_density);
dbms_output.put_line('Count of nulls: ' || l_nullcnt);
dbms_output.put_line('Avg. column length: ' || l_avgclen);
end;
/
No. of distinct values: 2
Density: .5
Count of nulls: 0
Avg. column length: 3


Exporting and importing statistics with DBMS_STATS

DBMS_STATS also includes routines for gathering statistics and storing them outside the dictionary. This does not influence the optimizer. Most of the procedures in this package as mentioned above have three common parameters - STATID, STATTAB and STATOWN that are related to user processing of statistics.

Advantages of this feature:

1. Estimated statistics at different percentages could be stored and used for testing.

2. Statistics generated on one database could be transferred to another database.


Using Histograms

The cost-based optimizer can use data value histograms to get accurate estimates of the distribution of column data. A histogram partitions the values in the column into bands, so that all column values in a band fall within the same range. Histograms provide improved selectivity estimates in the presence of data skew, resulting in optimal execution plans with nonuniform data distributions.

One of the fundamental tasks of the cost-based optimizer is determining the selectivity of predicates that appear in queries. Selectivity estimates are used to decide when to use an index and the order in which to join tables. Some attribute domains (a table's columns) are not uniformly distributed.

The cost-based optimizer uses height-based histograms on specified attributes to describe the distributions of nonuniform domains. In a height-based histogram, the column values are divided into bands so that each band contains approximately the same number of values. The useful information that the histogram provides, then, is where in the range of values the endpoints fall.

When to Use Histograms

Histograms can affect performance and should be used only when they substantially improve query plans. Histogram statistics data is persistent, so the space required to save the data depends on the sample size. In general, create histograms on columns that are used frequently in WHERE clauses of queries and have a highly skewed data distribution. For uniformly distributed data, the cost-based optimizer can make fairly accurate guesses about the cost of executing a particular statement without the use of histograms.

Histograms, like all other optimizer statistics, are static. They are useful only when they reflect the current data distribution of a given column. (The data in the column can change as long as the distribution remains constant.) If the data distribution of a column changes frequently, you must recompute its histogram frequently.

Histograms are not useful for columns with the following characteristics:

Creating Histograms

You generate histograms by using the DBMS_STATS package. You can generate histograms for columns of a table or partition. For example, to create a 10-bucket histogram on the SAL column of the emp table, issue the following statement:

EXECUTE DBMS_STATS.GATHER_TABLE_STATS ('scott','emp', METHOD_OPT => 'FOR COLUMNS SIZE 10 sal');

The SIZE keyword declares the maximum number of buckets for the histogram. You would create a histogram on the SAL column if there were an unusually high number of employees with the same salary and few employees with other salaries. You can also collect histograms for a single partition of a table.

Oracle Corporation recommends using the DBMS_STATS package to have the database automatically decide which columns need histograms. This is done by specifying SIZE AUTO.

Example Using a Histogram to Improve an Execution Plan

UPDATE so_lines l
SET open_flag=null, s6=10, s6_date=sysdate,
WHERE l.line_type_code in ('REGULAR','DETAIL','RETURN')
  AND
l.open_flag = 'Y' AND NVL(l.shipped_quantity, 0)=0
  OR
NVL(l.shipped_quantity, 0) != 0
  AND
l.shipped_quantity +NVL(l.cancelled_quantity, 0)= l.ordered_quantity))
  AND
l.s6=18;

This query shows the skewed distribution of data values for s6. In this case, there are two distinct non-null values: 10 and 18. The majority of the rows consists of s6 = 10 (1,589,464), while a small number of rows consist of s6 = 18 (13,091).

S6: COUNT(*)
======================
10 1,589,464
18 13,091
NULL 21,889

The selectivity of column s6, where s6 = 18:

S = 13,091 / (13,091 + 1,589,464) = 0.008

If No Histogram is Used: The selectivity of column s6 is assumed to be 50%, uniformly distributed across 10 and 18. This is not selective; therefore, s6 is not an ideal choice for use as an index.

If a Histogram is Used: The data distribution information is stored in the dictionary. This allows the optimizer to use this information and compute the correct selectivity based on the data distribution. In this example, the selectivity, based on the histogram data, is 0.008. This is a relatively high, or good, selectivity, which leads the optimizer to use an index on column s6 in the execution plan.


Tuning by Statistics

As you know, is not good to resolve tuning issues using the old method of "ratios" only. So we will start understanding several views from Oracle that will help you to tune your database based on its use.

Parsing and Read Consistency
The view v$sysstat is the accumulated statistics for the complete system since the instance has started except for what is active now and shown in v$sesstat. The numbers in this view are held in "tens of milliseconds," (sounds like hundredths of seconds to me). So to convert to seconds, we divide the value by 100. Note that TIMED_STATISTICS must be set to true in the init.ora for these views to have timing numbers. So to see overall parsing and read consistency waits, we can use the following:

column parsing heading 'Parsing|(seconds)'
column total_cpu heading 'Total CPU|(seconds)'
column waiting heading 'Read Consistency|Wait (seconds)'
column pct_parsing heading 'Percent|Parsing'

select total_CPU, parse_CPU parsing,
parse_elapsed-parse_CPU waiting,
trunc(100*parse_elapsed/total_CPU,2) pct_parsing
from (select value/100 total_CPU
from v$sysstat where name = 'CPU used by this session'),
(select value/100 parse_CPU
from v$sysstat where name = 'parse time cpu'),
(select value/100 parse_elapsed
from v$sysstat where name = 'parse time elapsed');



Total CPU Parsing Read Consistency Percent
(seconds) (seconds) Wait (seconds) Parsing
---------- ---------- ---------------- ----------
1236263327 21072.43 4277.93 0
We can see from this output that this instance has consumed a lot of CPU seconds, but a relatively small amount of that was used for parsing. Of the parsing, about 20 percent of the time was waiting for read consistency, but since the parsing percent is so small, this is not impacting our system performance.

Finding Free Buffers
If we need to scan buffers to find a free one, then we are wasting CPU resource. This can be checked with the script below:

select inspected,requested,(inspected/requested) ratio 
from (select value inspected from v$sysstat where name = 'free buffer inspected') a,
(select value requested from v$sysstat where name = 'free buffer requested') b;


INSPECTED REQUESTED RATIO
---------- ---------- ----------
5100 12288112 .000415035

We see that, for the vast majority of the time, the first buffer we look at is available (low ratio). So there is no problem with getting free buffers as we need them. Let's look further.

Active Waiting and Latch Contention
Active waiting can be compared to waiting at the betting window with your two dollars in your hand. You are not just passively standing there, but are focused on all the windows to see when a resource becomes available.
We can see these active waits in v$system_event with the following:

select event,time_waited/100 seconds 
from v$system_event
where event not in
('rdbms ipc message'
,'SQL*Net more data to client'
,'SQL*Net message from client'
,'SQL*Net message to client'
,'pmon timer'
,'smon timer'
,'wakeup time manager')
order by seconds;


EVENT SECONDS
---------------------------------------- ----------
buffer deadlock 0
. . .
latch free 3422.98
. . .
control file parallel write 80356.43
log buffer space 106812.85

The previous code shows me that we have waited the most time for log buffer space. I will check that I have the log buffer sized correctly. This is still a small fraction of the CPU seconds used by the instance as we saw above; it was over one billion, so we are way under one percent of the time spent waiting. This also shows us that we have been waiting for latches. So let's see which latch we are not getting when needed with the following code:

Column name format a35
SELECT name, gets, misses, immediate_gets, immediate_misses
FROM v$latch l
WHERE (misses > .1*gets)
or (immediate_misses > .1*(immediate_misses+immediate_gets));



NAME GETS MISSES IMMEDIATE_GETS IMMEDIATE_MISSES
--------------------------------- ------ -------- -------------- ----------------
channel operations parent latch 86845 6 43300 78881

This is a latch that we can't tune and we also see that the number of misses is not increasing as we rerun this query, so we will leave it alone.
Given this, we have to conclude that the instance is not using too much CPU resource for its own maintenance — so let's check the users.


User Code Execution

In our instance, we have checked that the environment for the instance is not forcing us to consume more CPU resources. So we will check the code that is being executed. Most often, the CPU consumption problem is caused by what a user is doing.

Highest Consuming Sessions
We can see which user is consuming the most CPU at this time by
creating a table with the cumulative values so we can pull out the current usage:

drop table t_CPU;
create table t_CPU storage
(initial 64k next 64k pctincrease 0) pctfree 0
as select a.sid,a.value
from v$sesstat a,v$statname b
where b.name = 'CPU used by this session'
and a.statistic# = b.statistic#;

column name format a25
column username format a20
column CPU heading 'CPU|Seconds'

select a.sid, c.username,b.name,(a.value-t.value)/100 CPU
from v$session c,v$sesstat a,v$statname b,t_CPU t
where c.sid = a.sid
and c.sid=t.sid
and b.name = 'CPU used by this session'
and a.statistic# = b.statistic#
and a.value > 0
and a.value - t.value > 0
order by a.value-t.value;

drop table t_CPU;

CPU
SID USERNAME NAME Seconds
------ -------------------- ------------------------- ----------
52 CPU used by this session .01
. . .
23 SYSTEM CPU used by this session 35.82
177 ORADBA CPU used by this session 38.26
167 ORADBA CPU used by this session 52.07

You can see that SID 167 is the current largest consumer. What we are looking for, then, is code that is in a tight loop in memory, just processing over and over. If this CPU is being consumed by normal usage, then no changes are needed. So let's see what SID 167 is running:

select c.executions,trunc(a.last_call_et/60) min,b.sql_text
from v$session a, v$sqltext b, v$sqlarea c
where a.sid=167
and a.sql_address = b.address
and a.sql_address = c.address
order by b.piece;


EXECUTIONS MIN SQL_TEXT
---------- ------ ----------------------------------------------------------------
94784 0 select daily_notes_id, center_code, TO_CHAR(day,'MM/DD/YYYY')
day, notes from daily_notes where center_code = :1
and day = TO_DATE(:2,'MM/DD/YYYY') order by 1

There is nothing wrong with this code and after watching for a while we see that this user is not running any code that jumps out as bad. We conclude that this session is just doing the most work.


Highest Consuming Statements
We can also check for the statements with the most buffer gets. These are probably working the most in memory:

set linesize 140
column executions format 999,999,999 heading "Executions"
column disk_reads format 999,999,999 heading "Block Reads"
column buffer_gets format 999,999,999 heading "Buffer Reads"
column sql_text format a90 heading "Statement"
break on executions on buffer_gets skip 1

--we want the 2 largest consumers

variable max_buffer_read number
begin
select max(buffer_gets) into :max_buffer_read
from v$sqlarea
where parsing_user_id >0;

select max(buffer_gets) into :max_buffer_read
from v$sqlarea
where parsing_user_id >0
and buffer_gets < :max_buffer_read;
end;
/

select a.executions,a.buffer_gets,b.sql_text
from v$sqlarea a,v$sqltext b
where a.buffer_gets >= :max_buffer_read
and a.address = b.address
and a.parsing_user_id > 0
order by a.buffer_gets desc, piece asc;


Executions Buffer Reads Statement
---------- ------------ -------------------------------------------------------------
6,300,634 51,221,671 SELECT DISTINCT B.FIRST_NAME || ' ' || B.LAST_NAME
FROM BLOCK_PHYSICIAN A,PHYSICIAN B
WHERE A.PHYSICIAN_FID=B.HYSICIAN_ID AND A.BLOCK_TIME_FID=:b1

75,265 27,227,798 select client_id,greeting_lu,first_name,last_name
,home_phone,work_phone,current_status_lu,center_code
,start_time
,TO_CHAR(start_time, 'MM/DD/YYYY HH:MI AM') start_datetime
,appointment_type_lu, appointment_id , number_attending
,offsite, offsite_location,treatment_info, bgcolor
,(TO_NUMBER(TO_CHAR(start_time,'HH24'))*60)
+TO_NUMBER(TO_CHAR(start_time,'MI')) time,physician
,TO_CHAR(start_time, 'HH:MI AM') start_display_time,note
from appointment_view_no_phys
where center_code = :1
and start_time between TO_DATE(:2,'MM,DD,YYYY')
and TO_DATE(:3,'MM,DD,YYYY')+1
and appointment_type_lu in
(:4,:5,:6,:7,:8,:9,:10,:11,:12,:13,:14)
and (current_status_lu not in
('Cancelled','Rescheduled','Attended','NoShowed'))
order by start_time,last_name,first_name

Remember that these numbers are those that are emerging since the instance has come up.
We see that the first statement is doing less than 10 buffer reads per execution on an average. Because I know that these tables are in the keep buffers, this makes sense.
The second statement is doing about 361 buffer gets per execution. This will be one to investigate and see how this view is joining. Maybe we can make this more efficient.