Wednesday, February 16, 2011

Knowledge Point from Exadata POC4

0. Environment Description

    X2-2 Quarter rack(X2-2 Technical Specifications here)


 

1. Speeding up data loading speed by splitting multiple files.

Oracle database can load external table file parallel automatically, but in some cases parallel loading cannot be achieved, here is the some potential cases that parallel loading is not available:

  • Sequential data sources (such as a tape drive or pipe)
  • Data in any multibyte character set whose character boundaries cannot be determined starting at an arbitrary byte in the middle of a string
  • Records with the VAR format

In POC4 project, we are experiencing one of the situations: we use SJIS character CSV file with delimiter format (i.e. non-fixed length). The elapsed time to load data from external table into heap table is 52 seconds.

Since external table processing is also a CPU intensive operation, to speed up data loading on multiple-CPU environment, we need to use as more CPU as possible. (This is not true for a busy production system.) On the condition restricted by the above cases, work-around is to split the external files into smaller pieces and re-define the external table with file pointing to the smaller piece files.


 

The performance boosts almost linearly:

The X2-2 database server has 2 x six-core CPU, so I split the 1.6G file into 12 pieces and change the external table definition, the elapsed time reduced from 52 seconds to 5.6 seconds. At the same time, the "sar" output also indicates the CPU usage soaring.

No comments:

Post a Comment