Did the results from your last performance test outperform even optimistic expectations?
Before publishing those test results, we recommend checking whether you ran into one of the following test traps (pitfalls):
| Trap | Potential differences comparing tests (*) | Potential differences comparing tests with production |
|---|---|---|
| Trap 1: Unrealistic Cache Hits | yes | yes |
| Trap 2: Sorted Test Data | no | yes |
| Trap 3: No-Archivelog-Mode | no | yes |
| Trap 4: Triggers / Snapshot Logs for Replication to EDW / DWH / Reporting Server deactivated | no | yes |
| Trap 5: Varying Storage Performance | yes | yes |
(*) Comparing tests on the same test environment, e.g. when repeating test runs with the same test data but different testing parameters - a typical scenario during extensive performance, load and stress testing.
Cache hits in the file system cache and in the SAN-cache can be identified by the disk latency reported in the database performance statistics.
A disk latency of less than 5 milliseconds for reading one 8 KB or 16 KB database block is "too fast" for a real spindle hit.
A latency of 5 milliseconds is an excellent value; 6-8 milliseconds is good.
If you do not expect such (high) cache hit rates in the file system cache and SAN cache in a realistic production scenario, this fact needs to be reported.
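As a rough illustration, the average single-block read latency can be derived from the cumulative wait statistics. The sketch below assumes an Oracle database and the python-oracledb driver; the view name, event name and driver are assumptions and will differ for other databases:

```python
import oracledb  # assumption: python-oracledb driver, Oracle database

SUSPICIOUS_MS = 5  # single-block reads faster than ~5 ms hint at file system / SAN cache hits

def avg_single_block_read_ms(conn) -> float:
    """Average latency of single-block reads, derived from cumulative wait statistics."""
    with conn.cursor() as cur:
        cur.execute(
            "SELECT time_waited_micro, total_waits "
            "FROM v$system_event WHERE event = 'db file sequential read'"
        )
        time_waited_micro, total_waits = cur.fetchone()
    return time_waited_micro / max(total_waits, 1) / 1000.0  # microseconds -> milliseconds

# Hypothetical usage:
# with oracledb.connect(user="perf", password="...", dsn="testhost/TESTDB") as conn:
#     latency = avg_single_block_read_ms(conn)
#     if latency < SUSPICIOUS_MS:
#         print(f"{latency:.1f} ms per read - probably unrealistic cache hits")
```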
Database Cache: Before each test, flush the database cache; if your database does not provide such a command, you need to restart the database.
File System Cache: Reboot the server.
SAN-Cache: If you do not expect cache hits on the SAN storage in the real production cycle, include in your performance test results not only the measured raw data but also an estimate of the slower but realistic results to be expected in real production scenarios. You might also discuss this topic with your SAN vendor, e.g. whether caching can be reduced during your test.
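A minimal sketch of such a pre-test reset, assuming an Oracle database and the python-oracledb driver (the flush commands, required privileges and connection details will differ for other databases):

```python
import oracledb  # assumption: python-oracledb driver, Oracle database

def flush_database_caches(user: str, password: str, dsn: str) -> None:
    """Flush the database caches before a test run (Oracle-specific commands, ALTER SYSTEM privilege required)."""
    with oracledb.connect(user=user, password=password, dsn=dsn) as conn:
        with conn.cursor() as cur:
            cur.execute("ALTER SYSTEM FLUSH BUFFER_CACHE")
            cur.execute("ALTER SYSTEM FLUSH SHARED_POOL")

# Hypothetical usage before each test run:
# flush_database_caches("perf_admin", "secret", "testhost/TESTDB")
```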
A prerequisite for performance tests is the creation of a sufficient volume of test data – in the optimal case the full volume of customers.
For that purpose, scripts are used to create artificial customers, e.g.
customer_id = 1, name = “Customer1”
customer_id = 2, name = “Customer2”.
Assuming an average record length of 200 bytes, an 8 KB database block will contain data for about 40 customers.
If in real life those customers create transactions randomly, and if, due to the large size of your database table(s) containing customer data, only a small percentage of those table(s) is in the database cache, then this will result in physical reads; in most cases only 1 of the 40 customers in the database block read from disk will be accessed.
However, if the transaction test data for your performance test contain the transactions ordered by customer id, then during the test all 40 customer records will be accessed within the short time this database block remains in the database cache.
Create test data with sufficient randomness, and ensure the same randomness when processing those test data.
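A minimal sketch of such a data generator (the table sizes and identifiers are purely illustrative assumptions):

```python
import random

CUSTOMERS = 100_000       # illustrative: production-like number of customers
TRANSACTIONS = 1_000_000  # illustrative: number of test transactions to generate

# Artificial customers as described above: customer_id = 1..N, name = "Customer<N>"
customers = [(cid, f"Customer{cid}") for cid in range(1, CUSTOMERS + 1)]

# Trap 2: transactions generated in ascending customer_id order hit all ~40
# customers stored in one 8 KB block while that block is still in the database cache.
sorted_transactions = [(tx, tx % CUSTOMERS + 1) for tx in range(TRANSACTIONS)]

# Realistic: pick the customer for each transaction at random, so block accesses
# are spread out as in real production - and keep this order when running the test.
random_transactions = [(tx, random.randint(1, CUSTOMERS)) for tx in range(TRANSACTIONS)]
```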
Performance tests may show results that are too good, i.e. faster than the performance achievable in the real production environment.
This trap has a high impact on environments which are already suffering from slow disk reads and slow disk writes (IO bound).
Note: To prevent the file system for archivelog files from running full, some testers simply write scripts to delete (rm) those files; this often-used practice is not optimal, as the additional disk IO load caused by reading those files again to back them up to tape is missing.
A more detailed evaluation of this topic is available in Transaction Log / Archive Log Volume – An "Awareness Paper" for Application Support / Application Operations, Operations Managers and Change Managers.
Recording data changes using triggers, such that other systems can easily replicate just those changed data, is a very common practice.
Those change-tracking records (e.g. "Snapshot Logs") stored in database tables are automatically deleted after the changed data have been replicated to the destination system(s).
However, those destination systems won't replicate data from a testing system.
For this reason, the recording of those data changes is simply stopped, as otherwise the database tables recording the changes would grow forever.
Impact and Testing Trap: In this case, each DML (insert, update, delete) is faster, consumes less CPU and creates less transaction log / redo log volume.
Note 1:
If the (too low) redo log / archive log / transaction log volume measured during testing is used for sizing, then those too-low values will result in wrong sizing - and could, in the worst case, cause production incidents because the archive log destination / transaction log filesystem runs full.
Example: Updating a database column from "1" to "0" or "null" will result in only a few bytes of transaction log / redo log, but would create a much longer "insert" in the change-tracking table. For this reason the increase in transaction log / redo log volume can be significant; it depends on the type of database activity and is very application-specific.
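A rough back-of-the-envelope sketch of this effect (all byte and volume figures below are hypothetical placeholders and need to be replaced with values measured for your application):

```python
# Hypothetical per-statement transaction log / redo log volume, in bytes.
REDO_PER_UPDATE = 60             # small update such as "1" -> "0"
REDO_PER_TRACKING_INSERT = 300   # insert into the change-tracking table for the same update

updates_per_day = 2_000_000      # hypothetical daily number of such updates

redo_without_tracking = updates_per_day * REDO_PER_UPDATE
redo_with_tracking = updates_per_day * (REDO_PER_UPDATE + REDO_PER_TRACKING_INSERT)

print(f"redo per day without change tracking: {redo_without_tracking / 1024**2:6.0f} MB")
print(f"redo per day with change tracking:    {redo_with_tracking / 1024**2:6.0f} MB")
print(f"increase factor: {redo_with_tracking / redo_without_tracking:.1f}x")
```

With figures like these, sizing the archive log destination from a test run where change tracking was deactivated would underestimate the production volume by a large factor.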
Note 2:
The database writer process also has less work to do, and it may happen that this process (or the IO system) becomes the bottleneck on the real production system, due to the (much) higher database write activity there.
A dedicated server for performance testing, to avoid any impact from other testing activities, is common practice.
No CPU or memory is consumed by anything other than the performance testing.
If local disks are used, the performance of the storage (disk IO performance) should be constant too.
However, if shared storage is used, e.g. a SAN (Storage Area Network) or NAS (Network Attached Storage), the IO performance might vary between performance tests.
Even QoS (Quality of Service) cannot eliminate that completely.
E.g. defining a target of 8 milliseconds per random read would ensure that the latency won't increase beyond 8 milliseconds.
However, some of the test runs might still get better values.
Measure and document the disk latency during each test.
If the values are fairly constant across all tests, the results are easily comparable.
If the results vary across the tests, the impact of the varying IO performance needs to be determined and considered when comparing the test results.
Detailed statistics show that during a test run with an IO read latency of 6 ms, one hour of elapsed time breaks down as follows:

30 minutes - CPU consumption by application process
8 minutes - CPU consumption on database
20 minutes - disk waits for random reads
2 minutes - log writer waiting for disk
----------------------------------------------------
60 minutes total elapsed time

If the disk read latency increases from 6 ms/read to 12 ms/read, the total time for disk reads will increase from 20 minutes to 40 minutes, and the total elapsed time from 60 minutes to 80 minutes. (Assumption: no other parameter was changed, the number of disk reads is constant, and the change in disk read time is simply proportional to the change in disk latency.)
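A minimal sketch of this extrapolation, using the illustrative time slices from the example above:

```python
# Time slices from the reference test run, in minutes (illustrative figures).
cpu_application = 30
cpu_database = 8
disk_read_waits = 20      # measured at 6 ms per random read
log_writer_waits = 2

measured_latency_ms = 6
expected_latency_ms = 12  # hypothetical latency expected in production

# Assumption from the text: the number of disk reads stays constant, so the
# total disk read wait time scales proportionally with the per-read latency.
scaled_disk_read_waits = disk_read_waits * expected_latency_ms / measured_latency_ms

total_measured = cpu_application + cpu_database + disk_read_waits + log_writer_waits
total_scaled = cpu_application + cpu_database + scaled_disk_read_waits + log_writer_waits

print(f"elapsed time at {measured_latency_ms} ms/read: {total_measured:.0f} minutes")
print(f"elapsed time at {expected_latency_ms} ms/read: {total_scaled:.0f} minutes")
# -> 60 minutes vs. 80 minutes, matching the figures above
```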
If in some tests the number of disk reads is lower, this could be the result of Trap 1 - "Unrealistic Cache Hits" - in the database or application cache.
Documenting the time slices for each test run, as well as the number of disk reads/writes and the disk latency, should show a clear correlation with the varying disk latency.
This detailed documentation of the impact of varying disk latency is an additional benefit for future production use, as varying disk performance is also observed on most production systems.
Run several performance tests with the same test case and the same parameters, varying only the IO performance.
Obstacle: You might not have any way to influence the IO performance, and if it is suddenly stable you won't get the varying results you were looking for.
Often the storage team monitors and documents SAN performance over time. With good communication with the storage team, you might learn when backups are run, when SAN-internal activities (synchronizing / establishing a 3rd mirror) take place, or when other servers connected to the SAN run IO-intensive jobs impacting overall SAN performance. This will help you schedule performance tests during those periods.
Otherwise, you might already have detected a pattern of varying IO performance over time (and perhaps already traced it back to e.g. concurrent backups), such that you can schedule test runs accordingly.