I am a newly registered user on these UNIX forums.
I am a new system administrator for AIX 6.1. One of our servers performs poorly every time our application (FINACLE) runs many processes/instances. (see below for topas snapshot)
I use nmon or topas to monitor the server utilization. I checked the CPU Idle% and it is high, but the DISK Busy% is constantly high (when performance is really poor, the DISK Busy% is at 100% most of the time). I also noticed that the FILE/TTY Readch and Writech values are constantly high. See topas snapshot below:
Here are our server specs:
Every time this happens, we try to kill processes that are consuming CPU, but the DISK Busy% still stays high. If we reboot the server, the performance becomes okay, but we can't do this during production. Any suggestions on how to optimize this? Is it our architecture (having only 1 hard disk for our data)? Is there a bottleneck here? What can we do to optimize our server? Should we make any upgrades, for example increasing physical memory?
Thank you very much. I hope you can help since I am not a UNIX expert.
Last edited by jim mcnamara; 02-08-2012 at 10:49 AM..
Reason: code tags please
Killing processes to free resources is not a good idea. You might shoot something you still need.
Yes, from the look of it you have a severe bottleneck with your 1 hdisk. Is this hdisk a physical disk or a LUN from SAN storage?
Do you use asynchronous I/O (AIO) and have it tuned? Oracle will most probably benefit from it, as well as from additional disks.
nmon/topas has a page that displays AIO stats. I think it was shift + a, not sure though, but it is easy to try out anyway.
You could post the output of iostat -A 2 10, vmstat -wt 2 10 and lsattr -El aio0
(run the first 2 commands when there is traffic on your box) and use code tags when doing so, thanks.
Also post the filesystemio_options setting of Oracle, the Oracle version, and something about your disk layout: are your filesystems set up with minimum or maximum distribution, what blocksize, ...
The output of the mount command will help. You should definitely mount your Oracle filesystems with the noatime option and, if you have a dedicated dump device, mount that with rbrw.
If you don't want to use SETALL in filesystemio_options, then you might want to consider mounting the filesystems containing Oracle data and redo logs with cio. Also tell us how many volume groups with how many disks you have, and similar things.
In many cases a hot disk is easily avoided by changing your filesystems from minimum to maximum distribution and reorganizing the volume group.
In addition to what zaxxon already asked for, I would be interested in the vmstat -v and vmstat -s outputs.
Please gather all data while the system is busy and slow, not during an idle timeframe, or the data won't help.
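Since the data has to be captured while the box is slow, it may help to have the commands ready in a small script. The following is only a sketch: the capture and collect_all function names, the OUTDIR location and the file names are made up for illustration, and the commands are simply the ones named in this thread. Anything not installed on the host is skipped with a note.

```shell
#!/bin/sh
# Sketch: store the diagnostics requested in this thread in one directory.
# OUTDIR and the file naming are assumptions, not an AIX convention.
OUTDIR="${OUTDIR:-/tmp/perfdata.$(date +%Y%m%d_%H%M%S)}"

# Run one command and save stdout+stderr to $OUTDIR/<label>.txt.
# If the command is not found on this host, write a short note instead.
capture() {
    label=$1
    shift
    if command -v "$1" >/dev/null 2>&1; then
        "$@" >"$OUTDIR/$label.txt" 2>&1
    else
        echo "skipped: $1 not found" >"$OUTDIR/$label.txt"
    fi
}

# Collect everything in sequence; the interval commands take about
# 20 seconds each (10 samples, 2 seconds apart).
collect_all() {
    mkdir -p "$OUTDIR" || return 1
    capture iostat_A  iostat -A 2 10
    capture vmstat_wt vmstat -wt 2 10
    capture vmstat_v  vmstat -v
    capture vmstat_s  vmstat -s
    capture mount     mount
    echo "data written to $OUTDIR"
}
```

Source the file and call collect_all during a slow period, then post the resulting text files in code tags.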
Thanks Zaxxon & zxmaus,
I didn't know where to begin before I opened this thread.
For iostat -A 2 10, vmstat -wt 2 10, vmstat -v and vmstat -s, I will post a snapshot of these once the issue occurs again.
For lsattr -El aio0, I didn't get anything, so I tried lsattr -El sys0 (I hope it will do).
--> See attachment - lsattr sys0.jpg
For "Do you use asynchronous I/O (AIO) and have it tuned?"
--> I have no idea about this, since I am new here and I arrived in the middle of the application roll-out to production. I wish I had a clue. I have no knowledge of the history of the servers here.
However, I checked the I/O stats in nmon and here they are:
-->
If physical disk or LUN from SAN
--> I'm not entirely sure if it's a LUN from SAN, but here's what I gathered:
from prtcfg/lsdev:
For oracle version:
-->Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bi
For disk layout/fs setup:
--> In summary, we have three hdisks. rootvg resides on 2 hdisks and the applications (oravg) reside on hdisk8.
Below are the details:
There are also several Oracle database instances in oravg. Here they are:
for filesystemio_options:
--> I have no idea where to find this. Is it executed, or set in a configuration file?
for "...min or max distribution, blocksize ...
output of mount command will help and definitely mounting your oracle filesystems with noatime option and if you have a dedicated dump device with rbrw..."
--> I am totally lost with the min/max tuning. No idea about this yet.
Again, thanks very much for the help. It's greatly appreciated.
From the data you provided so far, you have one RAID 10 raidset of SAS disks (so internal storage) totalling 1 TB, presented to the system as a single disk, for 6 DBs and everything else running on the system except root. This is asking for problems, because all access to your storage goes through one serial path.
Even worse, all your filesystems share the same logfile. If I assume correctly that your filesystems are not mounted with the noatime option, every single read (including things as simple as ls) and every single write on 8 different filesystems competes for access to that logfile, which makes it the hotspot of the entire system.
Still waiting for the vmstat outputs but I bet that your system has only the default filesystem tuning and is running out of buffers most of the time.
Can you post lvmo -a -v oravg output please to confirm?
Regarding aio - don't worry. On AIX 6.1 you find the settings with the ioo -a | grep aio command, and AIX turns AIO on automatically if Oracle or any other application wants to use it.
filesystemio_options is a parameter set within Oracle (ask your DBA) and can be set to none (the default in your Oracle version, I think), asynch or setall. With setall, Oracle decides to use cio together with async I/O, but you won't be able to access open database files from outside the database other than with rman, which might be a problem if you don't do rman backups.
Please run a simple mount on the box to allow us to see if you are using any mount options on the filesystems.
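To spot quickly which filesystems lack noatime, the mount output can be filtered. This is a rough sketch only: the missing_noatime function name is made up, and the pattern assumes the filesystems are JFS or JFS2 and that the vfs type appears as its own whitespace-separated column in the output, so adjust it if your output looks different.

```shell
# Sketch: read `mount` output on stdin and print the JFS/JFS2 lines
# whose options do not include noatime.
# Assumption: the vfs type ("jfs" or "jfs2") is a separate column.
missing_noatime() {
    grep -E '[[:space:]]jfs2?[[:space:]]' | grep -v 'noatime'
}
```

Used as mount | missing_noatime, it prints nothing when every JFS/JFS2 filesystem already has the option. The log= entry in the options column of the same lines shows whether the filesystems share one logfile.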
So far
- consider giving each of your oravg filesystems its very own logfile
- consider another storage solution and a different filesystem layout if possible, since 6 DBs in the same filesystem, even if this filesystem has its own logfile, are still not such a great idea. If that is not possible, then your disk will naturally stay busy since you only have one.
I have several users (30) accessing via samba to the E3500 using an application built on Visual Foxpro from their Windows PC , the problem is that the first guy that logs in demands 30% of the E3500... (2 Replies)