7 Important CellCLI Commands for Exadata DBA
Now that you are an Exadata DBA, and I suppose you know the basic Exadata components and features, your next responsibility is to manage the cell servers. This includes checking the health of the cell servers and maintaining them. Exadata has a very good health check tool called Exachk that the DBA can use. Oracle also introduced a cell command line interface, known as CellCLI, for database administrators to interact with the cell servers.
Before starting on the CellCLI commands, let's first understand the storage structure of a cell server. All the storage is presented to the cell servers; it is provided to the database servers only as ASM disks. First there is a physical disk, which is presented to the cell server as a LUN, and the Exadata software creates a cell disk on top of it. Each cell disk is then divided into three grid disks: data, reco and a system grid disk (named DBFS_DG in the outputs below). The grid disks are provided to ASM for use as storage.
The relation between physical disk, LUN, Cell Disk and Grid disk is given below:
From the above diagram it's clear that to work at the Exadata cell level we need to use CellCLI and sometimes Linux commands.
Here, I am listing some of the most common uses of CellCLI commands for Exadata DBAs to manage cell servers.
1. Check Cell Status: This command is used to check the cell status. It lists basic information about a cell, such as the cell name and status; the detail view also shows the status of the cellsrv, ms and rs services.
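As a rough illustration of the relation described above, the sketch below models one physical disk, its LUN, its cell disk and the grid disks carved out of it. The class and function names (`GridDisk`, `CellDisk`, `describe`) are made up for this example and are not part of CellCLI; the values are taken from the CD_02_exadatalcel10 outputs shown later in this article.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GridDisk:
    name: str
    size_gb: float  # size in G, as reported by "list griddisk ... detail"


@dataclass
class CellDisk:
    name: str
    lun: str            # LUN the cell disk is built on
    physical_disk: str  # physical disk behind that LUN
    grid_disks: List[GridDisk] = field(default_factory=list)


# Values copied from the CD_02 outputs in section 5 of this article.
cd_02 = CellDisk(
    name="CD_02_exadatalcel10",
    lun="0_2",
    physical_disk="E15QLT",
    grid_disks=[
        GridDisk("DATA_DMORL_CD_02_exadatalcel10", 423.0),
        GridDisk("RECO_DMORL_CD_02_exadatalcel10", 105.6875),
        GridDisk("DBFS_DG_CD_02_exadatalcel10", 29.125),
    ],
)


def describe(cell_disk: CellDisk) -> str:
    """Render the physical disk -> LUN -> cell disk -> grid disk chain."""
    lines = [
        f"physical disk {cell_disk.physical_disk}",
        f"  LUN {cell_disk.lun}",
        f"    cell disk {cell_disk.name}",
    ]
    lines += [f"      grid disk {g.name} ({g.size_gb}G)" for g in cell_disk.grid_disks]
    return "\n".join(lines)


print(describe(cd_02))
```

Only the grid disks at the bottom of this chain are visible to ASM; everything above them is managed on the cell.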
CellCLI> list cell
exadatalcel10 online
For a detailed display of the cell configuration, use the command below.
CellCLI> list cell detail
name: exadatalcel10
bbuTempThreshold: 60
bbuChargeThreshold: 800
bmcType: IPMI
cellVersion: OSS_11.2.3.2.1_LINUX.X64_130109
cpuCount: 24
diagHistoryDays: 7
fanCount: 12/12
fanStatus: normal
flashCacheMode: WriteThrough
id: 1038FMKIRLN
interconnectCount: 3
interconnect1: bondib0
iormBoost: 0.0
ipaddress1: 192.168.11.23/25
kernelVersion: 2.6.32-400.11.1.el5uek
locatorLEDStatus: off
makeModel: Oracle Corporation SUN FIRE X4270 M2 SERVER SAS
metricHistoryDays: 7
offloadEfficiency: 1,000.0
powerCount: 2/2
powerStatus: normal
releaseVersion: 11.2.3.2.1
releaseTrackingBug: 14522699
status: online
temperatureReading: 29.0
temperatureStatus: normal
upTime: 46 days, 2:31
cellsrvStatus: running
msStatus: running
rsStatus: running
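The Exadata DBA should check the status-related fields in the above output (for example status, fanStatus, powerStatus, temperatureStatus, cellsrvStatus, msStatus and rsStatus) to judge the health of the Exadata cell.
To display only a few fields of the cell output, use the attributes keyword followed by the attribute names separated by commas.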
CellCLI> list cell attributes name,interconnectCount
exadatalcel10 3
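If the detail output is captured to a file or a string, a small script can pick out the fields of interest. The following is only a sketch: `parse_detail`, `unhealthy` and the `EXPECTED` table are names invented here, and the expected values are simply the ones seen in the healthy output above, not an official list.

```python
# A few lines copied from the "list cell detail" output above.
SAMPLE = """\
name: exadatalcel10
fanStatus: normal
powerStatus: normal
status: online
temperatureStatus: normal
upTime: 46 days, 2:31
cellsrvStatus: running
msStatus: running
rsStatus: running
"""

# Assumption: these are the values seen on the healthy cell in this article.
EXPECTED = {
    "status": "online",
    "fanStatus": "normal",
    "powerStatus": "normal",
    "temperatureStatus": "normal",
    "cellsrvStatus": "running",
    "msStatus": "running",
    "rsStatus": "running",
}


def parse_detail(text):
    """Turn 'attribute: value' lines into a dict, splitting on the first colon."""
    attrs = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            attrs[key.strip()] = value.strip()
    return attrs


def unhealthy(attrs):
    """Return the attributes whose value differs from the expected one."""
    return {k: attrs.get(k) for k, want in EXPECTED.items() if attrs.get(k) != want}


attrs = parse_detail(SAMPLE)
print(unhealthy(attrs))
```

Splitting on the first colon only keeps values such as the upTime intact even though they contain a colon themselves.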
2. Physical Disk Information: As shown in the above figure, all physical disks are attached to the cell servers, so the DBA can list all physical disks on a server along with their details.
CellCLI> list physicaldisk
20:0 E15SBS normal
20:1 E15QMK normal
20:2 E18SWT normal
20:3 E18SW4 normal
20:4 E18VFV normal
20:5 E138FM normal
20:6 E18SW7 normal
20:7 E18V8C normal
20:8 E13HC9 normal
20:9 E1370B normal
20:10 E12300 normal
20:11 E18VG7 normal
FLASH_1_0 0944M01FMP normal
FLASH_1_1 0944M01F8X normal
FLASH_1_2 0944M01F7A normal
FLASH_1_3 0944M01FRD normal
FLASH_2_0 1030M03TT5 normal
FLASH_2_1 1030M03TWV normal
FLASH_2_2 1030M03TT6 normal
FLASH_2_3 1030M03TTD normal
FLASH_4_0 1030M03T3U normal
FLASH_4_1 1030M03RQL normal
FLASH_4_2 1030M03RQM normal
FLASH_4_3 1030M03TEH normal
FLASH_5_0 0929M00Q2P normal
FLASH_5_1 0938M016CG normal
FLASH_5_2 0940M018AY normal
FLASH_5_3 0942M01AJA normal
The above command shows that this cell has 12 physical hard disks (20:0 to 20:11) and 4 flash cards (FLASH_1, FLASH_2, FLASH_4 and FLASH_5), and that each flash card has four flash disks, for example FLASH_1_0 to FLASH_1_3, giving 16 flash disks in total. To look into the details of an individual physical disk, the DBA should use the following commands.
CellCLI> list physicaldisk E15SBS detail
name: 20:0
deviceId: 19
diskType: HardDisk
enclosureDeviceId: 20
errMediaCount: 0
errOtherCount: 0
foreignState: false
luns: 0_0
makeModel: "SEAGATE ST360057SSUN600G"
physicalFirmware: 0B25
physicalInsertTime: 2013-03-10T14:45:18-04:00
physicalInterface: sas
physicalSerial: E15SBS
physicalSize: 558.9109999993816G
slotNumber: 0
status: normal
CellCLI> list physicaldisk FLASH_5_3 detail
name: FLASH_5_3
diskType: FlashDisk
luns: 5_3
makeModel: "Sun Flash Accelerator F20 PCIe Card"
physicalFirmware: D21Y
physicalInsertTime: 2013-03-10T14:45:18-04:00
physicalSerial: 0942M01AJA
physicalSize: 22.8880615234375G
slotNumber: "PCI Slot: 5; FDOM: 3"
status: normal
In the above two outputs, the DBA can easily identify whether a disk is a hard disk or a flash disk from the diskType field.
Some other options are:
CellCLI> list physicaldisk attributes name, disktype, errCmdTimeoutCount, errHardReadCount, errHardWriteCount
CellCLI> list physicaldisk where diskType='Flashdisk'
CellCLI> list physicaldisk where diskType=flashdisk and status='poor performance' detail
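To double-check the disk counts, the plain `list physicaldisk` listing from this section can be tallied with a few lines of Python. This is a sketch only: `summarize` is a hypothetical helper, and it assumes that flash disk names follow the FLASH_<card>_<module> pattern seen in the listing.

```python
from collections import Counter

# The "list physicaldisk" listing from this section, copied as-is.
LISTING = """\
20:0 E15SBS normal
20:1 E15QMK normal
20:2 E18SWT normal
20:3 E18SW4 normal
20:4 E18VFV normal
20:5 E138FM normal
20:6 E18SW7 normal
20:7 E18V8C normal
20:8 E13HC9 normal
20:9 E1370B normal
20:10 E12300 normal
20:11 E18VG7 normal
FLASH_1_0 0944M01FMP normal
FLASH_1_1 0944M01F8X normal
FLASH_1_2 0944M01F7A normal
FLASH_1_3 0944M01FRD normal
FLASH_2_0 1030M03TT5 normal
FLASH_2_1 1030M03TWV normal
FLASH_2_2 1030M03TT6 normal
FLASH_2_3 1030M03TTD normal
FLASH_4_0 1030M03T3U normal
FLASH_4_1 1030M03RQL normal
FLASH_4_2 1030M03RQM normal
FLASH_4_3 1030M03TEH normal
FLASH_5_0 0929M00Q2P normal
FLASH_5_1 0938M016CG normal
FLASH_5_2 0940M018AY normal
FLASH_5_3 0942M01AJA normal
"""


def summarize(listing):
    """Count hard disks, flash disks, and flash disks per flash card."""
    hard, flash = 0, 0
    per_card = Counter()
    for line in listing.splitlines():
        if not line.strip():
            continue
        # maxsplit=2 keeps a multi-word status such as 'poor performance' whole.
        name, _serial, _status = line.split(maxsplit=2)
        if name.startswith("FLASH_"):
            flash += 1
            per_card[name.split("_")[1]] += 1  # card number, e.g. '5'
        else:
            hard += 1
    return hard, flash, dict(per_card)


hard, flash, per_card = summarize(LISTING)
print(hard, flash, per_card)
```

For this cell the tally comes to 12 hard disks and 16 flash disks spread over the four cards numbered 1, 2, 4 and 5.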
3. LUN Disks Detail: A physical disk is presented to the cell storage server as a LUN. Presenting physical disks as LUNs is an automatic process taken care of by the Exadata software itself, but sometimes a LUN is not presented, and in that case the Exadata DBA has to look into the LUN details.
CellCLI> list lun
0_0 0_0 normal
0_1 0_1 normal
0_2 0_2 normal
0_3 0_3 normal
0_4 0_4 normal
0_5 0_5 normal
0_6 0_6 normal
0_7 0_7 normal
0_8 0_8 normal
0_9 0_9 normal
0_10 0_10 normal
0_11 0_11 normal
1_0 1_0 normal
1_1 1_1 normal
1_2 1_2 normal
1_3 1_3 normal
2_0 2_0 normal
2_1 2_1 normal
2_2 2_2 normal
2_3 2_3 normal
4_0 4_0 normal
4_1 4_1 normal
4_2 4_2 normal
4_3 4_3 normal
5_0 5_0 normal
5_1 5_1 normal
5_2 5_2 normal
5_3 5_3 normal
CellCLI> list lun 0_0 detail
name: 0_0
cellDisk: CD_00_exadatalcel10
deviceName: /dev/sda
diskType: HardDisk
id: 0_0
isSystemLun: TRUE
lunAutoCreate: FALSE
lunSize: 557.861328125G
lunUID: 0_0
physicalDrives: 28:0
raidLevel: 0
lunWriteCacheMode: "WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU"
status: normal
CellCLI> list lun 5_3 detail
name: 5_3
cellDisk: FD_15_exadatalcel10
deviceName: /dev/sdy
diskType: FlashDisk
id: 5_3
isSystemLun: FALSE
lunAutoCreate: FALSE
lunSize: 22.8880615234375G
physicalDrives: FLASH_5_3
status: normal
The differences between the above two LUNs, 0_0 and 5_3, are:
a. 0_0 is hard disk while 5_3 is a flash Disk.
b. 0_0 is a system lun whereas 5_3 is not a system lun.
The important thing for the Exadata DBA to notice is the LUN status, shown as "status: normal". If the status is not normal, the DBA has to open an SR with Oracle Support.
A few more LUN-related commands:
CellCLI> list lun attributes name, cellDisk, raidLevel, status
CellCLI> list lun where disktype=harddisk
4. Cell Disks Report: Cell disks are created on top of the LUNs on the cell server. The cell disk is where Exadata-specific technology begins; physical disks and LUNs, up to this point, are common Linux OS concepts.
CellCLI> list celldisk
CD_00_exadatalcel10 normal
CD_01_exadatalcel10 normal
CD_02_exadatalcel10 normal
CD_03_exadatalcel10 normal
CD_04_exadatalcel10 normal
CD_05_exadatalcel10 normal
CD_06_exadatalcel10 normal
CD_07_exadatalcel10 normal
CD_08_exadatalcel10 normal
CD_09_exadatalcel10 normal
CD_10_exadatalcel10 normal
CD_11_exadatalcel10 normal
FD_00_exadatalcel10 normal
FD_01_exadatalcel10 normal
FD_02_exadatalcel10 normal
FD_03_exadatalcel10 normal
FD_04_exadatalcel10 normal
FD_05_exadatalcel10 normal
FD_06_exadatalcel10 normal
FD_07_exadatalcel10 normal
FD_08_exadatalcel10 normal
FD_09_exadatalcel10 normal
FD_10_exadatalcel10 normal
FD_11_exadatalcel10 normal
FD_12_exadatalcel10 normal
FD_13_exadatalcel10 normal
FD_14_exadatalcel10 normal
FD_15_exadatalcel10 normal
CellCLI> list celldisk CD_08_exadatalcel10 detail
name: CD_08_exadatalcel10
comment:
creationTime: 2013-03-12T13:44:44-04:00
deviceName: /dev/sdi
devicePartition: /dev/sdi
diskType: HardDisk
errorCount: 0
freeSpace: 0
id: 81d86abc-6e34-4775-a8a3-ccb8c7adeabe
interleaving: none
lun: 0_8
physicalDisk: E18SVS
raidLevel: 0
size: 557.859375G
status: normal
A few other cell disk commands:
CellCLI> list celldisk attributes name, devicePartition where size>200g;
CellCLI> list celldisk attributes name,status,size
5. Grid Disk Knowledge: Each hard disk type cell disk is divided into three grid disks: reco, data and dbfs. Reco is used for recovery files such as redo and archive logs, data is used for storing database datafiles, and dbfs is for other purposes such as backups.
By default, data is stored only in hard disk type grid disks, not in flash disk type grid disks; the flash disks are used for cell-level data buffering (caching).
Each Exadata cell shown here has 12 hard disks and each hard disk has three grid disks made out of it, so there should be 36 grid disks in total.
CellCLI> list griddisk
DATA_DMORL_CD_00_exadatalcel10 active
DATA_DMORL_CD_01_exadatalcel10 active
DATA_DMORL_CD_02_exadatalcel10 active
DATA_DMORL_CD_03_exadatalcel10 active
DATA_DMORL_CD_04_exadatalcel10 active
DATA_DMORL_CD_05_exadatalcel10 active
DATA_DMORL_CD_06_exadatalcel10 active
DATA_DMORL_CD_07_exadatalcel10 active
DATA_DMORL_CD_08_exadatalcel10 active
DATA_DMORL_CD_09_exadatalcel10 active
DATA_DMORL_CD_10_exadatalcel10 active
DATA_DMORL_CD_11_exadatalcel10 active
DBFS_DG_CD_02_exadatalcel10 active
DBFS_DG_CD_03_exadatalcel10 active
DBFS_DG_CD_04_exadatalcel10 active
DBFS_DG_CD_05_exadatalcel10 active
DBFS_DG_CD_06_exadatalcel10 active
DBFS_DG_CD_07_exadatalcel10 active
DBFS_DG_CD_08_exadatalcel10 active
DBFS_DG_CD_09_exadatalcel10 active
DBFS_DG_CD_10_exadatalcel10 active
DBFS_DG_CD_11_exadatalcel10 active
RECO_DMORL_CD_00_exadatalcel10 active
RECO_DMORL_CD_01_exadatalcel10 active
RECO_DMORL_CD_02_exadatalcel10 active
RECO_DMORL_CD_03_exadatalcel10 active
RECO_DMORL_CD_04_exadatalcel10 active
RECO_DMORL_CD_05_exadatalcel10 active
RECO_DMORL_CD_06_exadatalcel10 active
RECO_DMORL_CD_07_exadatalcel10 active
RECO_DMORL_CD_08_exadatalcel10 active
RECO_DMORL_CD_09_exadatalcel10 active
RECO_DMORL_CD_10_exadatalcel10 active
RECO_DMORL_CD_11_exadatalcel10 active
From the above output, the Exadata DBA can see that there are 12 data grid disks and 12 reco grid disks but only 10 dbfs grid disks, which makes 34 grid disks in total instead of the expected 36. No dbfs grid disks exist on CD_00 and CD_01; the corresponding space on those two disks is used for the cell software and its RAID configuration.
Now, let's verify this.
The DBA can look up a cell disk and cross-check its size against the sum of all the grid disks made out of it.
CellCLI> list celldisk CD_02_exadatalcel10 detail
name: CD_02_exadatalcel10
comment:
creationTime: 2013-03-12T13:44:41-04:00
deviceName: /dev/sdc
devicePartition: /dev/sdc
diskType: HardDisk
errorCount: 0
freeSpace: 0
id: 2b5fc449-581a-431d-96ed-9fea2b1ae7dd
interleaving: none
lun: 0_2
physicalDisk: E15QLT
raidLevel: 0
size: 557.859375G
status: normal
CellCLI> list griddisk DATA_DMORL_CD_02_exadatalcel10 detail
name: DATA_DMORL_CD_02_exadatalcel10
asmDiskgroupName: DATA_DMORL
asmDiskName: DATA_DMORL_CD_02_exadatalcel10
asmFailGroupName: exadatalcel10
availableTo:
cachingPolicy: default
cellDisk: CD_02_exadatalcel10
comment:
creationTime: 2013-03-12T13:48:24-04:00
diskType: HardDisk
errorCount: 0
id: 12a03284-bec7-4187-856b-b1aa1d8a112f
offset: 32M
size: 423G
status: active
CellCLI> list griddisk DBFS_DG_CD_02_exadatalcel10 detail
name: DBFS_DG_CD_02_exadatalcel10
asmDiskgroupName: DBFS_DG
asmDiskName: DBFS_DG_CD_02_exadatalcel10
asmFailGroupName: exadatalcel10
availableTo:
cachingPolicy: default
cellDisk: CD_02_exadatalcel10
comment:
creationTime: 2013-03-12T13:47:15-04:00
diskType: HardDisk
errorCount: 0
id: 123ba6ce-b761-4645-9d58-08ee669777e8
offset: 528.734375G
size: 29.125G
status: active
CellCLI> list griddisk RECO_DMORL_CD_02_exadatalcel10 detail
name: RECO_DMORL_CD_02_exadatalcel10
asmDiskgroupName: RECO_DMORL
asmDiskName: RECO_DMORL_CD_02_exadatalcel10
asmFailGroupName: exadatalcel10
availableTo:
cachingPolicy: none
cellDisk: CD_02_exadatalcel10
comment:
creationTime: 2013-03-12T13:48:31-04:00
diskType: HardDisk
errorCount: 0
id: cc9c6665-822c-4f92-84b1-b6e63411c3a6
offset: 423.046875G
size: 105.6875G
status: active
Here, the size of cell disk CD_02_exadatalcel10 is 557.859375G (about 558G). It is divided into three grid disks: DATA_DMORL_CD_02_exadatalcel10 (423G), RECO_DMORL_CD_02_exadatalcel10 (105.6875G) and DBFS_DG_CD_02_exadatalcel10 (29.125G). Together they add up to 557.8125G, and the last grid disk (DBFS_DG, offset 528.734375G plus size 29.125G) ends exactly at 557.859375G, which is the end of the cell disk. The remaining 48M is accounted for by the 32M offset before the first grid disk and a 16M gap before the RECO grid disk, which matches freeSpace: 0. Hence this is verified.
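The cross-check can also be done as plain arithmetic on the offset and size values from the four detail outputs above. The snippet below is a minimal sketch; `to_gb` is a helper written for this example, and it assumes that CellCLI sizes end in either M or G, as they do in these outputs.

```python
def to_gb(size: str) -> float:
    """Convert a CellCLI size string such as '423G' or '32M' to G."""
    value, unit = float(size[:-1]), size[-1].upper()
    if unit == "G":
        return value
    if unit == "M":
        return value / 1024
    raise ValueError(f"unexpected unit in {size!r}")


# size of CD_02_exadatalcel10 from "list celldisk ... detail"
CELL_DISK_SIZE = to_gb("557.859375G")

# (name, offset, size) from the three "list griddisk ... detail" outputs,
# ordered by offset on the cell disk.
GRID_DISKS = [
    ("DATA_DMORL_CD_02_exadatalcel10", "32M", "423G"),
    ("RECO_DMORL_CD_02_exadatalcel10", "423.046875G", "105.6875G"),
    ("DBFS_DG_CD_02_exadatalcel10", "528.734375G", "29.125G"),
]

# Total space held by grid disks, and what is left over in M.
allocated = sum(to_gb(size) for _, _, size in GRID_DISKS)
unallocated_mb = (CELL_DISK_SIZE - allocated) * 1024

# End position of the last grid disk on the cell disk.
_, last_offset, last_size = GRID_DISKS[-1]
last_end = to_gb(last_offset) + to_gb(last_size)

print(f"allocated to grid disks: {allocated}G")
print(f"not in any grid disk:    {unallocated_mb}M")
print(f"last grid disk ends at:  {last_end}G of {CELL_DISK_SIZE}G")
```

The sizes alone come to slightly less than the cell disk size; it is the offset of the last grid disk plus its size that lands exactly on the end of the cell disk.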
Some other commands:
CellCLI> list griddisk attributes name, cellDisk, diskType where disktype='harddisk'
CellCLI> list griddisk attributes name where asmdeactivationoutcome='yes'
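To see which cell disks lack a DBFS grid disk, the grid disk names can be grouped by the cell disk embedded in each name. This is a sketch under the assumption that names follow the <prefix>_CD_<nn>_<cell> pattern seen in the `list griddisk` listing above; the list is rebuilt in code to mirror that listing rather than pasted, and `group_by_cell_disk` is a made-up helper.

```python
from collections import defaultdict

CELL = "exadatalcel10"

# Rebuild the 34 names from the "list griddisk" listing above:
# DATA and RECO exist for CD_00..CD_11, DBFS_DG only for CD_02..CD_11.
names = [f"{prefix}_CD_{i:02d}_{CELL}"
         for prefix in ("DATA_DMORL", "RECO_DMORL")
         for i in range(12)]
names += [f"DBFS_DG_CD_{i:02d}_{CELL}" for i in range(2, 12)]


def group_by_cell_disk(grid_disk_names):
    """Map each cell disk name to the list of grid disk prefixes found on it."""
    groups = defaultdict(list)
    for name in grid_disk_names:
        prefix, _, rest = name.partition("_CD_")
        groups["CD_" + rest].append(prefix)
    return groups


groups = group_by_cell_disk(names)
missing_dbfs = sorted(cd for cd, prefixes in groups.items()
                      if "DBFS_DG" not in prefixes)

print(len(names), "grid disks")
print("cell disks without a DBFS grid disk:", missing_dbfs)
```

For this listing the two cell disks without a DBFS grid disk are CD_00 and CD_01.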
6. Display Exadata Alerts: Just as the database writes error messages to the database alert log file, the cell server writes cell-related alerts to the cell alert log file.
The location of the cell alert log file for an Exadata cell is /opt/oracle/cell/log/diag/asm/cell/{node name}/trace/alert.log, or if the CELLTRACE parameter is set, just do cd $CELLTRACE. The Exadata DBA can also check cell alerts using CellCLI commands.
CellCLI> list alerthistory
1 2013-03-10T14:45:37-04:00 info "Factory defaults restored for Adapter 0"
2 2013-03-10T14:45:39-04:00 info "Factory defaults restored for Adapter 0"
3_1 2013-03-10T14:50:02-04:00 critical "Cell configuration check discovered the following problems: Check Exadata configuration via ipconf utility Verifying of Exadata configuration file /opt/oracle.cellos/cell.conf Error. Exadata configuration file not found /opt/oracle.cellos/cell.conf [INFO] The ipconf check may generate a failure for temporary inability to reach NTP or DNS server. You may ignore this alert, if the NTP or DNS servers are valid and available. [INFO] You may ignore this alert, if the NTP or DNS servers are valid and available. [INFO] As root user run /usr/local/bin/ipconf -verify -semantic to verify consistent network configurations."
3_2 2013-03-10T15:04:30-04:00 clear "The cell configuration check was successful."
4 2013-03-10T14:59:18-04:00 critical "RS-7445 [Required IP parameters missing] [Check cellinit.ora] [] [] [] [] [] [] [] [] [] []"
CellCLI> list alerthistory 8_1 detail
name: 8_1
alertMessage: "Cell configuration check discovered the following problems: Check Exadata configuration via ipconf utility Verifying of Exadata configuration file /opt/oracle.cellos/cell.conf Checking DNS server on 192.135.82.132 : FAILED Error. Overall status of verification of Exadata configuration file: FAILED [INFO] The ipconf check may generate a failure for temporary inability to reach NTP or DNS server. You may ignore this alert, if the NTP or DNS servers are valid and available. [INFO] You may ignore this alert, if the NTP or DNS servers are valid and available. [INFO] As root user run /usr/local/bin/ipconf -verify -semantic to verify consistent network configurations."
alertSequenceID: 8
alertShortName: Software
alertType: Stateful
beginTime: 2013-05-03T22:46:50-04:00
endTime: 2013-05-04T22:46:35-04:00
examinedBy:
metricObjectName: checkconfig
notificationState: 0
sequenceBeginTime: 2013-05-03T22:46:50-04:00
severity: critical
alertAction: "Correct the configuration problems. Then run cellcli command: ALTER CELL VALIDATE CONFIGURATION Verify that the new configuration is correct."
The DBA has to give priority to alerts with "severity: critical".
A few other commands:
CellCLI> list alerthistory where severity like '[warning|critical]'
CellCLI> list alertdefinition detail
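When the alert history has been saved as text, the critical and warning entries can be picked out with a short script. The sketch below assumes one alert per line in the order name, begin time, severity, quoted message, as in the `list alerthistory` output above; `parse_alerts` is a hypothetical helper, and the sample lines are copied from that output.

```python
# Four lines copied from the "list alerthistory" output above.
ALERTS = '''\
1 2013-03-10T14:45:37-04:00 info "Factory defaults restored for Adapter 0"
2 2013-03-10T14:45:39-04:00 info "Factory defaults restored for Adapter 0"
3_2 2013-03-10T15:04:30-04:00 clear "The cell configuration check was successful."
4 2013-03-10T14:59:18-04:00 critical "RS-7445 [Required IP parameters missing] [Check cellinit.ora] [] [] [] [] [] [] [] [] [] []"
'''


def parse_alerts(text):
    """Split each line into name, begin time, severity and message."""
    alerts = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # maxsplit=3 leaves the whole quoted message in the last field.
        name, begin_time, severity, message = line.split(maxsplit=3)
        alerts.append({
            "name": name,
            "beginTime": begin_time,
            "severity": severity,
            "message": message.strip('"'),
        })
    return alerts


alerts = parse_alerts(ALERTS)
urgent = [a for a in alerts if a["severity"] in ("critical", "warning")]
for a in urgent:
    print(a["name"], a["severity"], a["message"])
```

Of the four sample lines, only alert 4 is reported, since the others are info or clear.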
7. Restart Cell Services: The cell server runs three services: cellsrv, ms and rs. Sometimes the Exadata DBA has to check the status of these services and restart them.
To check the service status, use:
CellCLI> list cell detail
name: exadatalcel10
bbuTempThreshold: 60
bbuChargeThreshold: 800
bmcType: IPMI
cellVersion: OSS_11.2.3.2.1_LINUX.X64_130109
cpuCount: 24
diagHistoryDays: 7
fanCount: 12/12
fanStatus: normal
flashCacheMode: WriteThrough
id: 1038FMM04N
interconnectCount: 3
interconnect1: bondib0
iormBoost: 0.0
ipaddress1: 192.168.10.19/22
kernelVersion: 2.6.32-400.11.1.el5uek
locatorLEDStatus: off
makeModel: Oracle Corporation SUN FIRE X4270 M2 SERVER SAS
metricHistoryDays: 7
offloadEfficiency: 1,000.0
powerCount: 2/2
powerStatus: normal
releaseVersion: 11.2.3.2.1
releaseTrackingBug: 14522699
status: online
temperatureReading: 29.0
temperatureStatus: normal
upTime: 47 days, 3:25
cellsrvStatus: running
msStatus: running
rsStatus: running
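As a sketch of how the three service status fields could drive a restart, the function below maps each field to the `alter cell restart services` command used in this section for any service that is not running. `restart_commands` and the "stopped" value in the example are hypothetical; the function only builds command strings and does not run anything.

```python
# Status attribute in "list cell detail" -> service name used by
# "alter cell restart services <name>".
SERVICE_FIELDS = {
    "cellsrvStatus": "cellsrv",
    "msStatus": "ms",
    "rsStatus": "rs",
}


def restart_commands(attrs):
    """Return CellCLI restart commands for services not reported as running."""
    commands = []
    for field, service in SERVICE_FIELDS.items():
        if attrs.get(field) != "running":
            commands.append(f"alter cell restart services {service}")
    return commands


# Statuses as in the output above: nothing needs a restart.
healthy = {"cellsrvStatus": "running", "msStatus": "running", "rsStatus": "running"}
print(restart_commands(healthy))

# Hypothetical case where ms is reported as stopped.
degraded = dict(healthy, msStatus="stopped")
print(restart_commands(degraded))
```

Keeping the command generation separate from execution lets the DBA review the commands before typing them at the CellCLI prompt.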
Here the last three lines of the output show the status of the services, all of which are running. To restart a service, use the commands below.
CellCLI> alter cell restart services rs
Stopping RS services...
The SHUTDOWN of RS services was successful.
Starting the RS services...
Getting the state of RS services... running
CellCLI> alter cell restart services ms
CellCLI> alter cell restart services cellsrv
To restart all services in one command, or to shut down a service, use:
CellCLI> alter cell restart services all
Stopping the RS, CELLSRV, and MS services...
The SHUTDOWN of services was successful.
Starting the RS, CELLSRV, and MS services...
Getting the state of RS services... running
Starting CELLSRV services...
The STARTUP of CELLSRV services was successful.
Starting MS services...
The STARTUP of MS services was successful.
CellCLI> alter cell shutdown services rs
CellCLI> alter cell shutdown services ms
CellCLI> alter cell shutdown services cellsrv
These are important commands from the perspective of Exadata DBA cell operations. Share in the comments if you know a few more useful commands.