ASM Health Checker Found 1 New Failures Updated [2021]
Troubleshooting "ASM Health Checker Found 1 New Failures Updated"
If you are an Oracle Database Administrator, seeing the alert "ASM Health Checker found 1 new failures updated" in your logs or monitoring dashboard (like Enterprise Manager) can be a bit jarring. This message is the Oracle Automatic Storage Management (ASM) framework’s way of telling you that its internal diagnostic engine has detected an issue that could compromise the health of your storage layer.
Here is a deep dive into what this error means, why it happens, and how to resolve it.
What is the ASM Health Checker?
The ASM Health Checker is a proactive diagnostic utility that runs within the Oracle Grid Infrastructure. It constantly monitors the state of ASM disk groups, metadata consistency, and background processes.
When it detects a discrepancy, such as a corrupted metadata block, a disk timeout, or an offline disk, it logs a "failure." The "Updated" status usually means the health check engine has refreshed its findings and confirmed that the issue is persistent and requires administrator intervention.
Common Causes for This Alert
While the message itself is a general notification, the "1 new failure" usually stems from one of the following:
Disk Connectivity Issues: A physical disk or LUN has become unreachable or is experiencing intermittent latency.
Metadata Corruption: Inconsistency in the ASM Allocation Units (AU) or disk headers.
Disk Group Imbalance: A rebalance operation failed or was interrupted, leaving the disk group in a "degraded" state.
Offline Disks: A disk was dropped or taken offline due to I/O errors, but the redundancy (if using Normal or High redundancy) kept the database running.
Step-by-Step Resolution Guide
1. Identify the Specific Failure
The alert message is just the "headline." You need to find the specific error code (like ORA-15032 or ORA-15078).
Check the Alert Log: Navigate to your ASM diagnostic trace folder and check the alert_+ASM.log.
Use ADRCI: Run the command adrci, then use show alert to read the alert log or show incident to list the most recent incidents.
2. Query the ASM Views
Log into your ASM instance via SQL*Plus (sqlplus / as sysasm) and run the following to see the status of your disks:
SELECT group_number, name, state, type FROM v$asm_diskgroup;
SELECT path, header_status, mode_status, state FROM v$asm_disk;
Look for any disk that should belong to a disk group but shows a header_status of CANDIDATE (instead of MEMBER), or any disk whose mode_status is OFFLINE.
3. Check for Ongoing Rebalances
Sometimes the health checker flags a failure if a rebalance is stuck.
SELECT * FROM v$asm_operation;
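One way to tell a slow rebalance from a stuck one is to sample v$asm_operation twice, a few minutes apart, and compare progress. The sketch below is a minimal illustration in Python, assuming the rows have already been fetched by some client and copied into plain objects. The OperationSample class, the stalled_operations helper, and the sample values are all hypothetical. On newer releases the view can return several rows per disk group (one per rebalance phase), so the lookup key would need extending.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperationSample:
    """One row of v$asm_operation captured at a point in time (hypothetical container)."""
    group_number: int
    operation: str  # e.g. "REBAL"
    state: str      # e.g. "RUN", "WAIT", "ERRS"
    sofar: int      # SOFAR column: work units completed so far
    est_work: int   # EST_WORK column: estimated total work units


def stalled_operations(earlier, later):
    """Return operations that are RUN in both samples but whose SOFAR did not advance."""
    before = {(s.group_number, s.operation): s for s in earlier}
    stalled = []
    for now in later:
        prev = before.get((now.group_number, now.operation))
        if prev is None:
            continue  # operation started after the first sample was taken
        if prev.state == "RUN" and now.state == "RUN" and now.sofar <= prev.sofar:
            stalled.append(now)
    return stalled


# Invented samples, as if taken a few minutes apart.
first = [OperationSample(1, "REBAL", "RUN", 1200, 5000),
         OperationSample(2, "REBAL", "RUN", 300, 900)]
second = [OperationSample(1, "REBAL", "RUN", 1200, 5000),
          OperationSample(2, "REBAL", "RUN", 650, 900)]

for op in stalled_operations(first, second):
    print(f"group {op.group_number}: {op.operation} stuck at {op.sofar}/{op.est_work}")
# prints: group 1: REBAL stuck at 1200/5000
```

A single unchanged sample is weak evidence on its own; a longer interval or a third sample reduces the chance of flagging an operation that is merely slow.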
If an operation is hanging, you may need to investigate the underlying I/O subsystem.
4. Run a Manual Check (The "Check" Command)
You can force ASM to verify the consistency of a disk group to see if it clears the error or provides more detail:
ALTER DISKGROUP
Proactive Tips to Prevent Future Failures
Monitor I/O Latency: Often, the health checker finds a "failure" simply because a storage array is too slow. Monitor your OS-level tools like iostat or sar.
Update Grid Infrastructure: Ensure you are on the latest RU (Release Update), as Oracle frequently releases patches for ASM Health Checker "false positives."
Verify Redundancy: Always ensure your critical disk groups are at least on "Normal" redundancy to allow the health checker to find and fix issues without taking the database offline.
The "ASM Health Checker found 1 new failures updated" alert is a call to action. It usually indicates a physical storage hiccup or a metadata inconsistency. By checking the ASM alert logs and querying v$asm_disk, you can usually pinpoint the culprit disk and bring it back online or replace it before a total outage occurs.
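As a rough illustration of the alert-log half of that triage, the Python sketch below counts the ORA- codes that appear in a block of log text, so the most frequent ones can be looked up first. The ora_codes helper is hypothetical, and the sample lines are invented for the example; they are not real ASM alert-log output.

```python
import re
from collections import Counter

# ORA- codes are written as "ORA-" followed by five digits, e.g. ORA-15032.
ORA_CODE = re.compile(r"\bORA-(\d{5})\b")


def ora_codes(log_text):
    """Count each distinct ORA- code found in the given text."""
    return Counter(f"ORA-{m.group(1)}" for m in ORA_CODE.finditer(log_text))


# Invented sample text; real alert-log lines are formatted differently.
sample_log = """\
example line 1: ORA-15032: not all alterations performed
example line 2: ORA-15130: diskgroup being dismounted
example line 3: ORA-15032: not all alterations performed
"""

for code, count in ora_codes(sample_log).most_common():
    print(code, count)
# prints:
# ORA-15032 2
# ORA-15130 1
```

In practice the text would come from reading the alert log file itself; the counting logic stays the same.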
The phrase "asm health checker found 1 new failures updated" typically refers to a notification from the Oracle Autonomous Health Framework (AHF) or one of its components. This system continuously monitors Oracle Automatic Storage Management (ASM) and cluster environments for issues related to stability, configuration, and performance.
Understanding the Notification
When the ASM Health Checker reports a "new failure," it means that a scheduled or on-demand audit has detected a condition that violates Oracle's best practices or indicates a hardware/software fault. The "updated" status indicates that the health check repository has been refreshed with this latest finding.
Common Causes for ASM Failures
Failures in the ASM environment can range from minor configuration warnings to critical disk issues:
Disk Visibility or Permissions: ASM instances may lose sight of a disk due to OS-level permission changes or SAN/storage connectivity issues.
Disk Group Redundancy Issues: A failure might be triggered if a disk group drops below its required redundancy level (e.g., a disk failing in a "Normal" redundancy group).
Space Constraints: The health checker often flags when a disk group, or the Fast Recovery Area (FRA), is nearing capacity.
Configuration Drift: Changes to initialization parameters or clusterware settings that don't align with Oracle's Recommended Best Practices.
Troubleshooting Steps
To resolve the failure, follow these standard diagnostic procedures:
Generate a Health Report: Run a manual check using the Oracle AHF tfactl orachk to get a detailed HTML report of the specific failure.
Check the Alert Log: Inspect the ASM instance alert log (usually found in the Automatic Diagnostic Repository) for specific error codes, such as those reported for a full disk group or a disk group mount failure.
Verify Disk Status: Use the asmcmd lsdsk command to validate that all disks are present and have the correct header status.
Examine Cluster Health: Ensure that the Oracle Grid Infrastructure is running correctly across all nodes using crsctl check crs.
For persistent issues, you may need to gather a diagnostic package using the Incident Packaging Service (IPS) and upload it to Oracle Support.
The message "ASM Health Checker found 1 new failures" typically appears in the Oracle Automatic Storage Management (ASM) alert log when a critical issue, such as a disk failure or a forced diskgroup dismount, is detected. This is part of Oracle's fault diagnosability infrastructure designed to capture diagnostic data at the first sign of trouble.
Immediate Actions to Take
If you see this message, follow these steps to identify and resolve the failure:
Check the ASM Alert Log: Review the alert log (often located in /u01/app/grid/diag/asm/+asm/+ASM/trace/alert_+ASM.log) for errors preceding the health checker message, such as ORA-15130 (diskgroup being dismounted) or ORA-15032.
Run ADRCI: Use the ADR Command Interpreter (ADRCI) to view the specific "incident" or "problem" that was logged. Command: adrci> show problem or adrci> show incident
Verify Diskgroup Status: Log into the ASM instance and check if any diskgroups are offline or if disks have been dropped.
SQL> select name, state from v$asm_diskgroup;
SQL> select name, header_status, mode_status from v$asm_disk;
Investigate I/O Failures: Look for hardware-level issues, such as storage path failures, SAN/NFS connectivity problems, or OS-level permission changes that might have caused the disk to go offline.
Common Causes
Disk Path Failure: The OS can no longer see the physical storage device.
Forced Dismount: ASM may force a dismount if too many disks in a failure group are lost, exceeding the redundancy limit.
Communication Issues: In a RAC environment, network or heartbeat failures between nodes can trigger ASM health alerts.
For automated assistance, you can use tools like Oracle ORAchk to run a comprehensive health check on your entire Oracle stack.
The message "ASM Health Checker found 1 new failures" is a critical alert typically generated by Oracle Automatic Storage Management (ASM). It indicates that the background health monitor has detected a significant issue within the storage layer that could impact database availability.
Immediate Diagnostic Steps
To identify the specific cause, you should immediately examine the ASM alert log and current disk status:
Check the Alert Log: Look for ORA- errors (like ORA-15130 or ORA-15063) in the trace file directory:
Path: /u01/app/oracle/diag/asm/+asm/.
Verify Diskgroup Status: Run the following command in the ASM instance to see which group is affected:
SQL> SELECT name, state, offline_disks FROM v$asm_diskgroup;
Check Individual Disk Health: Identify if a specific disk has dropped or is hung:
SQL> SELECT path, header_status, mode_status FROM v$asm_disk;
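Reading the v$asm_disk output by eye gets tedious with many disks, so the triage can be sketched in a few lines of Python. This is only an illustration: it assumes the rows have already been fetched and turned into dictionaries keyed by column name, and the suspect_disks helper and the device paths are hypothetical. Disks that have never been added to a disk group legitimately show a header_status such as CANDIDATE or PROVISIONED, so expect to filter those out in a real environment.

```python
def suspect_disks(rows):
    """Return (path, reasons) for each disk that is not a healthy, online member."""
    findings = []
    for row in rows:
        reasons = []
        if row["header_status"] != "MEMBER":
            reasons.append(f"header_status={row['header_status']}")
        if row["mode_status"] != "ONLINE":
            reasons.append(f"mode_status={row['mode_status']}")
        if reasons:
            findings.append((row["path"], reasons))
    return findings


# Invented rows; the paths are placeholders, not real devices.
sample_rows = [
    {"path": "/dev/example_disk1", "header_status": "MEMBER", "mode_status": "ONLINE"},
    {"path": "/dev/example_disk2", "header_status": "MEMBER", "mode_status": "OFFLINE"},
    {"path": "/dev/example_disk3", "header_status": "CANDIDATE", "mode_status": "ONLINE"},
]

for path, reasons in suspect_disks(sample_rows):
    print(path, ", ".join(reasons))
# prints:
# /dev/example_disk2 mode_status=OFFLINE
# /dev/example_disk3 header_status=CANDIDATE
```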
Understanding the "ASM Health Checker Found 1 New Failures" Alert
Receiving the alert "ASM Health Checker found 1 new failures" in your Oracle Automatic Storage Management (ASM) alert log is a critical signal that the system has detected a problem—often related to disk accessibility or disk group integrity.
This error typically appears when the ASM instance performs an internal check and encounters an issue that could lead to a disk group being forced to dismount.
Why Did This Happen?
This message is a summary alert generated by Oracle's health monitoring. Common triggers include:
Disk Connectivity Issues: A LUN or physical disk has become inaccessible due to storage network (SAN) or hardware failure.
Missing Disks: ASM cannot find a disk that is expected to be part of a disk group.
Redundancy Failures: In "Normal" or "High" redundancy groups, the failure of a disk or a whole failure group can trigger this checker.
Data Corruption: Specific block corruption within a disk group.
Step-by-Step Response Plan
1. Analyze the Alert Log
The "1 new failure" message is just a summary. You must check the ASM alert log (and often the associated trace files) for the specific ORA- error codes following it. Look for:
ORA-15032: Not all alterations performed.
ORA-15040: Diskgroup is incomplete.
ORA-15042: ASM disk is missing from the group.
2. Check Disk and Disk Group Status
Alert Analysis: "ASM Health Checker Found 1 New Failures"
Next Steps
- Acknowledge this alert within [e.g., 30 minutes].
- Resolve the failure within [e.g., 4 hours] to avoid further degradation.
- Run a follow-up consistency check on each affected disk group after remediation (diskgroup_name is a placeholder):
ALTER DISKGROUP diskgroup_name CHECK;
1. Monitor ASM Health Proactively
Set up alerting on v$asm_diskgroup.offline_disks and v$asm_disk.state. Tools like Oracle Enterprise Manager (OEM) or custom scripts can detect failures before they impact applications.
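A custom script of this kind can be quite small. The Python sketch below shows only the decision logic, assuming the rows from v$asm_diskgroup and v$asm_disk have already been fetched into dictionaries keyed by column name. The function name, the set of states treated as healthy, and the sample rows are assumptions made for the example, and how the alerts get delivered is left out.

```python
# States treated as healthy here; adjust to local policy (this choice is an assumption).
HEALTHY_GROUP_STATES = {"MOUNTED", "CONNECTED"}
HEALTHY_DISK_STATE = "NORMAL"


def asm_alerts(diskgroup_rows, disk_rows):
    """Build alert messages from v$asm_diskgroup and v$asm_disk style rows."""
    alerts = []
    for group in diskgroup_rows:
        if group["state"] not in HEALTHY_GROUP_STATES:
            alerts.append(f"diskgroup {group['name']}: state is {group['state']}")
        if group["offline_disks"] > 0:
            alerts.append(f"diskgroup {group['name']}: {group['offline_disks']} offline disk(s)")
    for disk in disk_rows:
        if disk["state"] != HEALTHY_DISK_STATE:
            alerts.append(f"disk {disk['path']}: state is {disk['state']}")
    return alerts


# Invented rows for illustration.
groups = [
    {"name": "DATA", "state": "MOUNTED", "offline_disks": 0},
    {"name": "RECO", "state": "MOUNTED", "offline_disks": 1},
]
disks = [
    {"path": "/dev/example_disk1", "state": "NORMAL"},
    {"path": "/dev/example_disk2", "state": "HUNG"},
]

for message in asm_alerts(groups, disks):
    print(message)
# prints:
# diskgroup RECO: 1 offline disk(s)
# disk /dev/example_disk2: state is HUNG
```

Running the check on a schedule and alerting only when the list is non-empty keeps the noise low while still catching a lost disk before a second failure occurs.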
Common Causes of the Alert
When the health checker reports a "new failure," it is rarely vague. It usually points to one of three specific categories:
1. Disk Accessibility Issues (I/O Errors): The most common cause is a transient or permanent I/O failure on a specific disk within a Disk Group. If a disk is slow to respond, or if the underlying storage array reports a read/write error, ASM marks the disk as "offline" or flags the error. The "1 new failure" often refers to a specific disk in a failure group becoming inaccessible.
2. Redundancy Loss: In a Normal or High Redundancy ASM setup, data is mirrored. If one mirror copy becomes corrupted or unavailable, ASM can still function on the remaining copies. The health checker will flag this as a failure because the system is no longer fully redundant. A second disk failure during this state could result in data loss.
3. Corruption or Metadata Mismatch: Occasionally, the failure is logical rather than physical. The health checker runs validation algorithms on ASM metadata (the maps that tell ASM where data lives). If it finds a mismatch between the metadata and the physical blocks, it logs a failure.
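The redundancy-loss case (item 2) comes down to simple arithmetic, sketched here in Python. The mapping assumes default mirroring, where EXTERN keeps one copy of the data, NORMAL two, and HIGH three, and it uses the values reported in the TYPE column of v$asm_diskgroup. The remaining_tolerance helper is hypothetical and ignores per-file redundancy settings and quorum considerations.

```python
# Failure groups that can be lost before data becomes unavailable,
# assuming default mirroring (1, 2, and 3 copies respectively).
TOLERATED_FAILGROUP_LOSSES = {"EXTERN": 0, "NORMAL": 1, "HIGH": 2}


def remaining_tolerance(redundancy_type, failgroups_lost):
    """Return how many more failure groups can be lost; negative means the limit is exceeded."""
    return TOLERATED_FAILGROUP_LOSSES[redundancy_type] - failgroups_lost


print(remaining_tolerance("NORMAL", 1))  # prints: 0
print(remaining_tolerance("HIGH", 1))    # prints: 1
```

A result of 0 is the state described above: the disk group still works, but one more failure in a different failure group could cause data loss.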