ASM Health Checker Found 1 New Failures

Subject: ASM Health Check Report – New Failures Detected

To: Database Administration Team / System Health Monitoring Group
Date: [Insert Date]
Priority: Medium

6.4 Validate After Storage Changes

Any SAN, multipath, or OS upgrade should trigger a manual health check:

asmcmd chkdg DATA

or, from SQL*Plus connected to the ASM instance:

ALTER DISKGROUP DATA CHECK NOREPAIR;

8. Potential Edge Cases

First run ever → all failures are considered new (optionally treat as baseline, not alert)
Check item disappears from results → ignore (unless required to track)
Flapping failure (pass→fail→pass→fail) → should re-alert on each new appearance after a pass
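The three edge cases above fit in a few lines of logic. This is a minimal sketch that assumes each health check run has already been reduced to a dictionary of check name to "PASS"/"FAIL" (however you collect it). The function name and the check name in the demo are hypothetical.

```python
from typing import Dict, Optional, Set

# Each run is modeled as a mapping of check name -> "PASS" or "FAIL".
# Check names used here are hypothetical placeholders.


def new_failures(
    previous: Optional[Dict[str, str]],
    current: Dict[str, str],
    baseline_on_first_run: bool = True,
) -> Set[str]:
    """Return the checks that should raise a "new failure" alert for this run."""
    if previous is None:
        # First run ever: either treat everything as baseline (no alert)
        # or report every currently failing check as new.
        if baseline_on_first_run:
            return set()
        return {name for name, status in current.items() if status == "FAIL"}
    # A failure is new if the check was not failing last time. This re-alerts
    # on flapping checks (PASS -> FAIL again) and ignores checks that vanished,
    # because only checks present in the current run are examined.
    return {
        name
        for name, status in current.items()
        if status == "FAIL" and previous.get(name) != "FAIL"
    }


if __name__ == "__main__":
    runs = [
        {"disk_path_check": "PASS"},
        {"disk_path_check": "FAIL"},
        {"disk_path_check": "PASS"},
        {"disk_path_check": "FAIL"},
    ]
    prev = None
    for run in runs:
        print(sorted(new_failures(prev, run)))
        prev = run
```

Running the demo prints `[]`, `['disk_path_check']`, `[]`, `['disk_path_check']`: the flapping check alerts on each new appearance after a pass, as the edge-case list requires.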

Troubleshooting Guide: ASM Health Checker Found 1 New Failure

If you are managing an Oracle database environment and receive the alert "ASM Health Checker found 1 new failure," it’s time to pay attention. While Oracle Automatic Storage Management (ASM) is robust, this specific notification indicates that the internal diagnostic framework has detected an issue that could potentially impact disk group availability or performance.

Here is a comprehensive breakdown of what this error means, how to diagnose it, and the steps to resolve it.

1. Understanding the ASM Health Checker

The ASM Health Checker is part of the Oracle Check Framework. It runs periodic checks on the ASM instance, disk groups, and metadata to ensure everything is operating within healthy parameters.

When it reports a "new failure," it means a specific "check" (such as disk connectivity, metadata consistency, or space usage) has moved from a PASS to a FAIL state.

2. Immediate Step: Identify the Failure

The alert itself is generic. To find out what actually failed, connect to the ASM instance (for example, sqlplus / as sysasm) and run:

SELECT check_name, failure_pri, status, repair_script FROM v$asm_healthcheck_status WHERE status = 'FAILED';

Depending on your release, the findings may instead be exposed through the Health Monitor views (V$HM_RUN, V$HM_FINDING) or through ADRCI (show hm_run).

Common culprits include:

Disk Offline: One or more disks in a disk group are no longer accessible.

Metadata Corruption: Inconsistencies in the ASM metadata (e.g., File Directory or Disk Directory).

Space Issues: A disk group is nearing 100% capacity, risking an instance crash.

Stale Quorum: Issues with voting files in a CRS/Grid Infrastructure environment.

3. Deep Dive into the Logs

To get the granular details, look at the ASM alert log. It is usually in the ADR trace directory under your Oracle Base, where +ASM1 is the local ASM instance name:

$ORACLE_BASE/diag/asm/+asm/+ASM1/trace/alert_+ASM1.log

Search for the timestamp of the alert. You will often see a corresponding ORA- error code (such as ORA-15078 or ORA-15032) that gives the exact technical reason for the health check failure.

4. How to Resolve the Failure

Scenario A: Disk Connectivity Issues

If the health checker found a disk failure, check the OS-level connectivity. Command: lsdsk (within ASMCMD) or fdisk -l (Linux).
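Alongside lsdsk and fdisk -l, a quick script can confirm which expected disk paths are visible and usable at the OS level. This is a minimal sketch, assuming it runs on the database host as the ASM owner (for example grid) so that os.access reflects that user's rights; the helper name and the OK/MISSING/NO_ACCESS labels are hypothetical.

```python
import os
from typing import Dict, Iterable


def check_disk_paths(paths: Iterable[str]) -> Dict[str, str]:
    """Classify each candidate ASM disk path as OK, MISSING, or NO_ACCESS."""
    result: Dict[str, str] = {}
    for path in paths:
        if not os.path.exists(path):
            # Device node is absent: often a SAN, multipath, or udev problem.
            result[path] = "MISSING"
        elif not os.access(path, os.R_OK | os.W_OK):
            # ASM needs both read and write access to its disks.
            result[path] = "NO_ACCESS"
        else:
            result[path] = "OK"
    return result


if __name__ == "__main__":
    for path, state in check_disk_paths(["/dev/mapper/asm_data2"]).items():
        print(f"{path}: {state}")
```

A MISSING path points toward storage or multipath; NO_ACCESS points toward the ownership and mode problems covered under Path Permissions.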

Fix: If a disk is "OFFLINE," try to bring it online:

ALTER DISKGROUP <diskgroup_name> ONLINE DISK <disk_name>;

Scenario B: Metadata Inconsistency

If the health check indicates metadata issues, you may need to run a manual check on the disk group.

Action: Run a consistency check on the disk group:

ALTER DISKGROUP <diskgroup_name> CHECK NOREPAIR;

Note: With NOREPAIR, ASM reports inconsistencies without attempting to fix them. If errors are found, you may need to involve Oracle Support.

Scenario C: Space Pressure

If the failure is related to "Insufficient Space," rebalance the disk group or add new disks immediately.
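Deciding whether a disk group is under space pressure can be scripted against the columns of the V$ASM_DISKGROUP query in the next step. This is a sketch under stated assumptions: the 90% threshold is arbitrary, the rows are typed by hand rather than fetched, and the names are hypothetical. A negative USABLE_FILE_MB means there is not enough free space to restore redundancy after a disk failure, so it is flagged as well.

```python
from typing import Iterable, List, NamedTuple


class DiskGroupSpace(NamedTuple):
    """One row of: SELECT name, free_mb, total_mb, usable_file_mb FROM v$asm_diskgroup"""
    name: str
    free_mb: int
    total_mb: int
    usable_file_mb: int


def space_warnings(groups: Iterable[DiskGroupSpace], max_used_pct: float = 90.0) -> List[str]:
    """Flag disk groups that are close to full or cannot re-mirror after a disk loss."""
    warnings: List[str] = []
    for g in groups:
        if g.total_mb > 0:
            used_pct = 100.0 * (g.total_mb - g.free_mb) / g.total_mb
            if used_pct >= max_used_pct:
                warnings.append(f"{g.name}: {used_pct:.1f}% used")
        if g.usable_file_mb < 0:
            # Not enough free space to restore redundancy if a disk is lost.
            warnings.append(f"{g.name}: usable_file_mb is negative ({g.usable_file_mb} MB)")
    return warnings
```

For example, `space_warnings([DiskGroupSpace("DATA", 5000, 100000, 2000)])` returns `['DATA: 95.0% used']`.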

Action: Check free space:

SELECT name, free_mb, total_mb, usable_file_mb FROM v$asm_diskgroup;

5. Clearing the Alert

Once you have fixed the underlying physical or logical issue, the Health Checker should automatically update during its next run. However, if the status remains "Failed" in the views, you can manually trigger a re-run of the health check or use ADRCI to purge the alert.

Summary Checklist

Query v$asm_healthcheck_status to identify the specific check.

Review the ASM alert log for specific ORA- error codes.

Verify Physical Disks at the OS level to ensure no hardware failure.

Check Disk Group Capacity to ensure you haven't hit a "disk full" state.

By catching a "1 new failures" alert early, you prevent minor disk hiccups from turning into major database outages.

The message "ASM Health Checker found 1 new failures" is a critical warning often found in Oracle Automatic Storage Management (ASM) alert logs. It typically signals that the system has detected a significant issue—such as disk corruption or a communication breakdown—that could lead to a diskgroup being forcibly dismounted.

Here is a story of a "typical" Friday night in the life of a Database Administrator (DBA) facing this error.

The Friday Night Ghost in the Machine

It was 4:45 PM on a Friday. The office was thinning out, and Leo was already thinking about his weekend plans when his terminal began to scroll with red text. The monitoring system had just spat out a single, chilling line: ASM Health Checker found 1 new failures.

Leo’s heart sank. In the world of Oracle ASM, "1 new failure" is rarely just one thing; it's the tip of an iceberg.

The Investigation Begins

He dove into the alert logs. Just seconds before the health checker tripped, he saw a flurry of ORA-15130 errors: diskgroup "DATA" is being dismounted. This was the DBA equivalent of a ship taking on water.

He checked the shared storage. "It's always the hardware," he muttered. But the storage arrays looked green. He then checked the ASM Filter Driver, remembering a bug involving 4k sector drives that had caused similar headaches for peers in the past.

The Discovery

Leo ran a quick check of the diskgroup status:

Diskgroup: DATA
Status: DISMOUNTED
Cause: "Insufficient number of disks discovered"

It turned out a routine disk add operation from earlier that morning had gone sideways. A subtle corruption on metadata block 40 had been lying in wait. When the ASM rebalance operation hit that specific block, the Health Checker—a silent guardian that usually stays in the background—spotted the anomaly and pulled the emergency brake to prevent further data loss.

The Resolution

The "1 new failure" wasn't a death sentence, but it required surgery. Leo had to:

Part 3: Immediate Diagnosis (Step-by-Step)

When you see the alert, do not panic. Follow this systematic diagnostic procedure:

Check permissions

stat /dev/mapper/asm_data2
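The same permission check can be scripted so it reports mismatches directly instead of leaving you to read stat output. This is a minimal sketch: the expected grid:asmadmin ownership and 0660 mode are taken from the fix in Scenario B below and are assumptions that may differ at your site; the function names are hypothetical.

```python
import grp
import os
import pwd
import stat
from typing import List, Tuple


def describe_permissions(path: str) -> Tuple[str, str, int]:
    """Return (owner, group, permission bits) for path, similar to `stat`."""
    st = os.stat(path)
    try:
        owner = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:  # uid without a passwd entry
        owner = str(st.st_uid)
    try:
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:  # gid without a group entry
        group = str(st.st_gid)
    return owner, group, stat.S_IMODE(st.st_mode)


def permission_problems(path: str, expected_owner: str = "grid",
                        expected_group: str = "asmadmin",
                        expected_mode: int = 0o660) -> List[str]:
    """List mismatches against the ownership and mode ASM is assumed to need."""
    owner, group, mode = describe_permissions(path)
    problems: List[str] = []
    if owner != expected_owner:
        problems.append(f"owner {owner} (expected {expected_owner})")
    if group != expected_group:
        problems.append(f"group {group} (expected {expected_group})")
    if mode != expected_mode:
        problems.append(f"mode {mode:o} (expected {expected_mode:o})")
    return problems
```

On a device with the 640 mode from the error example below, the result would include `mode 640 (expected 660)`.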

Scenario B: Path Permissions

Error example: Disk path /dev/sdg has invalid permissions (640)

Fix:

# As root
chown grid:asmadmin /dev/sdg
chmod 660 /dev/sdg
# To make the fix persist across reboots, add a udev rule:
vi /etc/udev/rules.d/99-oracle-asm.rules
# Add: KERNEL=="sdg", OWNER="grid", GROUP="asmadmin", MODE="0660"

Then reload the udev rules and re-apply them to block devices:

/sbin/udevadm control --reload-rules
/sbin/udevadm trigger --subsystem-match=block
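When several disks need the same treatment, generating the rule lines avoids typos in the quoting. This is a hypothetical helper that emits rules in the same KERNEL== form as the rule above; the grid/asmadmin/0660 defaults are assumptions carried over from that rule, and the name validation is a deliberately strict choice.

```python
import re
from typing import Iterable

# Kernel device names or udev glob patterns only; rejecting anything else
# keeps stray quotes or commas from producing a malformed rule.
_KERNEL_NAME = re.compile(r"[A-Za-z0-9_*?\[\]-]+")


def udev_rules(kernel_names: Iterable[str], owner: str = "grid",
               group: str = "asmadmin", mode: str = "0660") -> str:
    """Render one KERNEL== rule per device name."""
    lines = []
    for name in kernel_names:
        if not _KERNEL_NAME.fullmatch(name):
            raise ValueError(f"unexpected kernel name: {name!r}")
        lines.append(f'KERNEL=="{name}", OWNER="{owner}", GROUP="{group}", MODE="{mode}"')
    return "\n".join(lines) + "\n"
```

`udev_rules(["sdg"])` produces exactly the rule shown in the fix. Kernel names such as sdg can change between reboots, so where possible match on a stable identifier (for example the disk's ENV{ID_SERIAL}) instead.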

Preventive Measures

Regular Monitoring: Check the ASM alert log, disk status, and disk group free space on a schedule.
Maintain Adequate Space: Ensure disk groups have enough free space for growth and for re-mirroring after a disk failure.
Use Redundancy: Use NORMAL or HIGH redundancy disk groups for critical data.
Keep Software Updated: Keep your Oracle Grid Infrastructure, database, and ASM software patched.

Step 1: Locate the Full Failure Details

The message asm health checker found 1 new failures is just the headline. The details are written to the ASM alert log.

# As the grid user (ADR location; +ASM1 is the local ASM instance name)
cd $ORACLE_BASE/diag/asm/+asm/+ASM1/trace
tail -100 alert_+ASM1.log | grep -i "health check"
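The tail and grep above can be combined with a look-back for ORA- codes, since the codes that explain the failure usually appear just before the health checker line. This is a sketch with assumed parameters: the 100-line tail mirrors the command above, while the 20-line look-back window and the function name are hypothetical choices.

```python
import re
from collections import deque
from typing import Iterable, List, Tuple

ORA_CODE = re.compile(r"\bORA-\d{5}\b")


def health_checker_context(lines: Iterable[str], tail: int = 100,
                           before: int = 20) -> Tuple[List[str], List[str]]:
    """Mimic `tail -100 ... | grep -i "health check"` and collect nearby ORA- codes.

    Returns the matching lines plus the distinct ORA- codes found in the
    `before` lines preceding each match, in order of first appearance.
    """
    window = [line.rstrip("\n") for line in deque(lines, maxlen=tail)]
    matches: List[str] = []
    codes: List[str] = []
    for i, line in enumerate(window):
        if "health check" in line.lower():
            matches.append(line)
            for prev in window[max(0, i - before):i]:
                for code in ORA_CODE.findall(prev):
                    if code not in codes:
                        codes.append(code)
    return matches, codes
```

Usage: open the alert log with `open(path, errors="replace")` and pass the file object directly; only the last 100 lines are kept in memory.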

Look for lines containing SQL> select * from v$asm_health_check or health_check_summary. You will see a failure line like:
"Check: Disk Path Accessibility, Status: FAIL, Details: Disk DATA_0002 path /dev/mapper/asm_data2 is not readable"
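A failure line in the form shown above can be split into fields so the failing check and device path can be acted on programmatically. This is a minimal sketch that assumes the line follows exactly that "Check: ..., Status: ..., Details: ..." layout; the regular expressions and function name are hypothetical, and real alert log wording may differ.

```python
import re
from typing import Dict, Optional

# Assumes the "Check: ..., Status: ..., Details: ..." layout shown above.
_CHECK_LINE = re.compile(
    r'Check:\s*(?P<check>[^,]+),\s*Status:\s*(?P<status>\w+),\s*'
    r'Details:\s*(?P<details>.*?)"?\s*$'
)
_PATH = re.compile(r"\bpath\s+(/\S+)")


def parse_check_line(line: str) -> Optional[Dict[str, str]]:
    """Split a health check failure line into check, status, details, and path."""
    m = _CHECK_LINE.search(line)
    if m is None:
        return None
    fields = {key: value.strip() for key, value in m.groupdict().items()}
    # Pull out the device path if the details mention one; empty string otherwise.
    path = _PATH.search(fields["details"])
    fields["path"] = path.group(1) if path else ""
    return fields
```

For the example line, the extracted path is /dev/mapper/asm_data2, which is the device you would then inspect with stat.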