The Automatic Storage Management (ASM) health checker is a crucial tool in Oracle environments that monitors the health and integrity of the storage infrastructure. When the ASM health checker reports a new failure, it is essential to understand the implications and take corrective action to prevent data loss or system downtime. In this blog post, we'll discuss what an ASM health checker failure means, how to investigate it, and how to resolve it.
ALTER DISKGROUP DATA SET ATTRIBUTE 'compatible.asm' = '12.2';
asm health checker found 1 new failures
The most frequent cause is that one or more disks in a disk group are no longer reachable because of a hardware failure, a storage connectivity issue, or an OS-level change such as altered device paths or permissions.
If the health checker reports a disk failure, start by checking OS-level connectivity: run lsdsk from within ASMCMD to see which disks ASM can discover, or fdisk -l on Linux to confirm that the operating system still sees the devices.
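As a complement to lsdsk and fdisk -l, the OS-level check can be scripted. The sketch below is a minimal illustration, not an Oracle tool: the function unreachable_paths and the example paths under /dev/oracleasm/disks/ are hypothetical, and it only verifies that each device node exists and is readable by the current user. It does not read ASM disk headers, so a path that passes here can still be unusable to ASM.

```python
import os
from typing import Iterable, List


def unreachable_paths(disk_paths: Iterable[str]) -> List[str]:
    """Return the candidate disk paths that are missing or unreadable at the OS level.

    Hypothetical helper: it checks only that the device node exists and that the
    current user may read it; it does not inspect ASM disk headers.
    """
    missing = []
    for path in disk_paths:
        # A vanished or unreadable device node usually points at a multipath,
        # udev or permission problem rather than an ASM-internal one.
        if not os.path.exists(path) or not os.access(path, os.R_OK):
            missing.append(path)
    return missing


if __name__ == "__main__":
    # Hypothetical paths: substitute the disk paths ASM reports for your disk group.
    candidates = ["/dev/oracleasm/disks/DATA01", "/dev/oracleasm/disks/DATA02"]
    for p in unreachable_paths(candidates):
        print(f"not reachable from the OS: {p}")
```

Run it as the same OS user that owns the ASM instance, since a permission change for that user is exactly the kind of OS-level change that can make a disk disappear from ASM's view.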
To understand the gravity of this alert, one must dissect what ASM protects. ASM is not merely a volume manager; it is the nervous system of an Oracle database environment, striping and mirroring data across physical disks. A failure here is not isolated. That single failure could be a physical disk beginning to show sector reallocation counts, an offline ASM disk that has exhausted its repair timer, or a consistency issue in the disk group’s metadata. In a normal redundancy configuration with two failure groups, the loss of one disk is survivable. But if that “one new failure” is the prelude to a second one, for example a controller failure affecting the partner disk, the entire disk group could dismount and bring critical databases to an abrupt halt. Thus, the health checker’s finding is a warning that the margin of safety has just narrowed.
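The "narrowed margin" argument can be made concrete with a small model. This is a simplified sketch under stated assumptions, not how ASM computes anything internally: the Disk class, the remaining_margin function and the disk and failure group names are hypothetical, and the model treats NORMAL redundancy as tolerating the loss of one failure group, HIGH as two, and EXTERNAL as none (mirroring is left to the storage array). It counts distinct failure groups that contain an offline disk and ignores disk partnering, so it reports the worst case; the real margin may be larger. In practice the inputs could come from the FAILGROUP and MODE_STATUS columns of V$ASM_DISK.

```python
from dataclasses import dataclass
from typing import List

# Simplified model: failure-group losses that ASM mirroring can absorb per redundancy type.
TOLERATED_FAILGROUP_LOSSES = {"EXTERNAL": 0, "NORMAL": 1, "HIGH": 2}


@dataclass
class Disk:
    name: str        # hypothetical disk name, e.g. "DATA_0000"
    failgroup: str   # failure group the disk belongs to
    online: bool     # False if the disk is offline


def remaining_margin(redundancy: str, disks: List[Disk]) -> int:
    """Worst-case number of further failure-group losses the disk group can absorb.

    Counts distinct failure groups that already contain an offline disk and ignores
    disk partnering, so the true margin may be larger than reported.
    A result of 0 means the next failure in another failure group may dismount it.
    """
    affected = {d.failgroup for d in disks if not d.online}
    return TOLERATED_FAILGROUP_LOSSES[redundancy.upper()] - len(affected)


if __name__ == "__main__":
    disks = [
        Disk("DATA_0000", "FG1", online=False),  # the "one new failure"
        Disk("DATA_0001", "FG2", online=True),
    ]
    print(remaining_margin("NORMAL", disks))  # prints 0: no headroom left
```

With all disks online, a NORMAL redundancy disk group reports a margin of 1; after the first failure it drops to 0, which is the situation the paragraph above describes.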