Channel: Embracing Chaos » Geek

Macbook Crashes, Kernel Panics and coping with an Apple “Genius”

So your Mac is crashing a lot, and after a trip to the “Genius Bar”, you’re starting to think maybe that “genius” you talked to is anything but.  Is this where you are?  If so, join the club, because that’s exactly what I’ve been going through recently.  My MacBook Pro would regularly go black without warning, and the only way I could get its attention again was to hold the power button for ten seconds.  Often it crashed while the screen saver was running, or when I was switching between desktop Spaces, or any other time.  And it was a thorough and complete crash — no warning, no recovery.

It was quite a chore to get Apple to admit that the cause was a hardware problem, and fix it.  But I finally succeeded, so I thought I’d share some of my experiences.  I’ll explain what a Kernel Panic is, how they sometimes can be caused by faulty software but often indicate hardware problems, how they differ from other kinds of crashes, and provide a guide on how to read a Mac OS X kernel panic report.

Dealing with the “Genius” Bar staff

“Genius” is what Apple calls its first tier of technical support.  I find the brand unfortunate and insulting for everybody involved.  There is no intelligence test required to work as a “genius” — just some minimal training on how to follow Apple customer service scripts like an obedient robot.  Knowing Apple, I wouldn’t be surprised if the “Genius” staff are required to follow these scripts verbatim and face not only termination but punitive lawsuits for deviating from the party line.  Keep this in mind when dealing with them.  Also know that they have some discretion in the outcome of your visit, but the discretion exists within guidelines that they cannot control.

Here are some tips on getting past the “genius”, from my limited experience.  Print out your kernel panic reports and bring them in.  The more the better.  Highlight the relevant parts.  I’m not sure if bringing a bad attitude with you helps or not: they want to make their customers happy, but they don’t like their “genius” title challenged with logic.  I also recommend persistence.  Following their stupid advice and showing them that it did no good will help.  I’m not sure if understanding what’s going on will or not.  But if you’d like to understand more about why your Mac is crashing, read on…

Kernel panics and hardware failures vs regular software failures

There are two basic ways your Mac can crash.  First, an application might lock up on you and become unresponsive.  You get the spinning beachball of death, and eventually have to Force Quit your application, losing whatever work you hadn’t saved.  This kind of user mode failure is very common with buggy software.  If the beachball is getting you down, the problem is almost certainly caused by bad software, not by a hardware problem.  In OS 9 and before, this kind of failure could have taken down your entire machine, but OS X is built on a Unix-style kernel with protected memory, so the system is designed to allow one application to fail while protecting all the other applications.
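
To make that protection concrete, here is a minimal sketch in Python (my choice of language for illustration, nothing Apple ships).  It starts a second process that deliberately kills itself, a stand-in for a buggy app, and shows that the process which launched it carries on.  The helper name run_crashing_child is made up for this example.

```python
import subprocess
import sys


def run_crashing_child():
    """Start a separate process that aborts itself, and return its exit status."""
    # The child is a second Python interpreter that calls os.abort(),
    # which kills that one process abruptly (our stand-in for a buggy app).
    result = subprocess.run([sys.executable, "-c", "import os; os.abort()"])
    return result.returncode


if __name__ == "__main__":
    status = run_crashing_child()
    # A non-zero status means the child died abnormally...
    print("child exit status:", status)
    # ...yet this line still runs, because the crash stayed inside the child.
    print("parent is still alive")
```

The exact status number depends on the operating system, so the only thing worth relying on is that it is not zero.  A kernel panic is the opposite situation: there is no surviving parent left to notice.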

Sometimes, though, your entire Mac will crash hard.  Without warning your system displays a full-screen message saying “You need to restart your computer. Hold down the Power button for several seconds or press the Restart button.” in several languages.  This is OS X’s last-ditch attempt to tell you something about what happened before it goes completely belly up.  It’s formally known as a kernel panic.  Sometimes the system is so screwed it can’t even get that error message onto the screen before it dies.

Kernel panics indicate a serious problem, either with the computer’s hardware, or the low-level software in the operating system. In fact there are only three things that can cause a kernel panic:

  1. Faulty hardware causes a problem that the OS doesn’t know how to deal with
  2. A bug in OS X itself
  3. A bug in an OS plugin called a kernel extension or kext

Firstly, if the hardware itself has problems, then kernel panics are a common way they manifest themselves.  Similarly, if the operating system itself has any bugs, they could take down the entire system.  The third option could be caused by third-party software, while the first two are entirely Apple’s responsibility.  So when it comes to dealing with the “Genius” behind the bar, the first two are fairly straightforward.  If you’re seeing this problem a lot, and nobody else is, then it’s probably a hardware problem, and they should replace your hardware. Here’s a thought experiment I tried unsuccessfully with the Apple “geniuses” I had to deal with: Imagine you have a hundred Macs all running the same software, and one of them crashes periodically, but the other 99 don’t.  Would you classify that Mac as having a hardware problem or a software problem?  In my case, the genius insisted that it was a software problem.  In fact he claimed he was certain that if I uninstalled Adobe Flash, the problem would be fixed.  Read on, and you’ll learn how the kernel panic reports themselves show that this explanation is impossible.

Understanding and interpreting Kernel Panic reports

First a bit about what a Kernel Panic is.  Very simply, it’s when something unexpected goes wrong in the operating system kernel.  What’s the kernel?  The kernel is the lowest level of the operating system, the part that’s closest to the hardware.  In modern operating systems, there’s a fairly arbitrary line between what functionality lives in the kernel and what functionality lives in the user space.  The key difference is that when something goes wrong with software in the user space, you get a beachball on the app, but the system survives.  When something goes wrong in the kernel, you get a kernel panic, and the whole system goes bye bye fast.  So it’s critical that any code running in the kernel space be ultra reliable.  You don’t change kernel code quickly or lightly, and you test the hell out of it before you release it.  But code in the kernel can talk to the hardware directly, which makes it faster, so most modern operating systems put important things like networking and graphics drivers into the kernel.  The kernel which powers OS X (called XNU, a hybrid of Mach and BSD code) allows the installation of “kernel extensions” or “kexts” which add functionality.  More about these soon.  But suffice it to say that when anything goes wrong with any kext, it’s a big deal because there’s nothing to fall back on (e.g. the system can’t display an error dialog if the problem is with the display system), so the system’s reaction is called a panic.  Thus “kernel panic.”

Immediately after a KP, your computer does two things: it stores a bunch of information to help diagnose what caused the problem, and puts up the error screen, if it can.  When you reboot, your computer asks if you want to send the KP report to Apple.  You should do this.  The smarter of the “genius” staff can look these reports up and see that your Mac is actually crashing, but they’ll admit that the contents are too technical for a mere “genius” to understand.  Well I’m going to explain to you what the reports contain and what it means about what’s wrong with your computer.

Here’s a typical crash report from my computer.  In my case, these panics weren’t even accompanied by the “restart your computer” message because, as I’ll explain, the problem originated in the graphics system.  My computer just suddenly went black and non-responsive.  The key sections are explained below.

Interval Since Last Panic Report:  420 sec
Panics Since Last Report:          1
Anonymous UUID:                    8A09F455-1039-4696-8479-xxxxxxxxxxxx
Thu Apr 21 09:00:51 2011
panic(cpu 3 caller 0x9cdc8f): NVRM[0/1:0:0]: Read Error 0x00000100: CFG 0xffffffff 0xffffffff 0xffffffff, BAR0 0xc0000000 0xa734e000 0x0a5480a2, D0, P2/4
Backtrace (CPU 3), Frame : Return Address (4 potential args on stack)
0xbc001728 : 0x21b510 (0x5d9514 0xbc00175c 0x223978 0x0)
0xbc001778 : 0x9cdc8f (0xbe323c 0xc53840 0xbf23cc 0x0)
0xbc001818 : 0xae85d3 (0xe0cfc04 0xe5c9004 0x100 0xb83de000)
0xbc001868 : 0xadf5cc (0xe5c9004 0x100 0xbc001898 0x9bd76c)
0xbc001898 : 0x16c8965 (0xe5c9004 0x100 0x438004ee 0x28)
0xbc0019d8 : 0xb07250 (0xe5c9004 0xe5ca004 0x0 0x0)
0xbc001a18 : 0x9d6e23 (0xe5c9004 0xe5ca004 0x0 0x0)
0xbc001ab8 : 0x9d3502 (0x0 0x9 0x0 0x0)
0xbc001c68 : 0x9d4aa0 (0x0 0x600d600d 0x704a 0xbc001c98)
0xbc001d38 : 0xc89217 (0xbc001d58 0x0 0x98 0x2a358d)
0xbc001df8 : 0xc8ec1d (0xe8e5404 0x0 0x98 0x45e8d022)
0xbc001f18 : 0xc8f0b4 (0xe8e5404 0x124b6204 0x6d39d1c0 0x0)
0xbc001f78 : 0xc8f39f (0xe8e5404 0x124b6204 0x6d39d1c0 0xbc0021e0)
0xbc002028 : 0xca3691 (0xe8e5404 0x1f80d8e8 0xbc00239c 0xbc0021e0)
0xbc002298 : 0xc84d09 (0x6d0b7000 0x1f80d8e8 0xbc00239c 0x0)
0xbc0023f8 : 0xc84f47 (0x6d0c6000 0x1f80d800 0x1 0x0)
0xbc002428 : 0xc87a04 (0x6d0c6000 0x1f80d800 0x0 0x97c6c4fc)
0xbc002468 : 0xca9d40 (0x6d0c6000 0x1f80d800 0x6d09f274 0x140)
0xbc0024f8 : 0xc9b5a9 (0xde94bc0 0x1f80d800 0x0 0x1)
0xbc002558 : 0xc9b810 (0x6d09f000 0x6d09f77c 0x1f80d800 0x0)
0xbc0025a8 : 0xc9bce4 (0x6d09f000 0x6d09f77c 0xbc0028cc 0xbc00286c)
0xbc0028e8 : 0xc98aaf (0x6d09f000 0x6d09f77c 0x1 0x0)
0xbc002908 : 0xc605a1 (0x6d09f000 0x6d09f77c 0x1956a580 0x0)
0xbc002938 : 0xc9a572 (0x6d09f000 0xbc002a7c 0xbc002968 0x5046b1)
0xbc002978 : 0xc648de (0x6d09f000 0xbc002a7c 0x0 0xc000401)
0xbc002ab8 : 0xc9dee6 (0x6d09f000 0x0 0xbc002bcc 0xbc002bc8)
0xbc002b68 : 0xc60c93 (0x6d09f000 0x0 0xbc002bcc 0xbc002bc8)
0xbc002be8 : 0x56a738 (0x6d09f000 0x0 0xbc002e3c 0xbc002c74)
0xbc002c38 : 0x56afd7 (0xcef020 0x6d09f000 0x129bab88 0x1)
0xbc002c88 : 0x56b88b (0x6d09f000 0x10 0xbc002cd0 0x0)
0xbc002da8 : 0x285be0 (0x6d09f000 0x10 0x129bab88 0x1)
0xbc003e58 : 0x21d8be (0x129bab60 0x1ec235a0 0x1fd7e8 0x5f43)
      Backtrace continues...

      Kernel Extensions in backtrace (with dependencies):
         com.apple.GeForce(6.2.6)@0xc55000->0xd0afff
            dependency: com.apple.NVDAResman(6.2.6)@0x967000
            dependency: com.apple.iokit.IONDRVSupport(2.2)@0x95a000
            dependency: com.apple.iokit.IOPCIFamily(2.6)@0x927000
            dependency: com.apple.iokit.IOGraphicsFamily(2.2)@0x938000
         com.apple.nvidia.nv50hal(6.2.6)@0x1592000->0x19a6fff
            dependency: com.apple.NVDAResman(6.2.6)@0x967000
         com.apple.NVDAResman(6.2.6)@0x967000->0xc54fff
            dependency: com.apple.iokit.IOPCIFamily(2.6)@0x927000
            dependency: com.apple.iokit.IONDRVSupport(2.2)@0x95a000
            dependency: com.apple.iokit.IOGraphicsFamily(2.2)@0x938000

BSD process name corresponding to current thread: kernel_task

Mac OS version:
10J869
Kernel version:
Darwin Kernel Version 10.7.0: Sat Jan 29 15:17:16 PST 2011; root:xnu-1504.9.37~1/RELEASE_I386
System model name: MacBookPro6,2 (Mac-F22586C8)
System uptime in nanoseconds: 35829130822125

unloaded kexts:
com.apple.filesystems.msdosfs 1.6.3 (addr 0xbc1e5000, size 0x53248) - last unloaded 12216461868115

loaded kexts:
com.parallels.kext.prl_vnic 6.0 11992.625164
com.parallels.kext.prl_netbridge 6.0 11992.625164
com.parallels.kext.prl_usb_connect 6.0 11992.625164
com.parallels.kext.prl_hid_hook 6.0 11992.625164
com.parallels.kext.prl_hypervisor 6.0 11992.625164
com.apple.filesystems.smbfs 1.6.6 - last loaded 12151022138289
com.apple.driver.AppleHWSensor 1.9.3d0
com.apple.driver.AGPM 100.12.19
com.apple.driver.AppleMikeyHIDDriver 1.2.0
com.apple.driver.AppleHDA 1.9.9f12
com.apple.driver.AppleUpstreamUserClient 3.5.4
com.apple.driver.AppleMCCSControl 1.0.17
com.apple.driver.AppleMikeyDriver 1.9.9f12
com.apple.driver.AudioAUUC 1.54
com.apple.driver.AppleIntelHDGraphics 6.2.6
com.apple.driver.AppleIntelHDGraphicsFB 6.2.6
com.apple.driver.SMCMotionSensor 3.0.0d4
com.apple.kext.AppleSMCLMU 1.5.0d3
com.apple.Dont_Steal_Mac_OS_X 7.0.0
com.apple.iokit.CHUDUtils 201
com.apple.iokit.CHUDProf 216
com.apple.driver.AudioIPCDriver 1.1.6
com.apple.driver.AppleGraphicsControl 2.8.68
com.apple.driver.ACPI_SMC_PlatformPlugin 4.5.0d5
com.apple.GeForce 6.2.6
com.apple.driver.AppleLPC 1.4.12
com.apple.filesystems.autofs 2.1.0
com.apple.driver.AppleUSBTCButtons 200.3.2
com.apple.driver.AppleUSBTCKeyboard 200.3.2
com.apple.driver.AppleIRController 303.8
com.apple.driver.AppleUSBCardReader 2.5.8
com.apple.iokit.SCSITaskUserClient 2.6.5
com.apple.BootCache 31
com.apple.AppleFSCompression.AppleFSCompressionTypeZlib 1.0.0d1
com.apple.iokit.IOAHCIBlockStorage 1.6.3
com.apple.driver.AppleUSBHub 4.1.7
com.apple.driver.AppleFWOHCI 4.7.1
com.apple.driver.AirPortBrcm43224 427.36.9
com.apple.iokit.AppleBCM5701Ethernet 2.3.9b6
com.apple.driver.AppleEFINVRAM 1.4.0
com.apple.driver.AppleSmartBatteryManager 160.0.0
com.apple.driver.AppleUSBEHCI 4.1.8
com.apple.driver.AppleAHCIPort 2.1.5
com.apple.driver.AppleACPIButtons 1.3.5
com.apple.driver.AppleRTC 1.3.1
com.apple.driver.AppleHPET 1.5
com.apple.driver.AppleSMBIOS 1.6
com.apple.driver.AppleACPIEC 1.3.5
com.apple.driver.AppleAPIC 1.4
com.apple.driver.AppleIntelCPUPowerManagementClient 105.13.0
com.apple.security.sandbox 1
com.apple.security.quarantine 0
com.apple.nke.applicationfirewall 2.1.11
com.apple.driver.AppleIntelCPUPowerManagement 105.13.0
com.apple.driver.DspFuncLib 1.9.9f12
com.apple.driver.AppleProfileReadCounterAction 17
com.apple.driver.AppleProfileTimestampAction 10
com.apple.driver.AppleProfileThreadInfoAction 14
com.apple.driver.AppleProfileRegisterStateAction 10
com.apple.driver.AppleProfileKEventAction 10
com.apple.driver.AppleProfileCallstackAction 20
com.apple.driver.AppleSMBusController 1.0.8d0
com.apple.iokit.IOFireWireIP 2.0.3
com.apple.iokit.IOSurface 74.2
com.apple.iokit.IOBluetoothSerialManager 2.4.0f1
com.apple.iokit.IOSerialFamily 10.0.3
com.apple.iokit.CHUDKernLib 208
com.apple.iokit.IOAudioFamily 1.8.0fc1
com.apple.kext.OSvKernDSPLib 1.3
com.apple.driver.AppleHDAController 1.9.9f12
com.apple.iokit.IOHDAFamily 1.9.9f12
com.apple.iokit.AppleProfileFamily 41
com.apple.driver.AppleSMC 3.1.0d3
com.apple.driver.IOPlatformPluginFamily 4.5.0d5
com.apple.driver.AppleSMBusPCI 1.0.8d0
com.apple.nvidia.nv50hal 6.2.6
com.apple.NVDAResman 6.2.6
com.apple.iokit.IONDRVSupport 2.2
com.apple.iokit.IOGraphicsFamily 2.2
com.apple.driver.BroadcomUSBBluetoothHCIController 2.4.0f1
com.apple.driver.AppleUSBBluetoothHCIController 2.4.0f1
com.apple.iokit.IOBluetoothFamily 2.4.0f1
com.apple.driver.AppleUSBMultitouch 206.6
com.apple.iokit.IOUSBHIDDriver 4.1.5
com.apple.iokit.IOSCSIBlockCommandsDevice 2.6.5
com.apple.iokit.IOUSBMassStorageClass 2.6.5
com.apple.driver.AppleUSBMergeNub 4.1.8
com.apple.driver.AppleUSBComposite 3.9.0
com.apple.iokit.IOSCSIMultimediaCommandsDevice 2.6.5
com.apple.iokit.IOBDStorageFamily 1.6
com.apple.iokit.IODVDStorageFamily 1.6
com.apple.iokit.IOCDStorageFamily 1.6
com.apple.driver.XsanFilter 402.1
com.apple.iokit.IOAHCISerialATAPI 1.2.5
com.apple.iokit.IOSCSIArchitectureModelFamily 2.6.5
com.apple.iokit.IOUSBUserClient 4.1.5
com.apple.iokit.IOFireWireFamily 4.2.6
com.apple.iokit.IO80211Family 314.1.1
com.apple.iokit.IONetworkingFamily 1.10
com.apple.iokit.IOUSBFamily 4.1.8
com.apple.iokit.IOAHCIFamily 2.0.4
com.apple.driver.AppleEFIRuntime 1.4.0
com.apple.iokit.IOHIDFamily 1.6.5
com.apple.iokit.IOSMBusFamily 1.1
com.apple.kext.AppleMatch 1.0.0d1
com.apple.security.TMSafetyNet 6
com.apple.driver.DiskImages 289
com.apple.iokit.IOStorageFamily 1.6.2
com.apple.driver.AppleACPIPlatform 1.3.5
com.apple.iokit.IOPCIFamily 2.6
com.apple.iokit.IOACPIFamily 1.3.0

The first line is fairly clear: how long has it been since your system last reported a panic?  If this is less than an hour, as it was for my computer (420 seconds is just 7 minutes), then your machine is completely FUBAR.  Less than a day and you’ve still got a seriously unstable computer.  (Hint for any “genius” that might be reading this article: take the number of seconds, divide it by 60 using the Calculator app on your store-issued iPad, and that will give you the number of minutes.  Divide that new smaller number by 60 again to get an even smaller number which is hours.  If you can figure out how to get to number of days by yourself, it’s time to apply for the “Genius Lead” job.)
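
For anyone who would rather not do the division by hand, here is a small Python sketch of the same arithmetic.  The function names and the wording of the verdicts are mine; the one-hour and one-day thresholds are just the rule of thumb from the paragraph above, not anything official.

```python
def split_interval(total_seconds):
    """Return (days, hours, minutes, seconds) for a whole number of seconds."""
    minutes, seconds = divmod(total_seconds, 60)
    hours, minutes = divmod(minutes, 60)
    days, hours = divmod(hours, 24)
    return days, hours, minutes, seconds


def stability_verdict(total_seconds):
    """Apply the rule of thumb: under an hour is terrible, under a day is bad."""
    if total_seconds < 60 * 60:
        return "under an hour: completely FUBAR"
    if total_seconds < 24 * 60 * 60:
        return "under a day: seriously unstable"
    return "a day or more"


if __name__ == "__main__":
    interval = 420  # "Interval Since Last Panic Report" from the report above
    print(split_interval(interval), stability_verdict(interval))

    uptime_ns = 35829130822125  # "System uptime in nanoseconds" from the same report
    print(split_interval(uptime_ns // 1_000_000_000))
```

For my report this gives 7 minutes since the previous panic report, and an uptime of 9 hours, 57 minutes and 9 seconds at the moment of the crash.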

The Anonymous UUID is an effectively random code that allows Apple to look up the crash reports for your computer when you go into the store.  Then there’s the date.  Straightforward.

The line which starts “panic” is the closest thing you’ll find to a concise explanation of what went wrong. In all likelihood this will be a jumble of words and numbers that make no sense, but it’s a great string to Google.  If you’re having a hardware problem, this message will probably stay about the same with each KP.  Googling my error message “NVRM[0/1:0:0]: Read Error 0x00000100” turns up a bunch of people with similar problems — computer going black without warning, often while playing World of Warcraft.
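
If you have a pile of saved reports, a few lines of Python can pull out the panic message and tell you whether it really does stay the same.  This is only a sketch: the regular expression is based on the one report shown above, and the idea of keeping the first two colon-separated fields as a search string is my own heuristic that happens to reproduce the phrase I Googled.  The list of reports at the bottom is a placeholder (the same report twice), not real data.

```python
import re
from collections import Counter

# Matches the "panic(cpu N caller 0x...): message" line of a report.
PANIC_LINE = re.compile(
    r"^panic\(cpu \d+ caller 0x[0-9a-fA-F]+\): (.*)$", re.MULTILINE
)


def panic_message(report_text):
    """Return the text after 'panic(...): ', or None if there is no such line."""
    match = PANIC_LINE.search(report_text)
    return match.group(1) if match else None


def search_key(message):
    """Keep the first two ': '-separated fields (a heuristic, not a rule)."""
    return ": ".join(message.split(": ")[:2])


REPORT = (
    "Thu Apr 21 09:00:51 2011\n"
    "panic(cpu 3 caller 0x9cdc8f): NVRM[0/1:0:0]: Read Error 0x00000100: "
    "CFG 0xffffffff 0xffffffff 0xffffffff, "
    "BAR0 0xc0000000 0xa734e000 0x0a5480a2, D0, P2/4\n"
)

if __name__ == "__main__":
    print(search_key(panic_message(REPORT)))
    # With several saved reports, count how often each key shows up.
    reports = [REPORT, REPORT]  # placeholder: substitute your own report texts
    print(Counter(search_key(panic_message(text)) for text in reports))
```

If one key dominates the count, that is the string to search for and to show the “genius.”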

The next section titled “backtrace” is worthless unless you’re actually diving into the source code that caused the problem.  Skip over it.  But the section after it is extremely interesting and relatively easy to interpret.

The section titled “Kernel Extensions in backtrace (with dependencies)” actually tells you what part of the system failed.  Read this one closely and try to make sense of it.  In the case of my example, there are three kernel extensions involved with the crash.  They are called “com.apple.GeForce” and “com.apple.nvidia.nv50hal” and “com.apple.NVDAResman”.  The first one is fairly obvious: GeForce is the kind of graphics chip in the MacBook.  The second one is also pretty clear: Nvidia is the company that makes GeForce, and nv50hal most likely means the Hardware Abstraction Layer for Nvidia’s “NV50” family of graphics chips.  I’m not sure what NVDAResman is (my best guess is an Nvidia resource manager), but looking down a bit I see it depends on “IOGraphicsFamily”.  This paints a really clear picture that the failure is in the graphics system.  Moreover, since every line here starts with “com.apple” we know the failure is entirely in code shipped by Apple.  There is no third-party software involved in this crash.
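
Here is a rough Python sketch of that reading, for anyone who wants to automate it.  It assumes the section is laid out exactly as in my report: a “Kernel Extensions in backtrace” heading followed by lines of the form name(version)@address, with dependencies prefixed by “dependency:”.  The function name and the regular expression are mine, and other OS X versions may format this section differently.

```python
import re

# One kext line looks like "com.apple.GeForce(6.2.6)@0xc55000->0xd0afff",
# optionally prefixed with "dependency: ".
KEXT_LINE = re.compile(r"^\s*(dependency:\s*)?([\w.]+)\(([^)]*)\)@0x[0-9a-fA-F]+")


def kexts_in_backtrace(report_text):
    """Return (direct, dependencies) lists of bundle IDs from the backtrace section."""
    direct, dependencies = [], []
    in_section = False
    for line in report_text.splitlines():
        if "Kernel Extensions in backtrace" in line:
            in_section = True
            continue
        if not in_section:
            continue
        match = KEXT_LINE.match(line)
        if match is None:
            break  # the first line that is not a kext entry ends the section
        target = dependencies if match.group(1) else direct
        if match.group(2) not in target:
            target.append(match.group(2))
    return direct, dependencies


SECTION = """\
      Kernel Extensions in backtrace (with dependencies):
         com.apple.GeForce(6.2.6)@0xc55000->0xd0afff
            dependency: com.apple.NVDAResman(6.2.6)@0x967000
            dependency: com.apple.iokit.IONDRVSupport(2.2)@0x95a000
            dependency: com.apple.iokit.IOPCIFamily(2.6)@0x927000
            dependency: com.apple.iokit.IOGraphicsFamily(2.2)@0x938000
         com.apple.nvidia.nv50hal(6.2.6)@0x1592000->0x19a6fff
            dependency: com.apple.NVDAResman(6.2.6)@0x967000
         com.apple.NVDAResman(6.2.6)@0x967000->0xc54fff
            dependency: com.apple.iokit.IOPCIFamily(2.6)@0x927000
            dependency: com.apple.iokit.IONDRVSupport(2.2)@0x95a000
            dependency: com.apple.iokit.IOGraphicsFamily(2.2)@0x938000

BSD process name corresponding to current thread: kernel_task
"""

if __name__ == "__main__":
    direct, dependencies = kexts_in_backtrace(SECTION)
    print("in backtrace:", direct)
    print("dependencies:", dependencies)
    print("all from Apple:", all(k.startswith("com.apple.") for k in direct + dependencies))
```

The “all from Apple” check only looks at the bundle ID prefix, which is the same evidence I used above; it says who shipped the code, not who wrote every line of it.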

For my particular crash, it’s important to know something about the graphics hardware of these MacBooks, since all evidence points to the graphics hardware.  This generation of MacBook Pros has two graphics chips: a faster one from Nvidia, and a more battery-friendly one from Intel.  The Nvidia chip, which is apparently having problems, is always used when the computer has an external monitor plugged in, or when something fancy is happening on the built-in screen.  A nice utility called gfxCardStatus can help you understand this complexity, and will definitely give you a leg up on the “genius.”

The following line starting with “BSD process name” can also be important.  This will sometimes tell you which user-level app originated the call into the kernel which failed.  In my case it was “kernel_task” which provides no additional information.

The next section gives some basic info about the Mac — hardware and OS versions.  What follows is a complete list of kernel extensions (kexts) installed.  This gives you a bit more ammo in dealing with the “genius” who is probably ignoring you at this point anyway.  You can look through this list and see everything that might possibly contribute to a kernel panic.  In my case, the only software modules that aren’t from Apple are some drivers from Parallels for running my Windows virtual machine.  So the only reasons my Mac might kernel panic are because of a hardware problem, a bug in OS X itself, or something going wrong with Parallels.  Understanding this should, in theory, be very helpful when talking to your local neighborhood “genius” but unfortunately they are simple bots that only run scripts authored in Cupertino and are not permitted to listen to logic.
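
Scanning a list that long by eye is tedious, so here is a short Python sketch that picks out the kexts whose bundle ID does not start with “com.apple.”.  It assumes the list begins with a “loaded kexts:” line, has one kext per line with the bundle ID first, and ends at the first blank line, which is how my report looks.  The function name is made up, and the sample text is only an excerpt of the full list above.

```python
def third_party_kexts(report_text, vendor_prefix="com.apple."):
    """Return bundle IDs in the 'loaded kexts:' section that lack vendor_prefix."""
    found = []
    in_section = False
    for line in report_text.splitlines():
        stripped = line.strip()
        if stripped == "loaded kexts:":
            in_section = True
            continue
        if not in_section:
            continue
        if not stripped:
            break  # a blank line ends the list
        bundle_id = stripped.split()[0]  # the first field is the bundle ID
        if not bundle_id.startswith(vendor_prefix):
            found.append(bundle_id)
    return found


# Excerpt of the "loaded kexts" list from the report above.
EXCERPT = """\
loaded kexts:
com.parallels.kext.prl_vnic 6.0 11992.625164
com.parallels.kext.prl_netbridge 6.0 11992.625164
com.parallels.kext.prl_usb_connect 6.0 11992.625164
com.parallels.kext.prl_hid_hook 6.0 11992.625164
com.parallels.kext.prl_hypervisor 6.0 11992.625164
com.apple.filesystems.smbfs 1.6.6 - last loaded 12151022138289
com.apple.driver.AppleHWSensor 1.9.3d0
com.apple.GeForce 6.2.6
"""

if __name__ == "__main__":
    for bundle_id in third_party_kexts(EXCERPT):
        print(bundle_id)
```

Run against my report, this prints only the five Parallels drivers, which matches what I found reading the list by hand.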

Apple’s Propaganda about Flash

When the “genius” told me my Mac’s problem was that I had Adobe Flash installed, I just laughed at first.  Flash is installed on something like 97% of desktop computers, and very few of them regularly turn themselves off for no reason.   Moreover, the kernel panic report lists every piece of software that could possibly contribute to the kernel panic, and neither the word “flash” nor “adobe” appears anywhere in the list.  But then I realized he wasn’t joking.

Apple’s ongoing arguments with Adobe over Flash are well publicized.  The root of the issue, in very brief summary, is that Apple sees Adobe’s Flash as a strategic threat to their incredibly profitable iPhone platform.  The poor “genius” I’m stuck with has become a pawn in Apple’s PR battle, throwing himself on the grenade of propaganda just to spread FUD about Flash.  I tried reasoning with him, explaining that Adobe’s software doesn’t run in the kernel, and therefore cannot cause a kernel panic.  The job of the kernel is to protect users from badly written software crashing the whole machine. But he would not budge.  I imagined a “genius” script which read as follows:

Mac is crashing…

1. Run hardware diagnostic tests.

2. Address any identified hardware problems.

3. If hardware tests come back clean, tell customer that the problem (whatever it is) is caused by Flash.  Tell them to uninstall it, and see if that helps.

Here I imagine the Dantesque trap of the rare “genius” who actually understands how OS X works.  I’m telling the customer something which is impossible on its face, and he knows it.  He’s arguing with me telling me I’m being stupid.  But I signed a contract with Apple saying I would defame Adobe, and deviation from this contract will bring the wrath of Steve’s legal team on me.  I just have to smile and say things like “yeah, that’s the really strange thing about this particular software problem — it only affects certain computers.  But it’s definitely caused by Flash.”

One might reason that Flash could cause kernel panics because it makes more extensive use of the graphics system than other applications.  But in this case, Flash isn’t the actual problem.  Flash is exposing the underlying problem, as would any software which works the graphics system hard.  That explains why so many of the people with the same problem as me were playing World of Warcraft.  If the “genius” advice ever works, it’s just because Flash is the most graphics intensive software that many people use on their Macs.  The actual problem is still either a bug in OS X, or a hardware problem.

Consider the advice not to use Flash on your Mac in analogy to a car.  (A high-end MacBook actually costs as much as some cars.)  Imagine that your car sometimes just turned its engine off while you were in the middle of driving it: catastrophic failure with no warning or apparent reason.  You go to the dealership and they can’t find anything wrong with it, but they ask if you ever listen to electronic music.  Well, yes, sometimes.  That’s the problem!  It’s the electronic music which is causing your car to malfunction.  So stop listening to it, and the problem will be fixed.  Umm, what?  The closest thing to the truth, by analogy, would be that any bass-heavy music (graphics-intensive application) is stressing out some weak connection in the electronics.  But because the car dealership is owned by the local philharmonic, they’re blaming it on that awful music the kids listen to.  They’re using your misfortune and their incompetence to push an unrelated political agenda.

It’s an interesting glimpse into how Apple is using their retail presence to advance a strategic PR goal.  It’s evidence that Apple has grown up as a company to the point where their own motives are more important than doing what actually helps customers.  *sigh*  At least I got my MacBook fixed.

