Tuesday 25 June 2013

Windows 7 Ultimate 64bit - Network stack Crash and Burn

Short version:

If your Windows 7 crashes with services refusing to start, and your network card or network connectivity is doing weird stuff, won't reinstall drivers and complains about arb crap like "The parameter is incorrect", and netsh gives you the 'InitHelperDll' finger every time you try to reset TCP/IP or Winsock, AND the System File Checker says everything is fine, AND importing new Winsock and Winsock2 registry entries only sort-of fixes some of your issues but you get the feeling it's kinda like putting a plaster on a severed limb, AND you REALLY REALLY REALLY do NOT want to reinstall a FRESH copy of Windows with all the application reinstall and system configuration hell that goes with it - let me save you some trouble and offer three words of condolence:


In. Place. Upgrade.
Stop trying to fix it.  You can't.  Ok, I can't, but I live according to my ego which says if I can't, you can't.
So stop - relax - and reinstall your current version of Windows over your current version of Windows as an 'upgrade' - not a fresh install.
Not ideal.
No "Root Cause Analysis" resolutions or epiphanies about what caused it.
But if none of the google-fixes mentioned above work, just do it.  

Of course, this may not be easy, but it eventually worked for me.

Long version:

On my main PC I run Windows 7 Ultimate 64bit.  
No clue why I'm running 64bit as I only have 3GB RAM, but I do a little web dev with a MySQL DB and PHP on IIS, plus some other interesting combos; suffice it to say the system config is not something I ever want to have to redo.

The other day I was messing around and things just seemed sluggish.  I noticed the committed memory was way over what was actually in use and figured the poor thing could use a reboot. (It usually gets its weekly reboot from either the power company blacking out or the maid running the dishwasher, washing machine, tumble drier, kettle and iron at the same time, which generally results in a nice little 'click' as the DB boards melt and my computers cry in vain for a UPS before dwindling into nothingness...fortunately the power company hasn't blacked us out in a while and unfortunately we no longer have a maid, so, for a change, a graceful shutdown and reboot were in order!)

After the reboot I log in and receive a funny error message saying that a system service didn't start.  It was funny because the service that didn't start was the Event Log service, and it told me to go check the Event Log to see why....(nice one, William)


Anyway, before dissolving into a panicky mess I figured perhaps I should just reboot again, because I had seen a Java update pop up just before I rebooted and had to cancel the process from the shutdown menu.  (Okay, so it wasn't entirely graceful.)

Ground 0:
Upon reboot the system goes into a boot loop - Windows logo, reboot.  Windows logo, reboot.  Starting Windows, repair mode!
What?
Um, ok.
I let it do its thing (something along the lines of "Windows is automatically fixing startup errors.")
It reboots.
Repair mode.
Then it kindly tells me it couldn't automatically repair anything, and offers me some of my own choices - like a system restore point.
At this point I'm thinking, rather uncharacteristically, "Hell yeah!  System restore point!  Go Windows!" - mainly because back in the day I'd be pulling out the 'Windows XP boot USB flash drive' with fdisk and whatnot installed for your pleasure.
I select the last restore point (of about three that I can see), which I notice was just prior to a 'critical system update'.   The system works for a while and then reboots.
Repair mode.  
Weird...maybe I should have chosen another restore point.  I go back to the restore point menu and....there are no restore points! (As an aside, I found out LATER that Windows applies the FIFO, Fit In or F**k Off rule...wait...sorry, the First In First Out rule to restore points based on available space.  When it runs out of space, it deletes the oldest restore point.  If you completely whack all available free space, say goodbye to your restore points.  As an aside, aside, I still haven't figured out how to put system restores on another volume - will check it out after writing this...)
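If you want to see why the restore points vanished, the First In First Out pruning described above can be modelled in a few lines of Python. This is a minimal sketch under stated assumptions, not how System Restore is actually implemented: it assumes every restore point has a known size and that the oldest ones are deleted until the rest fit into the space available. `RestorePoint`, `prune_restore_points` and the sizes are all made up for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class RestorePoint:
    # Hypothetical model: a label plus its size on disk in MB.
    name: str
    size_mb: int


def prune_restore_points(points, budget_mb):
    """Delete the oldest restore points first (FIFO) until the remaining ones
    fit within budget_mb. `points` must be ordered oldest first.
    Returns (kept, deleted) as two lists."""
    kept = deque(points)
    deleted = []
    total = sum(p.size_mb for p in kept)
    while kept and total > budget_mb:
        oldest = kept.popleft()  # first in, first out
        deleted.append(oldest)
        total -= oldest.size_mb
    return list(kept), deleted


if __name__ == "__main__":
    history = [
        RestorePoint("weekly checkpoint", 800),
        RestorePoint("driver install", 600),
        RestorePoint("critical system update", 700),
    ]
    for budget in (5000, 1400, 100):
        kept, deleted = prune_restore_points(history, budget)
        print(budget, "MB free ->", [p.name for p in kept])
```

Whack the free space down to almost nothing (the 100 MB case) and the kept list comes back empty, which is exactly the "there are no restore points!" moment.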

Level 1 - HP 0, INT 0, WIS 0, Magic - definitely 0:
After fitfully trying anything GUI-based in the recovery menus a number of times, I eventually descend into console territory (the Command Prompt in the recovery options).
A thought that had been nagging me since that less-than-graceful shutdown surfaces - system file corruption somewhere.
I run the System File Checker, which needs the /offbootdir and /offwindir switches when you run it from the repair console ("offline mode", as they call it).
So "sfc /scannow /offbootdir=c:\ /offwindir=c:\windows" scans the system files for corruption and presumably fixes any it finds.
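One catch with running SFC offline is picking the right drive: inside the recovery environment the drive letters are often shuffled, and the Windows partition commonly shows up as D: rather than C:. The Python sketch below shows the logic of checking which drive actually holds Windows and then building the command with the correct switch syntax (each switch takes an equals sign, e.g. /offwindir=D:\Windows). Python isn't normally available in the recovery environment, so treat this as an illustration; by hand you'd just `dir` each drive letter. The function names are hypothetical, using the SYSTEM registry hive as the marker of an installation is an assumption, and pointing /offbootdir at the same drive root is common usage rather than a rule (some setups keep the boot files on a separate System Reserved partition).

```python
import ntpath
import sys
from pathlib import Path


def find_offline_windows(candidate_roots):
    """Return the first drive root that appears to hold a Windows install.
    Assumption: the presence of Windows\\System32\\config\\SYSTEM (the SYSTEM
    registry hive) is treated as evidence of an installation."""
    for root in candidate_roots:
        hive = Path(root) / "Windows" / "System32" / "config" / "SYSTEM"
        if hive.is_file():
            return root
    return None


def sfc_offline_args(boot_dir, windows_dir):
    """Build an offline System File Checker command line.
    Each switch is joined to its path with '=', not ':'."""
    return ["sfc", "/scannow", f"/offbootdir={boot_dir}", f"/offwindir={windows_dir}"]


if __name__ == "__main__" and sys.platform == "win32":
    root = find_offline_windows(["C:\\", "D:\\", "E:\\"])
    if root is None:
        print("No offline Windows installation found")
    else:
        # Common usage: boot dir and Windows dir on the same drive.
        print(" ".join(sfc_offline_args(root, ntpath.join(root, "Windows"))))
```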
It reboots - didn't get a chance to see if it fixed anything or not, but I'm back in repair mode.
I try changing my HDD's BIOS setting from IDE to AHCI (which was stupid: I had never changed it FROM AHCI to IDE, and Windows was installed with it on IDE, so obviously this didn't work).
I sheepishly put it back to IDE, and decide to try safe mode with boot logging.
The system hangs for a bit after classpnp.sys, then reboots.  I go online and discover how many people out there don't realise that when the boot log says "loaded classpnp.sys" and then hangs, the problem isn't classpnp.sys but whatever loads after it.  After boggling at people who can't understand why copying classpnp.sys over itself doesn't solve anything, I move on...
I SOMEHOW manage to get into safe mode (don't ask - I may have just selected safe mode...one second I was looking through tear-filled eyes at a repair menu, the next I was in safe mode)
I frantically disable all my non-Microsoft startup services ("clean boot") - nada.
I run SFC in safe mode again just for kicks.
I reboot again - wait a second - I'm back in Windows!  Fantastic!  Awesome!  Except when I log in I get the stupid Event Log service message again....and my network isn't working....and when I look at running services, a whole tonne of 'Automatic' services haven't started, including DHCP Client and Server, which are stuck in "Starting"...so, awesome to have Windows back - not so awesome to have it buggered.

Level 10:  HP 5, INT 9, WIS 9, Magic...still 0:
Through some fiddling I somehow decided that based on the hierarchy of dependency services that are failing to start, it all looks rooted in the network stack.
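That reasoning, following the chain of failed services back to the one whose own dependencies all started fine, can be sketched in Python. The dependency map and service names below are a simplified, invented example rather than Windows 7's real dependency list; on a real box you'd read each service's DEPENDENCIES line from `sc qc <service>`. The function name `root_failures` is hypothetical.

```python
def root_failures(dependencies, failed):
    """Return the failed services none of whose direct dependencies failed.
    These are the most likely roots of a start-up cascade.

    dependencies: {service: [services it depends on]}
    failed: set of services that did not start
    """
    roots = []
    for service in sorted(failed):
        deps = dependencies.get(service, [])
        if not any(dep in failed for dep in deps):
            roots.append(service)
    return roots


if __name__ == "__main__":
    # Hypothetical, simplified dependency map (not the real Windows 7 one).
    dependencies = {
        "Dhcp": ["Tcpip", "Afd"],
        "Dnscache": ["Tcpip", "Nsi"],
        "Nsi": ["Tcpip"],
        "Tcpip": [],
        "Afd": [],
        "EventLog": [],
    }
    failed = {"Dhcp", "Dnscache", "Nsi", "Tcpip"}
    print(root_failures(dependencies, failed))  # -> ['Tcpip']
```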
I run
'netsh winsock reset'
and
'netsh int ip reset' from a command prompt to try to reset the stack - I'm already logged in as Administrator so I don't have to mess about with "Run as administrator", but still I'm not winning.
With both commands I get a weird "Initialization Function InitHelperDll in NSHTTP.DLL failed to start with error code 11003" before it says the reset succeeded.
Reboot - no change.  Winsock resets still give the helperdll error.
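The InitHelperDll message quoted above is easy to miss among netsh's other output, so here is a small Python sketch that runs the two resets and pulls out any helper DLL failures. The regular expression is modelled on that one message, which is an assumption: other Windows versions or languages may word it differently. The helper names are hypothetical. For what it's worth, 11003 is the Winsock error code WSANO_RECOVERY, a non-recoverable error.

```python
import re
import subprocess
import sys

# Pattern based on the message quoted above; other Windows versions or
# locales may word it differently.
HELPER_DLL_ERROR = re.compile(
    r"InitHelperDll in (?P<dll>\S+?\.DLL) failed to start with error code (?P<code>\d+)",
    re.IGNORECASE,
)

RESET_COMMANDS = [
    ["netsh", "winsock", "reset"],
    ["netsh", "int", "ip", "reset"],
]


def helper_dll_failures(output):
    """Return (dll_name, error_code) pairs for each InitHelperDll failure in
    the text printed by a netsh command."""
    return [(m.group("dll"), int(m.group("code"))) for m in HELPER_DLL_ERROR.finditer(output)]


def run_resets():
    """Run both resets (needs an elevated prompt) and report helper DLL failures."""
    for cmd in RESET_COMMANDS:
        result = subprocess.run(cmd, capture_output=True, text=True)
        failures = helper_dll_failures(result.stdout + result.stderr)
        print(" ".join(cmd), "->", failures or "no InitHelperDll errors")


if __name__ == "__main__" and sys.platform == "win32":
    run_resets()
```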
After hours of fiddling I finally give in and try something that made very little sense to me - removing the Winsock and Winsock2 registry keys and replacing them with ones exported from another copy of Windows.
It didn't make sense, because I didn't HAVE another copy of Windows 7 Ultimate 64bit - only Windows Professional on a laptop.  Plus the TCP service thingies are listed in a different order in the LSP (Layered Service Provider) catalog....Nevertheless I was desperate, so I backed up the
HKLM\System\CurrentControlSet\Services\Winsock
HKLM\System\CurrentControlSet\Services\Winsock2
entries and replaced them with the same keys exported from my laptop.
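Backing those two keys up before touching them is the one step you really don't want to skip. Below is a minimal Python sketch that builds the `reg export` commands (and the matching `reg import` ones to put them back); the backup file names are a hypothetical scheme. Note that importing a .reg file merges into the existing key rather than replacing it, which is why the keys were removed before the laptop's copies went in.

```python
import ntpath
import subprocess
import sys

# The two keys backed up and replaced above.
WINSOCK_KEYS = [
    r"HKLM\SYSTEM\CurrentControlSet\Services\Winsock",
    r"HKLM\SYSTEM\CurrentControlSet\Services\Winsock2",
]


def export_commands(keys, prefix="backup"):
    """One 'reg export' command per key; /y overwrites an existing file
    without prompting. The file naming scheme is hypothetical."""
    return [["reg", "export", key, f"{prefix}-{ntpath.basename(key)}.reg", "/y"] for key in keys]


def import_commands(keys, prefix="backup"):
    """The matching 'reg import' commands to restore the backups."""
    return [["reg", "import", f"{prefix}-{ntpath.basename(key)}.reg"] for key in keys]


if __name__ == "__main__" and sys.platform == "win32":
    for cmd in export_commands(WINSOCK_KEYS):
        subprocess.run(cmd, check=True)  # raises if reg.exe reports an error
        print("exported:", cmd[3])
```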

Reboot....

Level 20:  HP 14, INT 13, WIS 14, Magic...call it 5, but a very unimpressed 5:
Services start!
Well - some of them...the event log service for one - so now I can actually start looking at event logs (thanks William)
NIC is - just not starting.  I try updating/reinstalling the drivers - nada "The parameter is incorrect".
I try installing the vendor's specific driver (which is older than the MS one) - same problem.

I try uninstalling it.
Reboot - no more NIC.

Now the NIC shows up as not working in Device Manager, doesn't appear AT ALL in network management, and refuses to install anything driver-like.
I try flashing the BIOS (really desperate at this point). nada.
I try to put in another NIC, but unfortunately the only one I could get my hands on was an OLD 3Com 3c905cx 10/100 PCI card - which Windows 7 has unfortunately never heard of, cannot recognise, and won't install, no matter what drivers I try to fiddle with.
Unfortunately, I'm looking at a rebuild....but I'm hoping an in-place upgrade will save the day.
More unfortunately, I don't have any OEM discs!  What the hell is this crap where vendors sell OEM Windows without discs!?
Eventually I manage to get my hands on a Windows 7 Ultimate CD.  When I try to upgrade, however, it tells me I need to install Service Pack 1 before I can upgrade.
Not too convinced, I download and run SP1.  Sure enough, after hours of progress bars, it cranks and fails. I go through a whole tonne of "how to successfully install SP1" unsuccessfully.
Eventually I decide I'm going to dual-boot and just SEE if a fresh install will work, before I wipe my current version...

Level 35:  HP 17, INT 17, WIS 17, Magic....getting there with about a 12:
I run an install on D:\Windows and after a surprisingly short time I'm booting into a fresh install of Windows 7 ...and the NIC is FINE!  (if you hadn't been thinking NIC failure by now, well I was)...except I seem to have installed 32bit....weird.  It's not like it asked my opinion on anything, it just installed - I thought it was native 64bit....oh well.
Interestingly enough, this annoyed me more than anything else.  Now I KNEW it was system file or registry corruption - so I bounced out of my fresh install and flew back into damage inc. determined to nail it.
I uninstalled iTunes and Bonjour (suspect just based on the MDNSResponder values it pops into the winsock registry keys - and removing it wasn't easy which made it more suspect!) - no change.
Kept reimporting my old keys now and again to check, and sure enough, every time I reimported my old winsock registry keys the system burst into flames with any number of issues.  Every time I used the imported Windows Professional winsock keys, I got a sort-of healthier startup, but no network and network-stack-dependent services still crashing.
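Rather than reimporting keys back and forth, the two exports can be compared directly to see exactly which entries (such as the ones Bonjour adds) differ. This is a minimal sketch with a deliberately simple .reg parser; `parse_reg`, `diff_reg` and `load_reg` are hypothetical helpers. It assumes the exports were made with regedit or `reg export`, which normally write UTF-16 text, and it compares raw value lines, so the same data formatted differently would show up as changed.

```python
import sys


def parse_reg(text):
    """Tiny .reg parser: maps each [key] to the set of its value lines.
    Lines ending in a backslash are joined with the next line (how long
    hex values are wrapped). Comments and the header line are ignored."""
    sections = {}
    current = None
    pending = ""
    for raw in text.splitlines():
        line = pending + raw.strip()
        if line.endswith("\\"):
            pending = line[:-1]
            continue
        pending = ""
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            sections.setdefault(current, set())
        elif current is not None and line and not line.startswith(";"):
            sections[current].add(line)
    return sections


def diff_reg(a, b):
    """Return (keys only in a, keys only in b, keys in both with different values)."""
    only_a = sorted(a.keys() - b.keys())
    only_b = sorted(b.keys() - a.keys())
    changed = sorted(k for k in a.keys() & b.keys() if a[k] != b[k])
    return only_a, only_b, changed


def load_reg(path):
    # Exports from regedit / reg export are normally UTF-16 encoded.
    with open(path, encoding="utf-16") as f:
        return parse_reg(f.read())


if __name__ == "__main__" and len(sys.argv) == 3:
    old, new = load_reg(sys.argv[1]), load_reg(sys.argv[2])
    for label, keys in zip(("only in first", "only in second", "changed"), diff_reg(old, new)):
        for key in keys:
            print(f"{label}: {key}")
```

Run it with the two exported files as arguments (desktop export first, laptop export second) and every key that exists on only one side, or whose values differ, gets listed.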
I thought perhaps the Windows install had failed previously because it was 32bit (don't ask me why I thought that, it makes no sense because it complained about SP1 - not 64bit or anything...but there you go), so I went hunting for a confirmed 64bit version (boy, was that an adventure reminiscent of Captain Jack Sparrow) and eventually came up with something I was fairly sure was indeed the 64bit version.

Level 50: HP 21, INT 25, WIS 20, MAG 18 - bring on the carrion crawler!
I run the 64bit setup (after clearing a few GB of space - you'd think the sudden lack of restore points would have been enough of a hint, but no!) and lo and behold, it bitches about SP1.
Groaning, I prepare for another bout of attempting to install SP1, feeling pretty certain I'm headed for complete fresh reinstall and weeks of IIS config to work through.
But then I saw some arb comment on some arb forum that nobody even bothered to rate.
Chuckling evilly at the irony, I run setup again - this time in COMPATIBILITY MODE.
I tell compatibility mode that this definitely used to run on an earlier version of Windows, and that was VISTA SP2.  I start the setup...
SETUP RUNS!  for two and a half days.... but it runs.
(By this time we'd gotten a new maid - yes it's been that long - and I'm praying the electricity company doesn't black out while I'm systematically severing every high-wattage appliance plug from its power cable.)
Eventually I get to log into windows again AND....

My network icon is still sporting a rather sad little red cross....but there were no service startup errors AT ALL...plus I notice I'm now running SP1...(Seriously William - WTF?)

Almost sobbing I double click the NIC to confirm the inevitable....but wait.  The NIC is visible, it's just disabled.  I remembered that somewhere in my troubleshooting, rebooting madness I had disabled the NIC to see if it made a difference.  It didn't, except for the ability for it to f**k with my emotions at THIS point!
Like a loyal owner stroking their rabid pet, (that is, very carefully while using a long stick and holding a 9mm behind your back) I right-click the NIC and click Enable...


IT'S ALIVE!


Words to the wise:

  • Make those repair discs.
  • Get Windows Backup Running.  Back that sucker up.
  • Create Restore Points and watch your available space.

Disclaimer:
All stolen images are copyright of their respective owners and I did not get permission to reproduce them.  However, considering I just did a search on Google Images and fiddled with photoshop I figured nobody would mind.  If you do mind, please talk to Google seeing as they were kind enough to reproduce them for me in the first place, so I'll take whatever legal stance they take.