Saturday, December 22, 2012

Tis the Season for Sys Admin

A couple of weeks ago Patch Tuesday came up with a bunch of updates for my Desktop PC, which seemed to be an update too far.  One update failed and I also got the Blue Screen of Death (BSOD), not once or twice but repeatedly. Now I tend to use my Desktop computer as the centre of the universe for my computing: email, work, blogging, photography, records and as a hobby. The latter point means that it is not quite the computer it was when I first bought it.

The patch I couldn’t get loaded was, rather oddly, KB2779562 – which is a time zone update! Still, the machine was working and I thought no more of it until a day or so later, when I noticed that the computer had logged me out, or so I thought.  I logged in again; there was no message, so I wondered if some piece of software had updated and forced a re-boot. Then as I was doing something I got a BSOD. It re-booted, I logged on and a short while later got another BSOD.

It got to the point when I began to wonder whether it might be an underlying hardware fault rather than a software problem. I still reckoned it was the software, but after a re-boot I could log in and then as all the stuff initialised it would crash – with a Blue Screen and then do a crash dump and re-boot.

At some time in my misspent youth I was responsible for the sys admin of a bunch of Unix computers. This was in the day when they cost tens of thousands of pounds (£) and it was necessary to have a service contract as they went wrong fairly frequently. One of them was the size of a washing machine with massive circuit boards, and fixing it sometimes seemed to entail randomly swapping boards until a combination that worked was found. Indeed such was the regularity of problems that I used to keep a written log of them (and so did the users) to aid with fixing them.

Fortunately modern electronics have become very much more reliable, although not infallible. The only trouble is that the software and hardware have also become a lot more complex. So trying to diagnose problems can be tricky; in fact trying to find information on how to diagnose problems can be tricky.  There is an awful lot of advice that starts with "reboot your computer" and ends up with "re-load the operating system", which seems a pretty drastic step.

So my plan was first to understand what might be happening and get the computer stable, then to do some backups and then to run various diagnostics. I was also wondering whether I would have to replace the computer. It is around 5 or so years old and is both a hobby computer, which means I play with it and change cards and software on it regularly, and a work and home computer. I did wonder whether to separate the hobby from the rest – but then I would have yet another computer to look after.

So after re-booting in safe mode I had a “play”. It turns out that my anti-virus software was updated on the day it started BSODing with a vengeance.  I also found some information about BSOD codes and some advice on fixing things, along with this link from Microsoft – Resolving stop (blue screen) errors in Windows 7. Unfortunately my error was:

STOP Error 0x0000007F: UNEXPECTED_KERNEL_MODE_TRAP

STOP error 0x7F means that the Intel CPU generated a trap and the kernel failed to catch this trap. STOP code 0x0000007F may also display "UNEXPECTED_KERNEL_MODE_TRAP" on the same STOP message.

This points to either a hardware error or a double fault. So despite my hoping it was software, the case against the hardware was looking stronger, even more so when I found this page: General causes of “STOP 0x0000007F” errors.  At this point I decided to read this link – lots of helpful stuff on BSODs.
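As an aside, Microsoft's bug check reference says the first parameter shown with a STOP 0x7F is the number of the trap that was raised, and that 0x8 is the double fault. Here is a minimal sketch of a lookup for a few of those values. The function and table names are my own invention, the table is deliberately incomplete, and the example value is made up, since I did not note down the parameter from my own blue screens.

```python
# A few of the trap numbers that can appear as the first parameter of a
# STOP 0x7F. This is a short, incomplete table for illustration only.
TRAP_NAMES = {
    0x00: "divide by zero error",
    0x04: "overflow",
    0x05: "bounds check fault",
    0x06: "invalid opcode",
    0x08: "double fault",
}


def describe_trap(first_parameter: int) -> str:
    """Return a readable name for the trap number of a STOP 0x7F."""
    name = TRAP_NAMES.get(first_parameter)
    if name is None:
        return f"trap 0x{first_parameter:02X} (not in this short table)"
    return f"trap 0x{first_parameter:02X}: {name}"


if __name__ == "__main__":
    # Hypothetical value; the real parameter was not recorded in this post.
    print(describe_trap(0x08))
```

The point of looking at the parameter is that it narrows things down a little before you start swapping hardware.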

Still in safe mode, I decided to use the System Restore facility to go back to the point before the virus checker update. The good news was that in safe mode the computer seemed stable – coincidentally the virus checker was not running. So I started the system restore. This took ages, several hours, to the point where I began to wonder if the computer had crashed without telling me. Then it asked for a reboot. I rebooted and was told that it couldn’t do a System Restore.

So then I did a bit of checking on System Restore issues here and here. As the machine had re-booted in normal mode I was surprised but relieved that it had stopped crashing. So at this point I figured I’d better do some backups. The first step was to organise the various USB disks I have got plugged into my machine. The computer has 2TB of storage organised as RAID 1, in which there are actually 2 x 2TB disks and one mirrors the other.
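The RAID arithmetic is simple enough to sketch. In RAID 1 every disk holds the same data, so the usable space is the size of one disk; in RAID 0 the data is striped across the disks, so the sizes add up but there is no spare copy. The function name below is my own, and it assumes identical disks and only these two RAID levels.

```python
def usable_capacity_tb(level: int, disk_count: int, disk_size_tb: float) -> float:
    """Usable capacity in TB for RAID 0 or RAID 1 built from identical disks."""
    if disk_count < 2:
        raise ValueError("RAID 0 and RAID 1 need at least two disks")
    if level == 0:
        # Striping: data is spread across all disks, no redundancy.
        return disk_count * disk_size_tb
    if level == 1:
        # Mirroring: every disk holds the same data.
        return disk_size_tb
    raise ValueError("this sketch only covers RAID 0 and RAID 1")


if __name__ == "__main__":
    print(usable_capacity_tb(1, 2, 2.0))  # my setup: two 2TB disks mirrored
    print(usable_capacity_tb(0, 2, 2.0))  # the same disks striped instead
```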

The trouble is my external drives come in a variety of sizes. The desktop holds around 1TB of info with just under 400GB of music and picture information. So using SyncToy, some Microsoft freeware, I created a copy of the music and picture data on one USB drive and then my work directories and email on another USB drive. SyncToy is basically a super copier and is straightforward, fast and shows you what it is doing. However with the relatively large amounts of data it still took a fair bit of time (around 4 hours for the picture and music stuff).
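For anyone planning a similar copy, those figures give a rough feel for the speed to expect. This is a back-of-envelope sketch using my rounded numbers (about 400GB in about 4 hours) and decimal units; the function names are my own and your drives will differ.

```python
def average_rate_mb_per_s(gigabytes: float, hours: float) -> float:
    """Average copy rate in MB/s, using decimal units (1 GB = 1000 MB)."""
    return gigabytes * 1000 / (hours * 3600)


def estimated_hours(gigabytes: float, rate_mb_per_s: float) -> float:
    """Hours needed to copy the given amount of data at a steady rate."""
    return gigabytes * 1000 / rate_mb_per_s / 3600


if __name__ == "__main__":
    rate = average_rate_mb_per_s(400, 4)
    print(f"{rate:.1f} MB/s")
    print(f"{estimated_hours(1000, rate):.1f} hours for 1TB at the same rate")
```

It is only an average, of course; lots of small files copy more slowly than a few big ones.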

The next step was to run a complete system backup. I used Acronis software (2010) – this is slightly less transparent than SyncToy but in theory enables a more complete backup of the system stuff as well. I say in theory because unless you have had to retrieve data from a backup you can’t really be sure how good it is, or indeed whether it works at all.
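One way to get a bit more confidence, at least for a plain file copy like the SyncToy one, is to compare checksums of the originals against the copies. Here is a minimal sketch. The function names are my own, it is not a feature of SyncToy or Acronis, and it does not apply to an image-style backup, which needs the backup tool's own validation.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_backup_problems(source: Path, backup: Path) -> list[str]:
    """List source files that are missing from, or differ in, the backup."""
    problems = []
    for src_file in sorted(source.rglob("*")):
        if not src_file.is_file():
            continue
        relative = src_file.relative_to(source)
        copy = backup / relative
        if not copy.is_file():
            problems.append(f"missing: {relative}")
        elif file_digest(src_file) != file_digest(copy):
            problems.append(f"differs: {relative}")
    return problems
```

You would call it with the original folder and the copy on the USB drive, and an empty list means every source file has an identical copy. It reads every file twice, so on a few hundred gigabytes expect it to take about as long as the backup did.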

This backup took hours, around 8 hours. They do say that unless you have x copies of your data you don’t really care about it – where x can be 1, 2 or 3. In this case, with a machine that had become suspect, I figured the more the merrier. So I also cloned my system disk. This is something I have used Acronis for in the past and I have been very impressed with its capability. I converted my computer from a RAID 0 configuration to a RAID 1 using the cloning facility. (In RAID 0 the data is spread across two drives for speed.)

So I knew it would work and it would give me a disk from which I could boot up the system as if it were the original. The only trouble is that each level of additional complexity takes even longer. The cloning software runs after a re-boot before handing control back to Windows – this took around 12 or so hours. (Note I only needed a spare 2TB disk as my computer will reboot from a single disk if two aren’t present.)

The good news was that my Desktop seemed to have gotten over its crashing habit, so the next step was to see what I could do about System Restore. In the end, given the number of backups, I turned it off and on – this apparently deletes all of the check points and then you start again. One of the problems can be that a previous checkpoint has been corrupted. The trouble is ‘cos I have quite a lot of data even these sorts of tasks seem to take ages. As you wait you tend to think the worst and assume something has crashed. In my experience a crash when doing deeper system stuff always creates trouble. It all held up and I created a check point. Now for completeness I should have then checked to see if I could properly go back to that restore point – but life is too short.

Then I ran the System File Checker – which promised to take a long time, but was over quicker than I had expected and slightly surprisingly reported no problems. The next stage of my plan was to “repair the Windows 7 Installation”. It is billed as a last resort – so in the end I didn’t do it. I am thinking about doing a Disk check though.

I have been thinking about loading Windows 8 onto my computer – but if the hardware is flaky then it would be more hassle, and if I have to buy another machine then I can dream about multiple screens and flash drives and loads of RAM. Still, here I am a week later and things are holding up, so back to blogging I go. But I will up the frequency of my backups for a while.

Sorry no pictures – but there will be some in the next post.
