Fairly robust Regular Expression to scrape an ISBN

I’m working on an app that, given the URL for a book, needs to scrape the page for an ISBN. I have seen several regular expressions for this out there, but I didn’t think they were quite robust enough.

 

The main difficulty is the format: an ISBN can be 10 or 13 digits, optionally broken into groups separated by hyphens, and the groups legitimately vary in length. (One could use spaces, but you have to draw the line somewhere…) The saving grace is that on every site I’ve sampled, once the HTML tags are stripped out, the text “isbn” precedes the number itself. This should hold even for simple tables (the two <td> elements are normally in the same row, thus consecutive).

So I have two (.NET style – remove the string “?<isbn>” from them to get a standard regex) Regular Expressions – one for ISBN-10 and one for ISBN-13.

Here they are defined in C#:

            Regex rexIsbnNum10 = new Regex(@"isbn(.{0,3}10)?[^\w]{1,10}(?<isbn>\d-?\d-?\d-?\d-?\d-?\d-?\d-?\d-?\d-?\d\b)");
            Regex rexIsbnNum13 = new Regex(@"isbn(.{0,3}13)?[^\w]{1,10}(?<isbn>\d-?\d-?\d-?\d-?\d-?\d-?\d-?\d-?\d-?\d-?\d-?\d-?\d\b)");

To use them, grab the contents of the web page, strip out the HTML tags, make the text lowercase for simplicity (or use case-insensitive regexes), and apply rexIsbnNum10.Matches() to the string; you should then have all the ISBN-10 values nicely enumerated. Likewise for ISBN-13.
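As a rough illustration of those steps, here is a minimal sketch in Python rather than C#. It assumes Python’s re module, which writes the named group as (?P<isbn>...) instead of the .NET (?<isbn>...). The helper names strip_tags and find_isbns are made up for this example, the repeat count (?:-?\d){9} is shorthand for the spelled-out digits, and the tag stripper is deliberately naive.

```python
import re

# Same structure as the C# patterns, with the repeated "-?\d" written as a count.
# Python spells the named group (?P<isbn>...) rather than (?<isbn>...).
ISBN10 = re.compile(r"isbn(.{0,3}10)?[^\w]{1,10}(?P<isbn>\d(?:-?\d){9}\b)")
ISBN13 = re.compile(r"isbn(.{0,3}13)?[^\w]{1,10}(?P<isbn>\d(?:-?\d){12}\b)")


def strip_tags(html):
    """Naive tag removal (an assumption; a real scraper might use an HTML parser)."""
    return re.sub(r"<[^>]+>", " ", html)


def find_isbns(html):
    """Strip tags, lowercase the text, and collect every ISBN-10 and ISBN-13 capture."""
    text = strip_tags(html).lower()
    return {
        "isbn10": [m.group("isbn") for m in ISBN10.finditer(text)],
        "isbn13": [m.group("isbn") for m in ISBN13.finditer(text)],
    }


if __name__ == "__main__":
    # Hypothetical page fragment: label and number in adjacent table cells.
    page = (
        "<tr><td>ISBN-13:</td><td>978-0130190772</td></tr>"
        "<tr><td>ISBN-10:</td><td>0130190772</td></tr>"
    )
    print(find_isbns(page))
```

The captures keep whatever hyphens appeared on the page, so a caller that wants bare digits would still need to remove them.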

 

Using the ISBN-13 as an example, this regex will match:

ISBN 978-0130190772
ISBN-13: 978-0130190772
ISBN13: 978-0130190772
ISBN: 978-0130190772
ISBN-13: 9-78-01301-9077-2
ISBN-13 — 9-7-8-0-1-3-0-1-9-0-7-7-2

It will not match if there are fewer or more than 13 digits, if the “ISBN” and the “13” are more than 3 characters apart, or if the “ISBN-13” label is more than 10 characters away from the number itself (or a letter or digit falls in between). Note that the ISBN-10 pattern only accepts digits, so it will miss ISBN-10s whose check digit is X.
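These rules can be spot-checked with a short script. The following is a sketch in Python, not the C# version (so the named group is written (?P<isbn>...), and the repeated digits are written with a count). The matching strings are the examples listed above; the non-matching strings and the function name match_isbn13 are made up for illustration.

```python
import re

ISBN13 = re.compile(r"isbn(.{0,3}13)?[^\w]{1,10}(?P<isbn>\d(?:-?\d){12}\b)")


def match_isbn13(text):
    """Return the captured ISBN-13 with hyphens removed, or None if nothing matches."""
    m = ISBN13.search(text.lower())
    return m.group("isbn").replace("-", "") if m else None


should_match = [
    "ISBN 978-0130190772",
    "ISBN-13: 978-0130190772",
    "ISBN13: 978-0130190772",
    "ISBN: 978-0130190772",
    "ISBN-13: 9-78-01301-9077-2",
    "ISBN-13 — 9-7-8-0-1-3-0-1-9-0-7-7-2",
]

should_not_match = [
    "ISBN-13: 978-013019077",                   # only 12 digits
    "ISBN-13: 978-01301907721",                 # 14 digits
    "ISBN number: 978-0130190772",              # letters between "isbn" and the number
    "ISBN-13:" + " " * 11 + "978-0130190772",   # number too far from the label
]

if __name__ == "__main__":
    for s in should_match + should_not_match:
        print(repr(s), "->", match_isbn13(s))
```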

April 10, 2012 Internet Explorer 9 Windows Update Issues

Around April 10, 2012 Microsoft updated the Internet Explorer 9 installers with a broken version. Until this issue is fixed, your safest bet is to use the older installer here, then update with Windows Update (just do not perform an initial install from Windows Update).

 

See the Microsoft Answers thread here: http://answers.microsoft.com/en-us/windows/forum/windows_other-windows_update/windows-server-2008-r2-sp-1-internet-explorer-9/ac2ed42f-7faf-4731-8af9-7a50d946138a
Jonathan Sahagun found a direct link to the previous, working version of IE9 here: http://www.microsoft.com/download/en/confirmation.aspx?id=23332

 

There are two solutions. Probably the safest is to uninstall, then reinstall an older version. The alternative is to make a couple of registry changes.

 

Symptoms

After this bad version of IE9 is installed, you will be continuously prompted by Windows Update to install Internet Explorer 9 – it appears as though it cannot tell you already installed it.


If you attempt to install it again, the update will fail (with the error “Code 9C48 Windows Update Encountered an unknown error”), but Windows Update will keep offering it.

 

Additionally, there is a conflict between the version embedded in the files, and the version reported by IE:

To see the version conflict, browse to C:\Program Files (x86)\Internet Explorer\iexplore.exe, view the properties, then the details tab, and note the Product version number:


In my case, it is 9.00.8112.16443 (file version is 9.0.8112.16443)

 

Now, within Internet Explorer, go to Tools –> About Internet Explorer.

The version doesn’t match – 9.0.8112.16421
(Note: On my other system where IE9 was installed before April 10, this is the version number on both the files and in the program)

 

Solution #1 – Confirmed, Probably supported

If you haven’t already installed IE9, use the following link to install it: http://www.microsoft.com/download/en/confirmation.aspx?id=23332

If you already installed it: uninstall it, then reinstall using the linked version.

If you have installed other applications since installing IE9, this may break things. I have also seen the uninstall leave IE in an inconsistent state… so naturally I sought out an alternative.

 

Solution #2 – Quicker, safer, riskier

It looks like this version conflict is a registry setting:


In HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Internet Explorer  (and also HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Internet Explorer), svcVersion should match Version and W2kVersion. Change it to 9.0.8112.16443
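Before editing anything, it may help to confirm that the three values really disagree. The following is a read-only sketch in Python using the standard winreg module. It assumes a 64-bit Python on a 64-bit Windows install (so neither path is silently redirected), and the function names are made up for this example. It only reports; it does not change svcVersion.

```python
import sys

# The two keys named above, relative to HKEY_LOCAL_MACHINE.
KEY_PATHS = [
    r"SOFTWARE\Microsoft\Internet Explorer",
    r"SOFTWARE\Wow6432Node\Microsoft\Internet Explorer",
]
VALUE_NAMES = ("svcVersion", "Version", "W2kVersion")


def mismatched(values):
    """True when svcVersion differs from Version or W2kVersion."""
    svc = values.get("svcVersion")
    return any(values.get(name) != svc for name in ("Version", "W2kVersion"))


def read_ie_versions(key_path):
    """Read the three version strings from one key (Windows only)."""
    import winreg  # imported here so the rest of the file loads on any platform

    result = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
        for name in VALUE_NAMES:
            try:
                result[name] = winreg.QueryValueEx(key, name)[0]
            except FileNotFoundError:
                result[name] = None  # value is absent under this key
    return result


if __name__ == "__main__" and sys.platform == "win32":
    for path in KEY_PATHS:
        values = read_ie_versions(path)
        print(path, values, "MISMATCH" if mismatched(values) else "ok")
```

On a machine with the broken install, both keys would be expected to report a mismatch until svcVersion is corrected.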

 

And voilà, IE9 reports the correct version, and Windows Update stops offering the installer.

 

Hopefully Microsoft will figure out the mistake soon (as of this writing it has been nearly a week), fix the installer, and fix Windows Update to recognize and remedy the broken installs.

How to uninstall MSDAIPP (Web Folders) from Windows 7

We had a Windows 7 workstation that was exhibiting “wonky” behavior when opening SharePoint folders in Word 2010 (and Explorer in general). When it was supposed to open the Save As dialog to a SharePoint folder, it would instead show the default “My Documents” folder. You could still enter a SharePoint URL, and it would load correctly, but the initial dialog refused to open SharePoint.

Using Fiddler2, I was able to see from the User-Agent header in the requests that the problem computer was using a different WebDAV client than the working workstations. The first request is identical:

OPTIONS http://sharepointserver/doclib/ HTTP/1.1
User-Agent: Microsoft Office Protocol Discovery
Host: sharepointserver
Content-Length: 0
Connection: Keep-Alive
Pragma: no-cache

 

But the second request is all kinds of different.

Normal Workstation:

OPTIONS http://sharepointserver/ HTTP/1.1
User-Agent: Microsoft-WebDAV-MiniRedir/6.1.7601
translate: f
Connection: Keep-Alive
Host: sharepointserver

 

Problem Workstation:

OPTIONS http://sharepointserver/ HTTP/1.1
User-Agent: Microsoft Data Access Internet Publishing Provider Cache Manager
Host: sharepointserver
Content-Length: 0
Connection: Keep-Alive
Pragma: no-cache
Cookie: WSS_KeepSessionAuthenticated={164a5252-50a4-4112-9e22-ae36dfsdddc}
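When comparing many captured sessions, the client can be told apart mechanically from the User-Agent value. Here is a small Python sketch that uses only the three User-Agent strings shown above; the function name and the returned labels are made up for illustration.

```python
def webdav_client(user_agent):
    """Classify a User-Agent header value by the client that appears to have sent it."""
    ua = user_agent.lower()
    if "webdav-miniredir" in ua:
        return "WebDAV Mini-Redirector"
    if "internet publishing provider" in ua:
        return "MSDAIPP (Web Folders)"
    if "office protocol discovery" in ua:
        return "Office protocol discovery"
    return "unknown"


if __name__ == "__main__":
    for ua in (
        "Microsoft Office Protocol Discovery",
        "Microsoft-WebDAV-MiniRedir/6.1.7601",
        "Microsoft Data Access Internet Publishing Provider Cache Manager",
    ):
        print(ua, "->", webdav_client(ua))
```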

 

The next request gets even weirder:

Continue reading

Client Hyper-V Issues in Windows 8 Consumer Preview–Release Preview Update

Update: After having all kinds of stability and performance issues with VirtualBox on the Win8 Release Preview, I decided to try Client Hyper-V again. Guess what? They fixed the networking issues!! Your clients can now use bridged networking without MAC address conflicts. There is still no graphics acceleration, and having it installed breaks other virtualization software (although a quick bcdedit can fix that temporarily). I have modified things below accordingly.

 

If you were considering ditching VirtualBox for Client Hyper-V in Windows 8, well, don’t do it just yet (unless you are having problems with VirtualBox). Enabling the Hyper-V hypervisor causes performance and stability issues with host applications, there are all kinds of issues with the networking, it just doesn’t perform all that well, and there is absolutely zero graphics acceleration. And if you enable it, other virtualization software will not work right.

If you remote desktop into your VM you can at least get sound, but on my relatively well endowed machine, the remote desktop was sluggish – more sluggish than RDP over the internet.

Despite Hyper-V being a low-level hypervisor, VirtualBox is blowing it out of the water in all respects. The networking is rock solid, performance is screaming (update: something changed from CP to RP, or something was updated in VBox, because the performance hasn’t been so hot), and you get sound and 2D / 3D acceleration (update: this doesn’t work at all on my machine). Oh, and you don’t have to boot to an IDE drive.

Hopefully some of this gets worked out in future releases.

Windows 7, UEFI vs BIOS, GPT vs MBR notes

I’ve been digging and experimenting quite a bit with the boot processes in Windows 7 (64-bit only), trying to accomplish completely unsupported things.

Here are a few things I have learned that you might find helpful:

  • 32-bit Windows cannot boot via UEFI, nor can it be booted from a (Microsoft) EFI bootloader. This includes Thin PC.
  • The actual Windows partition and installation doesn’t seem to care how it is booted, and it doesn’t matter how it was installed:
    • For UEFI, \Windows\system32\winload.efi is used to boot
    • For BIOS, \Windows\system32\winload.exe is used instead
    • Both sets of files exist regardless of the type of system Windows is installed on, and can be used interchangeably. I have taken a Windows folder installed on a UEFI machine and booted it on a BIOS machine, and vice versa
  • Windows Image Backups are always MBR, even if the source drive was GPT (at least for the partition containing Windows).
  • Windows Image Backup’s recovery tool will not allow you to restore from a UEFI machine to a BIOS machine but…
  • Because the image is MBR, you can boot it on a BIOS machine with a little work

Using three monitors with Windows 8 CP

Windows 8 gets rid of the Start button in favor of a “hot corner” approach: peg your mouse cursor to the lower-left corner of the screen and you get access to the “Start screen.” It actually works pretty well, and is easier / quicker to reliably “hit” than the old Start button. Likewise, the lower-right corner provides access to search and other options.

 

 

But what if you have more than one monitor?

If you have two, it’s not really so bad. As long as the main display is the one on the left and your second display is on the right, the right-side corners are easier to hit (especially since there is actually a button on the right side).

 

But suppose you have three? And suppose you want your primary display to be in the middle. Suddenly the hot corners become unusable. Presently there is no solution that makes all corners work well, but you can at least get your lower-left “start” corner working better with a little “nudge”:


In your monitor setup, bump the side monitors up just a tad (you can click and drag to reposition them). The consequence of this is that windows that span monitors will not line up, but at least you can actually hit the start corner…
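A plausible explanation for why the nudge helps (an assumption, not something confirmed in the preview’s documentation) is that a corner is only easy to hit when the cursor physically stops there, meaning no other monitor covers the pixels just beside or below it. The Python sketch below models monitors as hypothetical (left, top, width, height) rectangles in desktop coordinates and checks that condition for the primary display’s lower-left corner.

```python
def covered(monitors, x, y):
    """True if pixel (x, y) lies on any monitor; each monitor is (left, top, width, height)."""
    return any(l <= x < l + w and t <= y < t + h for (l, t, w, h) in monitors)


def lower_left_is_hard_corner(monitors, primary):
    """True if the cursor cannot move left of or below the primary's lower-left pixel."""
    l, t, w, h = primary
    cx, cy = l, t + h - 1  # the lower-left pixel of the primary display
    return not covered(monitors, cx - 1, cy) and not covered(monitors, cx, cy + 1)


# Three 1920x1080 displays with the primary in the middle (made-up sizes).
primary = (0, 0, 1920, 1080)
aligned = [(-1920, 0, 1920, 1080), primary, (1920, 0, 1920, 1080)]
# Side monitors bumped up by 40 pixels, as described above.
nudged = [(-1920, -40, 1920, 1080), primary, (1920, -40, 1920, 1080)]

if __name__ == "__main__":
    print("aligned:", lower_left_is_hard_corner(aligned, primary))
    print("nudged: ", lower_left_is_hard_corner(nudged, primary))
```

Under this model the aligned layout lets the cursor slide straight onto the left monitor, while the nudged layout leaves a small dead zone that pins the cursor in the corner.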

 

I would argue that in many ways the Windows 8 Consumer Preview does a decent job of integrating the touch and mouse input methods… as long as you only use one screen. But seriously, who does that anymore?