Thread: AIR78 files and Optical Character Reading (OCR)

  1. #1
    Join Date
    Nov 2007
    Location
    Egmond-Binnen, The Netherlands
    Posts
    290

    Dear all,

    Perhaps the wrong forum for a rather technical question, but hopefully some of you have dealt with the same challenge:
    the AIR78 documents provide a wealth of information on the first names and service numbers of RAF airmen. The PDF documents are easy to search when it comes to uncommon names; with very common names such as Smith, Thompson or Johnson, however, it is a time-consuming task to check hundreds of pages for one name or service number.

    However, Optical Character Recognition (OCR) should make it possible to search simply on service number. With my limited knowledge of software I have tried to OCR the PDF files, but unfortunately to no avail. Perhaps the real problem is that the records were originally on microfilm, were then converted to JPG images, and were finally put into PDF documents.
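    Once an OCR engine (Tesseract is one freely available example, though none of the posters mention it) has turned each page image into plain text, searching on service number becomes a pattern-matching job. The Python sketch below is a minimal illustration under stated assumptions: the OCR text is already available as one string per page, service numbers appear as runs of 5 to 7 digits, and OCR sometimes reads digits as look-alike letters (O for 0, l or I for 1, S for 5, B for 8). The names `pages_with_service_number` and `normalise_token` are hypothetical, not part of any tool.

```python
import re

# Assumption: service numbers appear as runs of 5 to 7 digits.
SERVICE_NO = re.compile(r"(?<!\d)\d{5,7}(?!\d)")

# Assumed OCR look-alikes: letters often misread in place of digits.
DIGIT_LOOKALIKES = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1", "S": "5", "B": "8"})


def normalise_token(token: str) -> str:
    """Swap look-alike letters for digits, but only in tokens that already contain a digit."""
    if any(ch.isdigit() for ch in token):
        return token.translate(DIGIT_LOOKALIKES)
    return token


def pages_with_service_number(pages: list[str], number: str) -> list[int]:
    """Return the 1-based numbers of pages whose OCR text contains the service number."""
    hits = []
    for page_no, text in enumerate(pages, start=1):
        # Split on anything that is not a letter or digit, then repair look-alikes.
        tokens = re.findall(r"[A-Za-z0-9]+", text)
        cleaned = " ".join(normalise_token(tok) for tok in tokens)
        # Compare whole digit runs only, so 55999 never matches inside 559999.
        if number in SERVICE_NO.findall(cleaned):
            hits.append(page_no)
    return hits


if __name__ == "__main__":
    # Hypothetical OCR output for three pages.
    pages = ["SMITH Martin 5799O0", "SMITH Martin 559999", "SMITH John 55999"]
    print(pages_with_service_number(pages, "559999"))  # [2]
```

    Matching whole digit runs rather than substrings keeps a short number from producing false hits inside a longer one; the look-alike repair is deliberately limited to tokens that already contain a digit so that surnames are left alone.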

    Anyway, has anybody else tried to make the AIR78 files searchable by OCR, and/or could someone point me in the right direction?

    Thanks in advance for advice,
    Hans Nauta

  2. #2
    Join Date
    Nov 2007
    Location
    Bewdley, UK
    Posts
    2,700

    Hi Hans,

    When the files were first released I used Acrobat Pro to process the images.

    There were more failures than successes, and what was produced still needed proofreading to give reliable results.

    The major cause of OCR failure was the clarity of the original images. They range from good contrast to barely legible, even on the same page.

    The mixture of typescript and handwriting also causes scan failures.
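    Contrast that varies even within a single page is the classic case for thresholding each card separately rather than the whole page at once. The sketch below uses Otsu's method, a standard way of choosing a black/white cut-off from a grey-level histogram; it is not necessarily what Acrobat does internally, and it does nothing for the handwriting problem. It assumes each card has already been cropped out and flattened into a list of 0 to 255 grey values (in practice an imaging library such as Pillow would handle that step).

```python
def otsu_threshold(pixels: list[int]) -> int:
    """Return the grey level (0-255) that best splits pixels into dark ink and light card.

    Pixels at or below the returned level count as ink.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))
    weight_dark, sum_dark = 0, 0
    best_level, best_spread = 0, -1.0
    for level in range(256):
        weight_dark += hist[level]
        sum_dark += level * hist[level]
        weight_light = total - weight_dark
        if weight_dark == 0:
            continue
        if weight_light == 0:
            break
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        # Between-class variance (up to a constant factor): Otsu maximises this.
        spread = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if spread > best_spread:
            best_level, best_spread = level, spread
    return best_level


def binarise(pixels: list[int], threshold: int) -> list[int]:
    """Map ink to 0 (black) and background to 255 (white)."""
    return [0 if p <= threshold else 255 for p in pixels]


def binarise_per_card(cards: list[list[int]]) -> list[list[int]]:
    """Threshold each cropped card with its own cut-off, not one per page."""
    return [binarise(card, otsu_threshold(card)) for card in cards]


if __name__ == "__main__":
    # Two made-up cards from the same page: one dark image, one light image.
    dark_card = [20] * 5 + [90] * 5
    light_card = [120] * 5 + [230] * 5
    print(otsu_threshold(dark_card + light_card))  # 120: a single page-wide cut-off
    print(binarise_per_card([dark_card, light_card]))
```

    With these made-up values, the page-wide cut-off of 120 would turn every pixel of the dark card black, whereas the per-card thresholds (20 and 120) keep ink and background apart on both cards.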

    I eventually gave up, as it was beyond my OCR pay grade and skill level.

    What I am doing now is adding bookmarks to the pages: initially the first and last name/service number in each PDF file, then aiming to progress to the first name/service number on each page.

    This way, a search on name and/or service number will give a hit for a page of 8 cards.
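    The bookmark approach amounts to a small index from labels to page numbers. A minimal Python sketch, assuming each bookmark title reads like `SMITH Martin 579990` (a hypothetical label format) and points to one PDF page; `Bookmark` and `search_bookmarks` are illustrative names, not Acrobat features:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bookmark:
    title: str  # hypothetical label format, e.g. "SMITH Martin 579990"
    page: int   # PDF page the bookmark points to


def search_bookmarks(bookmarks: list[Bookmark], query: str) -> list[int]:
    """Return pages whose title contains every query word as a whole word (case-insensitive)."""
    wanted = set(query.casefold().split())
    return [b.page for b in bookmarks if wanted <= set(b.title.casefold().split())]


if __name__ == "__main__":
    # Hypothetical bookmarks for three consecutive pages.
    marks = [
        Bookmark("SMITH Martin 50000", 12),
        Bookmark("SMITH Martin 579990", 13),
        Bookmark("SMITH Peter 600123", 14),
    ]
    print(search_bookmarks(marks, "smith martin"))  # [12, 13]
    print(search_bookmarks(marks, "579990"))        # [13]
```

    Matching whole words rather than substrings means a partial number such as 5000 does not hit the 50000 bookmark.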

    I have also noticed that, for a run of cards with identical first, second and surnames, the service numbers are in ascending order. So if two consecutive pages start with, e.g., Martin Smith 50000 and Martin Smith 579990, then Martin Smith 559999, although not indexed, will lie on the first page, not the second.
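    The ascending-order rule means that once the first card on every page is bookmarked, any other card can be placed by binary search: the right page is the last one whose first card sorts at or before the card you want. A sketch using Python's standard `bisect` module, assuming names are normalised consistently (here compared case-insensitively), that cards sort by name and then by service number as a number, and that the page numbers in the index are hypothetical:

```python
from bisect import bisect_right
from typing import Optional


def locate_page(first_cards: list[tuple[str, int, int]], name: str, number: int) -> Optional[int]:
    """
    first_cards holds (name, service number, page) for the FIRST card on each
    page, listed in card order. Returns the page that should contain the
    target card, or None if it sorts before the first indexed card.
    """
    keys = [(n.casefold(), s) for n, s, _ in first_cards]
    # bisect_right finds the first key strictly after the target;
    # the page before that is the last one starting at or before it.
    i = bisect_right(keys, (name.casefold(), number)) - 1
    return first_cards[i][2] if i >= 0 else None


if __name__ == "__main__":
    # Ross's example: consecutive pages starting at Martin Smith 50000 and 579990.
    index = [("Smith Martin", 50000, 1), ("Smith Martin", 579990, 2)]
    print(locate_page(index, "Smith Martin", 559999))  # 1
```

    One limitation: a card that sorts after the last bookmark is always assigned to the last page, so the sketch cannot tell whether such a card is in this piece at all.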

    If the images were clear, I would suggest a group task for the 177 piece files, but given the quality I thought a TNA guide was needed to allow access to the originals at either TNA or RAFAHB.

    Regards
    Ross
    The Intellectual Property contained in this message has been assigned specifically to this web site.
    Copyright Ross McNeill 2015/2018 - All rights reserved.

  3. #3
    Join Date
    Nov 2007
    Location
    Reading, Berkshire, UK
    Posts
    3,608

    Ross's explanation is a minor bit of good news (in the perverse sense!). I had the same trouble as Hans, but I assumed it was just me not being able to cope with the TNA system. Seems I'm not alone. I tried AIR 78/1 as an experiment to see if I could find the right ABLITT (see the thread on the Forum a few days ago). It took me hours and hours of laborious searching, and I still didn't find him (quite apart from the fact that I kept getting "SAMPLE" stamped across many apparently handwritten cards). I agree the quality of some images is very, very poor (or non-existent).

    Why, then, were they put online if, as Ross seems to be suggesting, you still have to go to TNA to view them properly? Seems to defeat the object! Or am I missing something? I had hoped that when the TNA site asked how I wanted to download the file(s), the answer "In .csv format" would perform miracles. Alas, no!! Presumably, and eventually, they will all be on a spreadsheet (or similar)? They'd better hurry. Some of us are getting a bit long in the tooth!!
    Rgds
    Peter Davies
    Meteorology is a science; good meteorology is an art!
    We might not know - but we might know who does!

  4. #4
    Join Date
    Nov 2007
    Location
    Egmond-Binnen, The Netherlands
    Posts
    290

    Hi Ross and Peter,

    Thanks for your replies; apparently I'm not the only one struggling with this matter!

    Regards,
    Hans

  5. #5
    Join Date
    Nov 2007
    Location
    Wiltshire
    Posts
    2,502

    Apologies for returning to the AIR78 files, but what exactly do they contain? I'm simply after the next-of-kin address for one man who died in 1941, but I'm unable to download due to the file size, and viewing online doesn't appear to work (I assume "View this record" means one can view online).

    No matter: if the information I'm after is not included, I'll stop wasting my time.

    Advice would be much appreciated.

    Brian

  6. #6
    Join Date
    Nov 2007
    Location
    Bewdley, UK
    Posts
    2,700

    Most of the problems for people (and 100% of the met men fault reports!) are associated with the online viewing option.

    The most reliable way is to download the entire AIR 78 piece. This comes as a sizeable PDF (250 MB), but once it is on your computer, Acrobat allows scrolling and page display without hardware-related problems.

    The original cards are single-sided, Rolodex-type cards. They contain only the surname, first names or initials, and service number (plus any previous number, and the née name in the case of a married WAAF).

    A number of cards contain random letter codes, but most do not.

    No other information is given: the cards are only an index to service records that have yet to be released.
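    The card contents described above map onto a small record type, and a transcribed set of such records is the kind of spreadsheet Peter hoped the .csv download would produce. A Python sketch using only the standard library; the class and field names are hypothetical, and the letter code is kept as plain text because its meaning is not known:

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Optional


@dataclass
class IndexCard:
    surname: str
    forenames: str                      # full first names or initials, as written
    service_number: str                 # kept as text, exactly as transcribed
    previous_number: Optional[str] = None
    nee_name: Optional[str] = None      # maiden name, for a married WAAF
    letter_code: Optional[str] = None   # occasional letter code; meaning unknown


def cards_to_csv(cards: list[IndexCard]) -> str:
    """Write cards as CSV text with one column per field; missing values become empty cells."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(IndexCard)])
    writer.writeheader()
    for card in cards:
        writer.writerow(asdict(card))
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical transcribed card.
    print(cards_to_csv([IndexCard("SMITH", "Martin", "559999")]))
```

    Storing the service number as text rather than an integer keeps any leading zeros or stray characters from the transcription visible for later checking.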

    Before microfilming, the cards were placed 6 or 8 at a time on the platter below the camera, and a microfilm image was taken.


    Regards
    Ross

  7. #7
    Join Date
    Nov 2007
    Location
    Wiltshire
    Posts
    2,502

    Much obliged, Ross. I did try to download (I even received the 'Thank you for your order' reply), but nothing happened, so I assumed I'd done something wrong. No matter; I'm obviously barking up the wrong tree, as the files hold nothing of use for my needs.

    I apologise on behalf of met men everywhere, but we are (very) simple souls; we are, I'm afraid, dinosaurs from an era when 'advanced technology' was the pencil and rubber used to analyse charts that had been hand-plotted with a double-headed pen!

    Brian

  8. #8
    Join Date
    Nov 2007
    Location
    Bewdley, UK
    Posts
    2,700

    Now you have let the cat out of the bag.

    The crew room was convinced it was incantation, chicken entrails and casting bones - not high tech pencil and eraser!

    At least I can console myself that the final stage in a forecast is still a skilled task: doing multiple passes of the chart through the fax while holding one end to stretch it out of recognition.

    Regards
    Ross
