Thread: Slow Forum

  1. #1
    Join Date
    Nov 2007
    Location
    Bewdley, UK
    Posts
    2,700
    Thanks
    0
    Thanked 1 Time in 1 Post

    Slow Forum

    I apologise for the slow forum over the last few days. It is due to excessive robot traffic from external corporations downloading everything they can from the forum.

    Usually this is to bolster their own meagre efforts by data mining non-commercial sites such as this.

    The current culprit is from the United States:

    IP address 198.143.187.114

    ISP: SingleHop of Chicago.

    Ross
    The Intellectual Property contained in this message has been assigned specifically to this web site.
    Copyright Ross McNeill 2015/2018 - All rights reserved.

  2. #2
    Join Date
    Aug 2011
    Posts
    1,732
    Thanks
    0
    Thanked 6 Times in 6 Posts

    No need to apologise, Ross; it's not your fault.

    It's a shame that all forum administrators have to spend so much time combatting this kind of activity these days as it must take away any enjoyment gained from running the site.

    Anyway, thanks for all your efforts; very much appreciated.

    Pete
    Main areas of research:

    - CA Butler and the loss of Lancaster ME334 (http://rafww2butler.wordpress.com/ )
    - Aircrew Training (Basic / Trade / Operational / Continuation / Conversion)
    - The History of No. 35 Squadron (1916 - 1982) (https://35squadron.wordpress.com/)

    [Always looking for copies of original documents / photographs etc relating to these subjects]

  3. #3
    Join Date
    Nov 2007
    Posts
    250
    Thanks
    3
    Thanked 0 Times in 0 Posts

    Online Behaviour

    Hi Ross,

    I have sometimes noticed that when a web search is performed on a particular RAF-related keyword or person, results come up for third-party sites containing information that clearly originates from this site, the difference being that the third-party site hosting the mined information carries pay-per-click adverts.

    I wish now that I had taken screenshots for evidence.

    In theory, with enough computing power, storage space, and robots, one could automatically mine information, have the ability to host it (with adverts), and then automatically bump up search engine rankings, in order to generate revenue.

    It is, in my opinion, almost as diabolical as various high profile book retailers selling e-books that are simply the text of a free-to-view and not-written-for-profit Wikipedia article...

    Ross, is there any way you can put an automated Copyright disclaimer across the site?

    Cheers

    Rod

  4. #4
    Join Date
    Nov 2007
    Location
    North Tynedale, Northumberland
    Posts
    420
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Have you tried this, Ross?

    http://superuser.com/questions/343524/which-sites-reject-crawler-requests

    I used it rather successfully when the ACIA database was accessible online.

    Jim
    Jim Corbett

  5. #5
    Join Date
    Nov 2007
    Location
    Bewdley, UK
    Posts
    2,700
    Thanks
    0
    Thanked 1 Time in 1 Post

    The legit spiders are no real problem, and are welcome, as I recommend Google as the site search tool.

    They abide by the robots.txt file, which asks for some parts of the site to be kept out of the index and limits the number of page requests per second.
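
    To illustrate what "abiding by" the file means, here is a minimal sketch of the check a well-behaved spider makes before each request, using Python's standard urllib.robotparser module. The sample rules, the /private/ path, the "ExampleBot" name and the example.com URLs are made up for illustration and are not this site's actual settings.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, NOT this forum's real robots.txt.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def load_rules(robots_txt):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = load_rules(SAMPLE_ROBOTS_TXT)

# A polite crawler asks before fetching each URL...
blocked = rules.can_fetch("ExampleBot", "http://www.example.com/private/members.php")
allowed = rules.can_fetch("ExampleBot", "http://www.example.com/showthread.php")

# ...and waits this many seconds between requests, if a delay is declared.
delay = rules.crawl_delay("ExampleBot")

print(blocked, allowed, delay)
```

    Nothing forces a crawler to run a check like this, which is why the file only works for spiders that choose to honour it.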

    The problem is the corporations, mostly in China and the USA, that are grabbing content for resale.

    They look for traffic because it denotes user clicks, and they are out to get the honey pot regardless of the limitations requested by the robots.txt file.
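
    Because robots.txt is only a request, any hard limit has to be enforced by the server itself. The following is a rough sketch of one common way to do that, a per-address sliding-window limiter. It is not something this board is known to run; the class name, the allow() method and the thresholds are all hypothetical.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per client within any window_seconds span."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Maps a client address to the timestamps of its recent accepted requests.
        self._hits = defaultdict(deque)

    def allow(self, client_ip, now=None):
        """Return True if the request may proceed, False if it should be refused."""
        now = time.monotonic() if now is None else now
        hits = self._hits[client_ip]
        # Forget requests that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


# Hypothetical threshold: two page requests per second per address.
limiter = SlidingWindowLimiter(max_requests=2, window_seconds=1.0)
results = [limiter.allow("198.143.187.114", now=t) for t in (0.0, 0.1, 0.2, 1.05)]
print(results)
```

    A limiter like this only slows a single address; a miner spreading requests over many addresses would need a different approach.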

    It is a bit like your garage with a closed and locked door: it will keep your goods safe from passers-by but not from someone determined to steal.

    Since they are re-packaging the data, any copyright is stripped off or ignored.

    At the moment the board gets targeted on average twice a month for about four days at a time. It does not take much to realise what will happen if more corporations target it.

    I could increase the number of data sockets (virtual user ports) to maintain access levels for normal humans, but this comes at exponential cost and will soon revert to the problems seen at the moment as more data miners target the archive.

    The real answer is to remove the reason for the data theft, e.g. take the archive offline and make the messages drop off the forum after a month or so, like the original discserver version of the board.

    Ross

