Dear TREX Experts,
Over the last couple of hours/days ;-( I have been trying to set up an index for crawling an HTML page that consists of a list of links to pages I need indexed (the HTML file is attached).
I have tried to crawl the page directly by uploading the HTML page to a web server and then creating a web address (CM > Repository Manager > Web Address) that links straight to the HTML page. I have also tried creating an HTTP system, which results in the same error.
I have created a crawler with "Follow Links" selected and the depth set to 5; the rest of the settings are defaults.
The result when I start a reindex is that only one page gets indexed and none of the links are followed. I have also tried uploading the page to the web server that all the links point to, which results either in the same behaviour as above or, on one occasion, in xx errors (one per <a href> reference) under bad links in the detailed crawler report.
Accessing the HTML page that contains all the links directly from the Portal/TREX server works fine.
How does the crawler actually work? I thought it parsed the HTML page and followed each <a href> tag it found (e.g. <a href="site.com">test</a>).
An excerpt from the HTML page:
<a href="http://hotbucket.d">hotbucket.dk</a>
<a href="http://www.jubii.dk">jubii</a><br>
Any ideas on why I can't crawl a simple page that consists of HTML links?
Kind regards
John Stubbe