Channel: SCN : Popular Discussions - SAP Enterprise Portal: Content Management and Collaboration

TREX - Crawl an HTML page that consists of a-href links!


Dear TREX Experts

Over the last couple of hours / days ;-( I have been trying to set up an index for crawling an HTML page which consists of a list of references to pages that I need to index (the HTML file is attached).

 

I have tried to crawl the page directly by uploading the HTML page to a web server and then creating a “webaddress” (CM > repository manager > web Address) that links directly to the HTML page. I have also tried creating an HTTP system, which results in the same error.

 

I have created a crawler where “follow links” is selected and the depth is set to 5. The rest of the settings are default.

 

The result when I start a “reindex” is that only one page has been indexed and none of the links have been accessed. I have also tried uploading the page to the web server that all the references point to, which results either in the same outcome as above or, on one occasion, in xx errors (xx being the number of a href references) under “bad links” in the detailed crawler report.

 

I have tried to access the HTML page that contains all the links directly from the Portal/TREX server, which works fine.

 

How does the crawler work? I thought that it crawled through the HTML page and looked for “a href” tags (<A href="site.com">test</a>).

 

An example from the HTML page:

<a href="http://hotbucket.d">hotbucket.dk</a>

<a href="http://www.jubii.dk">jubii</a><br>
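
As a sanity check that is independent of TREX, the links that a generic HTML parser finds in such a page can be listed with a few lines of Python. This is only a sketch of the "look for a href tags" idea described above, not a description of how the TREX crawler actually works; the names `LinkCollector` and `extract_links` and the base URL `http://example.invalid/links.html` are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url  # hypothetical address of the link page
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names,
        # so <A HREF=...> is matched as well.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Relative links are resolved against the base URL;
                # absolute links are returned unchanged.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return all link targets found in the given HTML text."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# The two example lines from the page, copied as posted.
sample = """
<a href="http://hotbucket.d">hotbucket.dk</a>
<a href="http://www.jubii.dk">jubii</a><br>
"""

for link in extract_links(sample, "http://example.invalid/links.html"):
    print(link)
```

If this lists every expected address, the markup of the page itself is probably not the problem, and the cause is more likely to be in the crawler settings or in how the linked servers respond.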

 

Any ideas on why I can't crawl a simple page that consists of HTML links?

 

 

Kind regards

 

John Stubbe

