Apache Nutch is an extensible and scalable open source web crawler software project. Nutch provides extensible interfaces such as Parse, Index and ScoringFilter for custom implementations, e.g. Apache Tika for parsing.
Currently we are using Apache Nutch as a standalone crawler, which needs manual configuration and scheduling of crawls. We need to create new crawl jobs programmatically (using a REST API). Is there an easy-to-use library available for this?
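One possible starting point, without a separate library, is the REST server that recent Nutch 1.x releases ship (started with `bin/nutch startserver`). The sketch below builds and sends a JSON `POST /job/create` request using only the Python standard library. It assumes a server on `localhost:8081` and a job description with `crawlId`, `type`, `confId` and `args` fields. The argument keys (such as `url_dir` for an INJECT job) differ between Nutch versions, so treat them as placeholders and check the REST API documentation for your release.

```python
import json
import urllib.request

# Assumption: a Nutch REST server started with `bin/nutch startserver`
# and reachable at this address; adjust host/port for your setup.
NUTCH_SERVER = "http://localhost:8081"


def build_create_job_request(crawl_id, job_type, args, conf_id="default",
                             server=NUTCH_SERVER):
    """Build (but do not send) a POST /job/create request for the Nutch REST API."""
    payload = {
        "crawlId": crawl_id,   # identifies the crawl this job belongs to
        "type": job_type,      # e.g. "INJECT", "GENERATE", "FETCH", "PARSE", "UPDATEDB"
        "confId": conf_id,     # name of a configuration known to the server
        "args": args,          # job-specific arguments; keys vary by job type and version
    }
    return urllib.request.Request(
        url=server.rstrip("/") + "/job/create",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_job(crawl_id, job_type, args, conf_id="default",
               server=NUTCH_SERVER, timeout=30):
    """Send the request and return the response body as text (expected to identify the job)."""
    req = build_create_job_request(crawl_id, job_type, args, conf_id, server)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

A scheduler (cron, a workflow tool, or your own service) could then call, for example, `create_job("crawl01", "INJECT", {"url_dir": "seeds"})`, followed by GENERATE, FETCH, PARSE and UPDATEDB jobs for each crawl round. Keeping request construction separate from sending makes the payload easy to inspect or test without a running server.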