Beagle currently works only on the local machine. Even on an intranet, you can only use it for searching, not for retrieval, because the search results are presented as file:///… URLs.
Swish-e currently works well for non-UTF-8 text, and I have been using it for a while. It does not work for Chinese, though (even in UTF-8).
Here is a review (in Chinese) of open source web search engines.
Xapian Omega unfortunately does not support Chinese well.
Based on this review, I finally found Hyperestraier to replace my grep-based Chinese semi-web search. Out of the box it only supports plain text; for PDF and other formats, see this.
PATH=$PATH:/usr/local/share/hyperestraier/filter; export PATH
estcmd gather -cl -fx ".pdf" "H@estfxpdftohtml" -fz -sd -cm casket .
Remove the “-fz” switch to also index other file types.
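If you run this gather step often, it can be wrapped in a small reusable function. This is a minimal POSIX sh sketch, assuming the filter directory used above. The function name index_dir, the ESTCMD override variable and the pdf/all mode switch are hypothetical additions for illustration, not part of Hyperestraier itself.

```shell
#!/bin/sh
# Sketch only: index_dir, ESTCMD and the "all" mode are hypothetical helpers,
# not Hyperestraier features. The estcmd flags are the ones from the post.

# Make the bundled filter scripts (e.g. estfxpdftohtml) findable.
PATH=$PATH:/usr/local/share/hyperestraier/filter
export PATH

# index_dir DB DIR [pdf|all]
#   pdf (default): keep -fz, as in the original command (PDF only)
#   all:           drop -fz so other file types are indexed as well
# Note: plain POSIX sh has no "local", so db/dir/mode/fz are globals.
index_dir() {
  db=$1
  dir=$2
  mode=${3:-pdf}
  if [ "$mode" = "all" ]; then
    fz=""
  else
    fz="-fz"
  fi
  # $fz is deliberately unquoted so that an empty value adds no argument.
  # ESTCMD defaults to the real estcmd; set ESTCMD=echo for a dry run.
  ${ESTCMD:-estcmd} gather -cl -fx ".pdf" "H@estfxpdftohtml" $fz -sd -cm "$db" "$dir"
}
```

Calling index_dir casket . reproduces the original command, and index_dir casket ~/docs all is the variant without -fz. Prefixing the call with ESTCMD=echo prints the command instead of running it, which is a cheap way to check the flags before touching the index.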