{"id":1150,"date":"2009-01-22T11:39:11","date_gmt":"2009-01-22T19:39:11","guid":{"rendered":"http:\/\/softbeam.net\/hobby\/?p=1150"},"modified":"2009-01-22T11:39:11","modified_gmt":"2009-01-22T19:39:11","slug":"website-searching-and-indexing","status":"publish","type":"post","link":"https:\/\/softbeam.net\/hobby\/?p=1150","title":{"rendered":"website searching and indexing"},"content":{"rendered":"<p>I just added Google Custom Search to my website. See <a href=\"http:\/\/webeyes.blogspot.com\/2007\/08\/put-google-search-button-on-blog-web.html\" target=\"_blank\" rel=\"noopener noreferrer\">webeyes<\/a> for more details.<\/p>\n<p>WordPress&#8217;s built-in search box seems rather weak. For example, if I have lots of PDF files attached to my blog, WordPress can&#8217;t search their contents.<\/p>\n<p>Then I found <a href=\"http:\/\/swish-e.org\/\" target=\"_blank\" rel=\"noopener noreferrer\">swish-e<\/a>, which, once it has indexed the directories you point it at, searches them extremely fast.<\/p>\n<p>Unfortunately, it does not seem to work for Chinese. For HTML files, grep -r &quot;searching words&quot; searchingdir works fine. 
PDFs are still a problem.<\/p>\n<p>I found (via Google) a piece of PHP code that does a web-based grep search over txt or HTML files (not PDFs):<\/p>\n<blockquote><p>&lt;!DOCTYPE HTML PUBLIC &quot;-\/\/W3C\/\/DTD HTML 3.2 Final\/\/EN&quot;&gt;<br \/>\n&lt;HTML&gt;<br \/>\n&lt;HEAD&gt;<br \/>\n&lt;meta http-equiv=&quot;Content-Type&quot; content=&quot;text\/html; charset=utf-8&quot;&gt;<br \/>\n&lt;TITLE&gt;Search&lt;\/TITLE&gt;<br \/>\n&lt;\/HEAD&gt;<br \/>\n&lt;BODY&gt;<\/p>\n<p>&lt;?php<br \/>\n\/\/ since we use the POST method, read the submitted search string into $test<br \/>\n$test = isset($_POST[&#039;sstr&#039;]) ? $_POST[&#039;sstr&#039;] : &#039;&#039;;<\/p>\n<p>\/\/ reject shell metacharacters as a first check against malicious input<br \/>\n$j = mb_strlen($test);<br \/>\nfor ($k = 0; $k &lt; $j; $k++) {<br \/>\n$char = mb_substr($test, $k, 1);<br \/>\nif ($char == &quot;;&quot; || $char == &quot;&amp;&quot; || $char == &quot;&gt;&quot; || $char == &quot;&lt;&quot; || $char == &quot;|&quot;) {<br \/>\nprint &quot;character: &quot; . htmlspecialchars($char) . &quot; is not allowed, try again.&lt;br&gt;&quot;; exit;}<br \/>\n}<\/p>\n<p>\/\/ check if submit button pressed and some search value entered<br \/>\nif (isset($_POST[&#039;submit&#039;]) &amp;&amp; ! empty($test) )<br \/>\n{<br \/>\n\/\/ set file and path to search<br \/>\n$file = &quot;txt\/&quot; . $_POST[&#039;d_file&#039;];<\/p>\n<p>\/\/ refuse paths that climb out of txt\/ and check the file is readable<br \/>\nif (strpos($file, &quot;..&quot;) !== false || ! is_readable ($file)) {<br \/>\nprint &quot;ERROR accessing file &quot; . htmlspecialchars($file) . &quot;, please contact the admin.&lt;br&gt;&quot;; exit;}<\/p>\n<p>\/\/ build the search command; escapeshellarg() quotes both values so the shell cannot interpret them<br \/>\n\/\/ (it can drop non-ASCII bytes under the C locale, so this assumes an installed UTF-8 locale)<br \/>\nsetlocale(LC_CTYPE, &quot;en_US.UTF-8&quot;);<br \/>\n$cmdstr = &quot;grep -r -- &quot; . escapeshellarg($test) . &quot; &quot; . escapeshellarg($file);<br \/>\necho &quot;&lt;A href=\\&quot;&quot; . htmlspecialchars($_SERVER[&#039;PHP_SELF&#039;]) . &quot;\\&quot;&gt;Search more ...&lt;\/A&gt;\\t&quot;;<br \/>\necho &quot;&lt;p&gt;Search results for &quot; . htmlspecialchars($test) . &quot; in &quot; . htmlspecialchars($file) . &quot; ...&lt;\/p&gt;&quot;;<br \/>\nflush();<\/p>\n<p>\/\/ search the file, display the result<br \/>\n$fp = popen($cmdstr . 
&#039; 2&gt;&amp;1&#039;, &#039;r&#039;); \/\/ open a pipe to the grep process, merging stderr into stdout<br \/>\necho &quot;&lt;pre&gt;&quot;;<br \/>\nwhile ($buffer = fgets($fp, 4096))<br \/>\n{<br \/>\necho(&quot;http:\/\/www.softbeam.net\/$buffer&quot;);<br \/>\n}<br \/>\necho &quot;&lt;\/pre&gt;&quot;;<br \/>\necho &quot;&lt;p&gt;END&lt;\/p&gt;\\n&quot;;<br \/>\npclose($fp);<br \/>\nflush();<br \/>\nexit;<br \/>\n}<\/p>\n<p>?&gt;<br \/>\n&lt;br&gt;<br \/>\n&lt;h2 align=&quot;center&quot;&gt;Full-text search&lt;\/h2&gt;<br \/>\n&lt;p&gt;&lt;h4 align=&quot;center&quot;&gt;<br \/>\n&lt;form action=&quot;&lt;?php echo htmlspecialchars($_SERVER[&#039;PHP_SELF&#039;]); ?&gt;&quot; method=&quot;POST&quot;&gt;<br \/>\nSelect the category to search:&lt;br&gt;<br \/>\n&lt;select name=&quot;d_file&quot;&gt;<br \/>\n&lt;option value=&quot;literature\/chinese&quot;&gt;------all Chinese literature&lt;\/option&gt;<br \/>\n&lt;option value=&quot;literature\/western&quot;&gt;------all Western literature&lt;\/option&gt;<br \/>\n&lt;option value=&quot;literature&quot;&gt;--------all literature&lt;\/option&gt;<br \/>\n&lt;option value=&quot;history&quot;&gt;--------all history&lt;\/option&gt;<br \/>\n&lt;\/select&gt;&lt;br&gt;&lt;br&gt;<br \/>\n&lt;input type=&quot;text&quot; name=&quot;sstr&quot; size=&quot;30&quot; maxlength=&quot;30&quot;&gt;&lt;br&gt;&lt;br&gt;<br \/>\n&lt;input type=&quot;submit&quot; name=&quot;submit&quot; value=&quot;Zzzzz&quot;&gt;<br \/>\n&lt;\/form&gt;<br \/>\n&lt;\/h4&gt;&lt;\/p&gt;<\/p>\n<p>&lt;\/BODY&gt;<br \/>\n&lt;\/HTML&gt;<\/p><\/blockquote>\n<p>Here is another approach:<\/p>\n<p><a href=\"http:\/\/www.geekzone.co.nz\/content.asp?contentid=3939\" target=\"_blank\" rel=\"noopener noreferrer\">Using Google Desktop Search as a network search server<\/a><\/p>\n<p>On a Windows server, it works in combination with <a href=\"http:\/\/www.brothersoft.com\/dnka-34302.html\">DNKA<\/a>. 
On a Linux server, although Google Desktop has a Linux version available (<a href=\"http:\/\/desktop.google.com\/linux\/index.html\" target=\"_blank\" rel=\"noopener noreferrer\">GDL<\/a>), DNKA does not seem to be Linux-ready.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I just added Google Custom Search to my website. See webeyes for more details. WordPress&#8217;s built-in search box seems rather weak. For example, if I have lots of PDF files attached to my blog, WordPress can&#8217;t search their contents. Then I found swish-e, which, once it has indexed the directories you point it at, searches them extremely fast. Unfortunately, it does not seem &hellip; <a href=\"https:\/\/softbeam.net\/hobby\/?p=1150\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;website searching and indexing&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-1150","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/softbeam.net\/hobby\/index.php?rest_route=\/wp\/v2\/posts\/1150","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/softbeam.net\/hobby\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/softbeam.net\/hobby\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/softbeam.net\/hobby\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/softbeam.net\/hobby\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1150"}],"version-history":[{"count":0,"href":"https:\/\/softbeam.net\/hobby\/index.php?rest_route=\/wp\/v2\/posts\/1150\/revisions"}],"wp:attachment":[{"href":"https:\/\/softbeam.net\/hobby\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1150"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"h
ttps:\/\/softbeam.net\/hobby\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1150"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/softbeam.net\/hobby\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1150"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}