From: grady@ux1.cso.uiuc.edu (Mike Grady)
Newsgroups: comp.infosystems.gopher
Subject: Perl script for building wais index
Date: 13 Jul 1993 15:50:45 -0500
Organization: University of Illinois at Urbana
Message-ID: <21v77j$1qd@ux1.cso.uiuc.edu>
Summary: Perl script to descend file hierarchy and build wais index
Keywords: perl wais index

I use the following Perl script to descend a set of directories and build
a wais index. It is an alternative to using the find command to feed
waisindex the filenames you want indexed. It will also create the
appropriate link file if one doesn't already exist. I find it easier to
modify the exclusion list in Perl than with a "straight find".

A mod I might add is to look up the directory's "name" in the .cap file, so
that the title put into the .indexlink file is not the generic "Search of
Directory". That title is easy enough to edit by hand, though.

---------------
#!/usr/local/bin/perl
# Build a waisindex for Gopher; can optionally supply two arguments:
#     buildwais $dir $indexname
# $dir       -- directory for which to build index (relative
#               to where we currently are)
#               (defaults to . -- i.e. where we are now).
#               This is also where the index will reside, in a
#               directory named ".waisindex".
# $indexname -- "Name" to give to index if you don't
#               want it built with the default of "index"
#               (the one other value you would use is "indexg" to
#               build a global index that uses the DOCN field).
#               DOCN and indexg are a local U. of Ill. construct
#               that required modifying several Gopher programs and
#               allows a "tag" to be added at the end of a title
#               retrieved from the wais index to identify the
#               document it came from.
# Creates a .indexlink file for the wais index if it doesn't already exist.

require "find.pl";          # see list of excluded names at end

umask(002);                 # permissions for newly created files and directories
$newdirperm = 0775;

$GOPHERROOT = '/usr/spool/gopher/gd';       # where your Gopher data tree begins
#$GOPHERROOT = '/usr/spool/gopher/test';
$PROG = '/usr/staff/grady/bin/waisindex';   # where your waisindex program is

$dir = shift(@ARGV);
if ($dir eq "") {
    $dir = '.';
    $indexdir = '.waisindex';
    $linkfile = '.indexlink';
} else {
    $indexdir = $dir . '/.waisindex';
    $linkfile = $dir . '/.indexlink';
}

$indexname = shift(@ARGV);
if ($indexname eq "") {
    $index = $indexdir . '/index';
} else {
    $index = $indexdir . '/' . $indexname;
}

unless (-e $linkfile) {
    # The selector is "7" followed by the current path below $GOPHERROOT.
    $curdir = `pwd`;
    chop($curdir);
    die "Not in Gopher tree!\n" unless (index($curdir, $GOPHERROOT) == 0);
    $pos = length($GOPHERROOT);
    if ($GOPHERROOT eq $curdir) {
        $linkpath = '7';
    } else {
        $linkpath = '7' . substr($curdir, $pos);
    }
    open(LINK, ">$linkfile") || die "Can't open $linkfile: $!\n";
    print LINK "Name=Search of Directory\n";
    print LINK "Numb=2\n";
    print LINK "Type=7\n";
    print LINK "Path=$linkpath/$index\n";
    print LINK "Host=+\n";
    print LINK "Port=+\n";
    close(LINK);
}

unless (-e $indexdir) {
    mkdir($indexdir, $newdirperm) || die "Can't create index directory: $!\n";
}

#open(TOWAIS, "| cat");     # to just see the file names that would be indexed
                            # rather than actually indexing them, use this
                            # instead of the following line.
open(TOWAIS, "| $PROG -d $index -stdin") || die "Can't run $PROG: $!\n";

# Traverse desired filesystems
&find($dir);

close(TOWAIS) || warn "$PROG failed (status $?)\n";
exit;

# This establishes the files to be indexed. find traverses down the
# hierarchy, and we ignore files and directories whose names begin with a
# dot, are core, adm, bin, dev, etc, or usr, or end in .bak or .lock.
# Setting $prune on an excluded directory keeps find from descending into it.
sub wanted {
    (
        (($dev,$ino,$mode,$nlink,$uid,$gid) = lstat($_)) &&
        ! /^\..?.*$/ &&
        ! /^core$/ &&
        ! /^adm$/ &&
        ! /^bin$/ &&
        ! /^dev$/ &&
        ! /^etc$/ &&
        ! /^usr$/ &&
        ! /\.bak$/ &&
        ! /\.lock$/ &&
        print(TOWAIS "$name\n")
    ) || ($prune = 1);
}

-- 
Michael Grady, Univ. of Illinois Computing & Communications Services Office
Rm. 1503 DCL, 1304 W. Springfield Ave., Urbana, IL 61801
Internet: mike-grady@uiuc.edu   phone: (217) 244-1253   fax: (217) 244-7089
Disclaimer: The opinions of CCSO may differ from mine.
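Adding to the exclusion list in &wanted currently means adding one more regex to the chain. A minimal sketch of keeping the excluded names as data instead, so that a change means editing one list. It assumes Perl 5 (the script above is Perl 4 style and uses find.pl). The names is_excluded, @excluded_names and @excluded_suffixes are hypothetical and do not appear in the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical lists: edit these instead of the regex chain in &wanted.
my @excluded_names    = qw(core adm bin dev etc usr);   # exact names to skip
my @excluded_suffixes = qw(.bak .lock);                 # suffixes to skip

# Precompute a lookup hash and a single suffix regex from the lists.
my %excluded_name = map { $_ => 1 } @excluded_names;
my $suffix_alt    = join '|', map { quotemeta($_) } @excluded_suffixes;
my $suffix_re     = qr/(?:$suffix_alt)$/;

# Return true if a file or directory name should not be indexed.
sub is_excluded {
    my ($name) = @_;
    return 1 if $name =~ /^\./;            # dot files and dot directories
    return 1 if $excluded_name{$name};     # exact names such as core or usr
    return 1 if $name =~ $suffix_re;       # names ending in .bak or .lock
    return 0;
}

# Show the decision for a few sample names.
for my $n (qw(README core .cap notes.bak mail.lock usr2)) {
    printf "%-10s %s\n", $n, is_excluded($n) ? 'skip' : 'index';
}
```

With a helper like this, the run of name tests inside &wanted could shrink to a single `! &is_excluded($_) &&` between the lstat and the print.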
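The .cap lookup mentioned at the start of the post might look roughly like the sketch below. It rests on an assumption: that a directory's title is stored in a file named after that directory inside its parent's .cap directory, in a "Name=..." line, following the gopherd .cap convention. The helper cap_title is hypothetical. The sketch also assumes Perl 5 with the core File::Spec and File::Basename modules.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw(basename dirname);
use File::Spec;

# Hypothetical helper: return the Name= title recorded for directory $dir
# in its parent's .cap directory, or $default if none is found.
# Assumes gopherd-style .cap entries such as "Name=Campus Documents".
sub cap_title {
    my ($dir, $default) = @_;
    my $abs = File::Spec->rel2abs($dir);          # absolute, no trailing slash
    my $cap = File::Spec->catfile(dirname($abs), '.cap', basename($abs));
    open(my $fh, '<', $cap) or return $default;   # no .cap entry: keep default
    while (my $line = <$fh>) {
        if ($line =~ /^Name=(.*)$/) {             # first Name= line wins
            my $title = $1;
            close($fh);
            return $title;
        }
    }
    close($fh);
    return $default;
}

# Print the Name= line .indexlink would get for the current directory.
print "Name=Search of ", cap_title('.', 'Directory'), "\n";
```

In the script, the fixed Name= line could then be written as `print LINK "Name=Search of " . cap_title($dir, 'Directory') . "\n";`. Passing 'Directory' as the default keeps the current "Search of Directory" title when no .cap entry exists.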