$Id: INSTALL,v 1.14 2002/10/14 18:36:02 bu Exp $

Microbrew MicroSieve Version 0.7.0 Beta
USENET SPAM filter
http://www.microbrew.org/products/usieve/
by Bulent Yilmaz


LICENSING
=========

MicroSieve is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation; either version 2 of the License, or (at your
option) any later version.

MicroSieve is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License for more details.

You should have received a copy of the GNU General Public License
along with MicroSieve; if not, write to the Free Software Foundation,
Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

A full copy of the GNU General Public License can be found at:

    http://www.gnu.org/licenses/gpl.txt


PLATFORMS
=========

Supported Platforms:
    - Solaris 7
    - Linux 2.4.x (last tested with 0.4.0)

Supported News Servers:
    - Cyclone
    - Typhoon

UNSUPPORTED SYSTEMS:
    - Any platform using INN: INN either needs a patch applied to allow
      filters that are binaries, or a Perl/Python wrapper script must be
      written for MicroSieve. This will probably not be coming from us
      any time soon.

Patches to make it work on any platform or news server are welcome.
Please report any successful compiles/uses on other platforms and news
servers to the author so they can be included in this list.


INSTALLATION
============

1. Type './configure', optionally with the following options:

   --prefix=
        Specifies the directory where the binary 'usieve' and the
        configuration file 'usieve.conf' live.
        Default: /usr/local

   --disable-block-binaries
        Turns off checking for binaries in non-binary groups. Binaries
        posted to non-binary groups make up by far the largest share of
        the illegitimate spam filtered by this program.
        Default: On

   --enable-bot-checks
        Turns on checking for spam bot headers.
        This is not as effective as one might think, since the spam bot
        patterns it checks for have largely gone out of use.
        Default: Off

   --enable-user-regex
        Turns on filtering with user-supplied regular expressions. This
        allows much more flexibility in filtering out unwanted spam.
        Default: Off

   --disable-auto-accept
        Turns off auto-acceptance of articles whose 'Path' header matches
        a regular expression specified in the configuration file. Useful
        for distributing the spam-filtering load across your entire news
        system.
        Default: On

   --disable-auto-blackhole
        Turns off auto-rejection of articles whose 'Path' header matches
        a regular expression specified in the configuration file. Useful
        for dropping all articles originating from problem sites.
        Default: On

   --disable-max-crossposts
        Turns off checking for articles that exceed the maximum number of
        crossposts specified in the configuration file.
        Default: On

   --disable-history-check
        Turns off history-based rejection of duplicate messages. This
        check is mostly useful for filtering copies of binary reposts.
        By default the history holds (2^20 - 1) messages; at 20 bytes of
        memory per entry, that comes to about 20MB of extra memory. You
        can set the history size in the configuration file. While this
        section of code is fast, it adds considerable CPU and memory load
        to the system, so leave it enabled only if you know you have the
        memory and CPU cycles to spare.
        Default: On

   --enable-stats-logging
        Turns on statistics output to the standard log file. These are
        interesting, but not very useful, overall statistics, such as how
        many articles were rejected. Some may consider this a waste of
        CPU.
        Default: Off

   --enable-massive-debug
        Turns on extensive debugging output that is not widely useful
        except for development purposes. WARNING: This is a disk space
        hog!
        Default: Off

   --with-libpcre[=]
        Uses libpcre instead of the system regular expression library.
        This option can be specified with a location.
        This is particularly useful for Linux users who have a very slow
        system regular expression library.
        Default: Off

2. Type 'make' in the top level source directory.

3. Type 'make install' in the top level source directory.

4. Optionally, edit the configuration file named at the end of the
   install. It is a GOOD idea to edit this file to your specifications,
   and you REALLY want to edit it if you have left auto-accept or
   auto-blackhole on.

   Lines that start with a '#' are treated as comments; all other '#'
   characters are treated as part of the configuration.

   The following options are case sensitive:

   MaxArticleSize
        Sets the maximum size of the entire article, including all the
        headers.
        Default: 1048576

   ArticleHistorySize
        Sets the maximum number of articles to keep in the duplicate
        checking list. Generally, the bigger this is, the more duplicates
        you will catch, at the cost of speed. Each entry requires 20
        bytes of memory, so be sure you have enough free memory for it.
        To maximize efficiency, use a value of the form (2^n - 1).
        Default: 1048575

   StatsInterval
        Specifies the time in seconds between stats log entries, if you
        have stats logging turned on.
        Default: 300

   MaxCrossPosts
        Specifies the maximum number of newsgroups that one article can
        be posted to simultaneously.
        Default: 5

   AutoAcceptRegex
        Specifies the regular expression to use for Path header based
        auto-accept. YOU REALLY WANT TO SET THIS OPTION if you have left
        this feature on.
        Default: !!

   BlackholeRegex
        Specifies the regular expression to use for Path header based
        auto-blackhole (auto-rejection). YOU REALLY WANT TO SET THIS
        OPTION if you have left this feature on.
        Default: !FFFFFFFF!

   ****> See warning below about the following options.

   UserRegexFile
        Specifies the filename of the configuration file for the
        user-supplied regular expressions. This is simply a file that
        contains a list of ""
        Default: /etc/regex.conf

   LogFile
        Specifies the filename of the log file for MicroSieve.
        Default: /var/usieve.log

   PIDFile
        Specifies the name of the file where the PID of the program will
        be stored.
        Default: /var/usieve.pid

   StatsFile
        The filename where MicroSieve will write its statistics, if you
        have turned that option on.
        Default: /var/usieve.stats

   ArticleLog
        Filename for article logging, for debugging purposes. Useful for
        developers.
        Default: /var/usieve.art

   HistoryLog
        Sets the filename MicroSieve uses when dumping and reading the
        contents of the article history log.
        Default: None

   ****> Make sure that the directories the log files live in exist and
   ****> have the correct permissions. Bad things will happen if they
   ****> don't.


INTEGRATION
===========

For the program to be of any use, it has to be integrated into the news
software. (This will eventually be automated in the 0.7.x or 0.9.x
series.)

- Cyclone/Typhoon:

    1. In the 'start.conf' file, add the following line:

         PROGRAM="-program //usieve -body"; export PROGRAM

    2. You should now be able to start Cyclone/Typhoon normally, and it
       should work. To check whether it is running, use 'ps'.
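The 'ps' check in step 2 can also be scripted. The following is a
minimal Bourne shell sketch, not something shipped with MicroSieve: the
helper name usieve_status is hypothetical, and it assumes PIDFile was
left at its documented default (/var/usieve.pid) unless you pass another
path. It first tests the PID recorded in that file with 'kill -0', then
falls back to listing matching processes with 'ps'.

```shell
# Hypothetical helper (not part of MicroSieve): report whether usieve runs.
# Assumption: PIDFile is at its documented default, /var/usieve.pid,
# unless another path is given as the first argument.
# Note: plain Bourne sh has no 'local', so pidfile and pid are globals.
usieve_status() {
    pidfile=${1:-/var/usieve.pid}
    pid=
    if [ -r "$pidfile" ]; then
        # Backquotes strip the trailing newline from the stored PID.
        pid=`cat "$pidfile"`
    fi
    # kill -0 sends no signal; it only checks that the PID exists and that
    # we may signal it, so run this as the owner of usieve or as root.
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        echo "usieve is running (PID $pid)"
        return 0
    fi
    echo "usieve does not appear to be running"
    # Fall back to the process table; '[u]sieve' keeps grep from
    # matching its own command line.
    ps -ef | grep '[u]sieve'
    return 1
}
```

After sourcing this in a Bourne-compatible shell, run 'usieve_status',
or 'usieve_status /path/to/usieve.pid' if you changed PIDFile. A stale
PID file whose PID has been reused by another process will still report
"running", so treat the 'ps' listing as the final word.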