Daniel Hartmeier



Reject mail matching regular expressions


sendmail's milter API allows programs to register themselves and get called during mail transactions. Such plugins will see all mails passing through sendmail, including SMTP envelope parameters and mail headers and body. They can cause sendmail to reject messages with permanent or temporary error replies or discard messages silently, based on arbitrary conditions.

milter-regex is a very simple plugin that rejects or discards messages matching regular expressions. It doesn't add much processing overhead, so even a busy mail server can afford to run it.

Inline filtering

Filtering mails 'inline', i.e. while the SMTP transaction is happening, has several advantages compared to post-processing as commonly done using procmail. Messages rejected inline do not have to be stored locally just to get deleted again later. The sender immediately gets an SMTP error code and the receiver doesn't generate any bounce messages (which might get sent to fake sender addresses, and cost bandwidth and queue space).

Furthermore, inline filtering applies to all messages passing through the system. A single filter can reject incoming and outgoing messages to and from all users.

Regular expressions

Spam filters like SpamAssassin can use complex algorithms to detect offending messages, at the cost of consuming considerable resources. Regular expression matching is much simpler and makes it possible to reject large volumes of unwanted messages (like current email worms) at low cost, greatly reducing the load on more complex filters called subsequently. Regular expressions are a commonly known and versatile tool, well-suited for quickly matching the most urgent threats.


The milter API is relatively new, but several plugins have already been written that filter messages in various ways, some of them using regular expressions in some form. milter-regex does not provide any fundamentally different features. Its main goal is to support both basic and extended regular expressions in a usable way and stay lean enough to be affordable on busy mail servers. It doesn't change or add headers, and it relinquishes resources back to sendmail as early as possible (for example, by not reading message bodies when there are no expressions to match the body against). milter-regex runs on OpenBSD and is BSD licensed.

If you find this program useful, remember that open source programmers have to pay for food, too. Donations welcome ;)

Man page

MILTER-REGEX(8)		OpenBSD System Manager's Manual	       MILTER-REGEX(8)
     milter-regex - sendmail milter plugin for regular expression filtering
     milter-regex [-d] [-c config] [-p pipe] [-u user]
     The milter-regex plugin can be used with the milter API of sendmail(8) to
     filter mails using regular expressions matching SMTP envelope parameters
     and mail headers and body.
     The options are as follows:
     -d		Don't detach from controlling terminal and produce verbose de-
		bug output on stdout.
     -c config	Use the specified configuration file instead of the default,
     -p pipe	Use the specified pipe to interface sendmail(8).  Default is
     -u user	Run as the specified user instead of the default, _milter-
		regex.	When milter-regex is started as root, it calls
		setuid(2) to drop privileges.  The non-privileged user should
		have read access to the configuration file and read-write ac-
		cess to the pipe.
     The plugin needs to be registered in the sendmail(8) configuration, by
     adding the following lines to the .mc file
		   `S=unix:/var/spool/milter-regex/sock, T=S:30s;R:2m')
     rebuilding /etc/mail/sendmail.cf from the .mc file using m4(1), and
     restarting sendmail(8).
     The configuration file consists of rules that, when matched, cause
     sendmail(8) to reject mails.  Empty lines and lines starting with # are
     ignored, as well as leading whitespace (blanks, tabs).  Trailing back-
     slashes can be used to wrap long rules into multiple lines.  Each rule
     starts with one of the following commands:
     reject <message>
	   Subsequent rules cause the mail to be rejected with a permanent er-
	   ror consisting of the specified text part.  The SMTP reply consists
	   of the three-digit code 554 (RFC 2821 "command rejected for policy
	   reasons"), the extended reply code 5.7.1 (RFC 1893 "Permanent Fail-
	   ure", "Security or Policy Status", "Delivery not authorized, mes-
	   sage refused") and the text part (which defaults to "Command re-
	   jected", if not specified).	This is a permanent failure, which
	   causes the sender to remove the message from its queue without try-
	   ing to retransmit, commonly generating a bounce message to the
     tempfail <message>
	   Subsequent matching rules cause the mail to be rejected with a tem-
	   porary error consisting of the specified text part.	The SMTP reply
	   consists of the three-digit code 451 (RFC 2821 "Requested action
	   aborted: local error in processing"), the extended reply code 4.7.1
	   (RFC 1893 "Persistent Transient Failure", "Security or Policy Sta-
	   tus", "Delivery not authorized, message refused") and the text part
	   (which defaults to "Please try again later", if not specified).
	   This is a temporary failure, which causes the sender to keep the
	   message in its queue and try to retransmit it, commonly for several
     discard
	   Subsequent matching rules cause the mail to be accepted but then
	   discarded silently.	Note that connect and helo rules should not
	   use discard.
     quarantine <message>
	   Subsequent matching rules cause the mail to be quarantined in
     accept
	   Subsequent matching rules cause the mail to be accepted without
	   further rule evaluation.  Can be used for whitelist criteria.
     A command is followed by one or more expressions, each causing the previ-
     ous command to be executed when matched.  The following expressions can
     be used:
     connect <hostname> <address>
	   Reject the connection if both the sender's hostname and address
	   match the specified regular expressions.  The numerical address is
	   either dotted-quad (IPv4) or coloned-hex (IPv6).  The hostname is
	   the result of a DNS reverse resolution of the numerical address
	   (which sendmail(8) performs independently of the milter plugin).
	   When resolution fails, the hostname contains the numerical address
	   in square brackets.
     helo <name>
	   Reject the connection if the sender-supplied HELO name matches the
	   specified regular expression.  Commonly, the sender supplies its
	   fully-qualified hostname as HELO name.
     envfrom <address>
	   Reject the mail if the sender-supplied envelope MAIL FROM address
	   matches the specified regular expression.  Addresses commonly have
	   the form <[email protected]>.
     envrcpt <address>
	   Reject the mail if the sender-supplied envelope RCPT TO address
	   matches the specified regular expression.
     header <name> <value>
	   Reject the mail if a header matches the specified name and value.
	   For instance, the header "Subject: Test" matches name Subject and
	   value Test.
     body <line>
	   Reject the mail if a body line matches the specified regular
	   expression.
     macro <name> <value>
	   Reject the mail if a sendmail macro value matches.
     The plugin regularly checks the configuration file for modification and
     reloads it automatically.	Signals like SIGHUP will terminate the plugin,
     according to the milter signal handler.  The plugin reacts to any kind of
     error, like syntax errors in the configuration file, by failing open, ac-
     cepting all messages.  When the plugin is not running, sendmail(8) will
     accept all messages.
     The regular expressions used in the configuration rules are enclosed in
     arbitrary delimiters, no further escaping is needed.
     The first character of an argument is taken as the delimiter, and all
     subsequent characters up to the next occurrence of the same delimiter are
     taken literally as the regular expression.	 Since the delimiter itself
     cannot be part of the regular expression (no escaping is supported), a
     delimiter must be chosen that doesn't occur in the regular expression it-
     self.  Each argument can use a different delimiter, all characters except
     spaces and tabs are valid.
     Two immediately adjacent delimiters form an empty regular expression,
     which always matches and requires no regexec(3) call.  This can be used
     in rules requiring multiple arguments, to match only some arguments.
     See re_format(7) for a detailed description of basic and extended regular
     expressions.
     Optionally, the following flags can be used after the closing delimiter:
     e	  Extended regular expression.	This sets REG_EXTENDED for regcomp(3).
     i	  Ignore upper/lower case.  This sets REG_ICASE.
     n	  Not matching.	 Reverses the matching result, i.e. the mail is re-
	  jected if the regular expression does not match.
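
The delimiter and flag rules above can be sketched in a few lines of Python. This is only an illustration, not the plugin's code: parse_arg, arg_matches and the Arg tuple are hypothetical names, the flag loop is more lenient than the grammar (it accepts the letters in any order), and Python's re module stands in for regcomp(3)/regexec(3), so the 'e' flag is recorded but has no effect here.

```python
import re
from typing import NamedTuple, Tuple


class Arg(NamedTuple):
    pattern: str    # text between the delimiters, taken literally
    extended: bool  # 'e' flag (recorded only; Python's re has no BRE/ERE split)
    icase: bool     # 'i' flag
    negate: bool    # 'n' flag


def parse_arg(text: str) -> Tuple[Arg, str]:
    """Split one delimited argument off the front of text; return it and the rest."""
    text = text.lstrip(" \t")
    if not text:
        raise ValueError("missing argument")
    delim = text[0]               # the first character is the delimiter
    end = text.find(delim, 1)     # next occurrence closes the expression
    if end == -1:
        raise ValueError("unterminated regular expression")
    pattern = text[1:end]
    pos = end + 1
    flags = ""
    while pos < len(text) and text[pos] in "ein":
        flags += text[pos]
        pos += 1
    return Arg(pattern, "e" in flags, "i" in flags, "n" in flags), text[pos:]


def arg_matches(arg: Arg, value: str) -> bool:
    """Apply one parsed argument to a value, honouring the i and n flags."""
    if arg.pattern == "":
        result = True             # empty expression always matches, no regex call
    else:
        options = re.IGNORECASE if arg.icase else 0
        result = re.search(arg.pattern, value, options) is not None
    return result != arg.negate   # 'n' reverses the matching result
```

Calling parse_arg twice on the text after "header" yields the name and value arguments in turn, because the second return value is the unparsed remainder of the line.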
     A rule can consist of either a simple term or more complex expressions.
     A term has the form
     header /From/ /domain/i
     and expressions can be built combining terms with operators "and", "or",
     "not" and parentheses, as in
     header /From/ /domain/i and body /money/
     ( not header /From/ /domain/ ) and ( body /sex/ or body /fast/ )
     Operator precedence should not be relied on, instead parentheses should
     be used to resolve any ambiguities (they usually produce syntax errors
     from the parser).
     Macros allow terms or expressions to be stored under a name, and $name can be
     used as a term within other rules, expressions or macro definitions.  Example:
     friends	     = header /^Received$/ /^from [^ ]*(ork.net|home.com)/e
     attachments     = header ,^Content-Type$, ,multipart/mixed, and \
			 body ,^Content-Type: application/,
     executables     = $attachments and body ,name=".*.(pif|exe|scr)"$,e
     reject "executable attachment from non-friends"
     $executables and not $friends
     Macro names must begin with a letter and may contain alphanumeric charac-
     ters and punctuation characters.  Reserved keywords (like "reject" or
     "header") cannot be used as macro names.  Macros must be defined before
     use: the definition must precede the use in the configuration file, which
     is read from top to bottom.
     Rules are evaluated in the order specified in the configuration file,
     from top to bottom.  When a rule matches, the corresponding action is
     taken, that is the last action specified before the matching rule.
     The plugin evaluates the rules every time a line of mail (or envelope) is
     received.	As soon as a rule matches, the action is taken immediately,
     possibly before the entire mail is received, even if further lines might
     possibly make other rules match, too.  This means the first rule matching
     chronologically has precedence.
     If evaluation for a line of mail makes two (or more) rules match, the
     rule that comes first in the configuration file has precedence.
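
How rules bind to actions, and which rule wins when several match, can be sketched as follows. This is a simplified illustration, not the plugin's parser: pair_rules_with_actions and first_match are hypothetical names, macro definitions and backslash continuations are not handled, and the actual matching is left to a caller-supplied function.

```python
from typing import Callable, List, Optional, Tuple

ACTIONS = ("reject", "tempfail", "discard", "quarantine", "accept")


def pair_rules_with_actions(config: str) -> List[Tuple[str, str]]:
    """Return (action line, rule line) pairs in configuration file order."""
    pairs: List[Tuple[str, str]] = []
    current_action: Optional[str] = None
    for raw in config.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                      # empty lines and comments are ignored
        if line.split()[0] in ACTIONS:
            current_action = line         # applies to all rules that follow
        elif current_action is not None:
            pairs.append((current_action, line))
    return pairs


def first_match(pairs: List[Tuple[str, str]],
                matches: Callable[[str], bool]) -> Optional[str]:
    """Return the action of the first rule (in file order) that matches."""
    for action, rule in pairs:
        if matches(rule):
            return action
    return None
```

If a single line of mail makes two rules match, first_match returns the action attached to the one that appears earlier in the file, which mirrors the precedence described above.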
     Boolean expressions are short-circuit evaluated, that means "a or b" be-
     comes true as soon as one of the terms is true and "a and b" becomes
     false as soon as one of the terms is false, even if the other term is not
     known, possibly because the relevant mail line has not been received yet.
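
One way to picture this short-circuit behaviour is three-valued logic, where a term is true, false, or not yet known. The sketch below is an assumption about how to model the description, not the plugin's implementation; Tri, tri_not, tri_and and tri_or are hypothetical names, and None stands for a term whose mail line has not been received yet.

```python
from typing import Optional

# True, False, or None while the relevant mail line has not arrived yet.
Tri = Optional[bool]


def tri_not(a: Tri) -> Tri:
    """Negation stays unknown while the operand is unknown."""
    return None if a is None else not a


def tri_and(a: Tri, b: Tri) -> Tri:
    """False as soon as one term is false, even if the other is unknown."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def tri_or(a: Tri, b: Tri) -> Tri:
    """True as soon as one term is true, even if the other is unknown."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False
```

For a rule like "header /From/ /domain/i and body /money/", a header term that is already false makes the whole expression false before any body line is seen, while a true header term leaves the result unknown until the body arrives.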
     # /etc/milter-regex.conf example
     # Accept anything encrypted, just to demonstrate sendmail macros
     macro /tls_version/ /TLSv/
     tempfail "Sender IP address not resolving"
     connect /\[.*\]/ //
     reject "Malformed HELO (not a domain, no dot)"
     helo /\./n
     reject "Malformed RCPT TO (not an email address, not <.*@.*>)"
     envrcpt /<(.*@.*|Postmaster)>/ein
     reject "HTML mail not accepted"
     # use comma as delimiter here, as / occurs within RE
     header /^Content-type$/i ,^text/html,i
     body ,^Content-type: text/html,i
     # Swen worm
     header /^(TO|FROM|SUBJECT)$/e //
     header /^Content-type$/i /boundary="Boundary_(ID_/i
     header /^Content-type$/i /boundary="[a-z]*"/
     body ,^Content-type: audio/x-wav; name="[a-z]*\.[a-z]*",i
     # Some nasty spammer
     reject "Business Corp spam, get lost"
     body /^Business Corp. for W.& L. AG/i and \
	     ( body /043.*317.*0285/ or body /0041.43.317.02.85/ )
     milter-regex sends log messages to syslogd(8) using facility daemon and,
     with increasing verbosity, level err, notice, info and debug.  The fol-
     lowing syslog.conf(5) section can be used to log messages to a dedicated
     daemon.err;daemon.notice	     /var/log/milter-regex
     Syntax for milter-regex in BNF:
     file	     = ( rule | macro ) file
     rule	     = action expr-list
     action	     = "reject" msg | "tempfail" msg | "discard" |
		       "quarantine" msg | "accept"
     msg	     = ( '"' | "'" ) string ( '"' | "'" )
     expr-list	     = expr [ expr-list ]
     expr	     = term | term "and" expr | term "or" expr | "not" term
     term	     = '(' expr ')' | "connect" arg arg | "helo" arg |
		       "envfrom" arg | "envrcpt" arg | "header" arg arg |
		       "body" arg | "macro" arg arg | '$' name
     arg	     = del regex del flags
     del	     = '/' | ',' | '-' | ...
     flags	     = [ 'e' ] [ 'i' ] [ 'n' ]
     macro	     = name '=' expr
     mailstats(1), regex(3), syslog.conf(5), re_format(7), sendmail(8),
     Simple Mail Transfer Protocol, RFC 2821.
     Enhanced Mail System Status Codes, RFC 1893.
     The first version of milter-regex was written in 2003.  Boolean expres-
     sion evaluation was added in 2004.
     Daniel Hartmeier <[email protected]>
OpenBSD 4.0		      September 24, 2003			     5

More examples

If you have interesting rules that work for you, you're very welcome to contribute them.

HELO with your own IP address

From Christopher Kruslicky:
tempfail "Malformed HELO (can't be me)"
helo /^62\.65\.145\.30$/
Some spammers pick your own IP address as HELO, assuming it has a better chance of getting accepted by you than a random IP address (or some potentially non-resolving hostname).

Dynamic host addresses

From Darren Henderson:
# from your examples, tempfailing non-resolving rDNS connections
tempfail "Sender IP address not resolving"
connect /\[.*\]/ //
# reject things that look like they might come from a dynamic address
reject "Looks like a dynamic address"
connect /[0-9][0-9]*\-[0-9][0-9]*\-[0-9][0-9]*/ //
connect /[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*/ //
connect /[0-9]{12}/e //
So, we reject anything that has three digit sets separated by a dash (e.g. adsl-134-11-333-11.someisp.net). We reject anything that has 3 or more numeric subdomains, (ie dialup. And finally, we reject any address that has a group of 12 digits (e.g. pool123045067003.someisp.net).
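
Before deploying patterns like these, it can help to try them against a handful of hostnames offline. The sketch below does that with Python's re module, which is not the POSIX regex library the plugin uses, although these three patterns behave the same way in both. looks_dynamic is a hypothetical helper, and 10.20.30.dialup.example.net and mail.example.org are made-up hostnames.

```python
import re

# The three hostname patterns from the rules above, written for Python's re.
DYNAMIC_PATTERNS = [
    re.compile(r"[0-9][0-9]*-[0-9][0-9]*-[0-9][0-9]*"),    # three digit sets joined by dashes
    re.compile(r"[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*"),  # three numeric labels joined by dots
    re.compile(r"[0-9]{12}"),                              # a run of 12 digits
]


def looks_dynamic(hostname: str) -> bool:
    """Return True if any of the dynamic-address patterns matches the hostname."""
    return any(p.search(hostname) for p in DYNAMIC_PATTERNS)


if __name__ == "__main__":
    for host in ("adsl-134-11-333-11.someisp.net",
                 "pool123045067003.someisp.net",
                 "10.20.30.dialup.example.net",
                 "mail.example.org"):
        print(host, looks_dynamic(host))
```

Running a list of known legitimate senders through such a check is a cheap way to spot false positives before a rule starts rejecting real mail.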

Forged Outlook headers

Analyzing the spam that still gets delivered (and then promptly detected by SpamAssassin), I found that most of it uses fake Outlook headers. So let's add a rule to detect that inline (blatantly stealing rules from SpamAssassin ;).

HAS_MIMEOLE             = header /^X-MimeOLE$/ //
HAS_MSMAIL_PRI          = header /^X-MSMail-Priority$/ //
HAS_X_MAILER            = header /^X-Mailer$/ //
HAS_OUTLOOK_IN_MAILER   = header /^X-Mailer$/ /Microsoft (CDO|Outlook) /e
                            $HAS_X_MAILER and not $HAS_OUTLOOK_IN_MAILER
OUTLOOK_MUA             = header /^X-Mailer$/ / Outlook /
OUTLOOK_MSGID_1         = header /^Message-ID$/ \
OUTLOOK_MSGID_2         = header /^Message-ID$/ \
IMS_MSGID               = header /^Message-ID$/ \
UNUSABLE_MSGID          = header /^List-Unsubscribe$/ //
                            $OUTLOOK_MSGID_1 or $OUTLOOK_MSGID_2 )
MSGID_OE_SPAM_4ZERO     = header /^Message-ID$/ \
reject "Forged Outlook headers"
Some performance benchmarks would be interesting here. I'm quite sure these rules are much cheaper to evaluate inline in milter-regex than in SpamAssassin (Perl) after accepting delivery, or in a milter plugin using spamc. If you measure the maximum number of mails per second either of these can handle on a specific machine, please let me know.


Makefiles for GNU/Linux and Solaris are included, but might need some tweaking. If they don't work for you, please try to fix them and send me corrections. Some patches to build under Linux (not supported by me).


1.7: Aug 4, 2018

Support filtering sendmail macros, like {auth_type}.

1.6: June 6, 2005

Support sendmail quarantine action. Requires non-ancient sendmail (>= 8.13) and libmilter, as shipped with recent *BSD releases by default.
More fixes for the state machine, dealing with multi-message connections.

1.5: March 19, 2004

Fix logic errors in dealing with multi-message connections (SMTP RSET, HELO or MAIL FROM resetting SMTP state). Add cb_abort callback.

1.4: March 13, 2004

Some performance improvements, abort rule evaluation immediately when no further rules can possibly match. Compile without -Werror, as some ports generate warnings.

1.3: March 8, 2004

Two bugfixes related to RCPT TO: rule evaluation (DSN options and multiple recipients would match incorrectly), umask(0177) for pipe, fix for Solaris daemon() implementation. Improved logging (From:, To: and Subject: headers, when available).

1.2: February 27, 2004

Some logging improvements and small fixes. Adds Makefiles for GNU/Linux and Solaris. Thanks to everyone who helped me solve the build problems.

1.1: February 25, 2004

Support macro definition/expansion.

1.0: February 24, 2004

Now supports boolean expressions, so multiple regular expressions can be combined using and, or, not and parentheses.

Note that the new parser now requires quotes around reject/tempfail messages. If you get syntax errors in your existing configuration file, missing quotes are a likely cause. Otherwise rulesets are backwards compatible with pre-1.0 versions.

0.1: September 24, 2003

First version.


Last updated on Tue Jul 1 09:32:34 2008.
Send comments to [email protected].
