comparison dcc.8.in @ 0:c7f6b056b673

First import of vendor version
author Peter Gervai <grin@grin.hu>
date Tue, 10 Mar 2009 13:49:58 +0100
-1:000000000000 0:c7f6b056b673
1 .\" Copyright (c) 2008 by Rhyolite Software, LLC
2 .\"
3 .\" This agreement is not applicable to any entity which sells anti-spam
4 .\" solutions to others or provides an anti-spam solution as part of a
5 .\" security solution sold to other entities, or to a private network
6 .\" which employs the DCC or uses data provided by operation of the DCC
7 .\" but does not provide corresponding data to other users.
8 .\"
9 .\" Permission to use, copy, modify, and distribute this software without
10 .\" changes for any purpose with or without fee is hereby granted, provided
11 .\" that the above copyright notice and this permission notice appear in all
12 .\" copies and any distributed versions or copies are either unchanged
13 .\" or not called anything similar to "DCC" or "Distributed Checksum
14 .\" Clearinghouse".
15 .\"
16 .\" Parties not eligible to receive a license under this agreement can
17 .\" obtain a commercial license to use DCC by contacting Rhyolite Software
18 .\" at sales@rhyolite.com.
19 .\"
20 .\" A commercial license would be for Distributed Checksum and Reputation
21 .\" Clearinghouse software. That software includes additional features. This
22 .\" free license for Distributed ChecksumClearinghouse Software does not in any
23 .\" way grant permision to use Distributed Checksum and Reputation Clearinghouse
24 .\" software
25 .\"
26 .\" THE SOFTWARE IS PROVIDED "AS IS" AND RHYOLITE SOFTWARE, LLC DISCLAIMS ALL
27 .\" WARRANTIES WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES
28 .\" OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL RHYOLITE SOFTWARE, LLC
29 .\" BE LIABLE FOR ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES
30 .\" OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS,
31 .\" WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
32 .\" ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS
33 .\" SOFTWARE.
34 .\"
35 .\" Rhyolite Software DCC 1.3.103-1.112 $Revision$
36 .\"
37 .Dd February 26, 2009
38 .ds volume-ds-DCC Distributed Checksum Clearinghouse
39 .Dt DCC 8 DCC
40 .Os " "
41 .Sh NAME
42 .Nm DCC
43 .Nd Distributed Checksum Clearinghouse
44 .Sh DESCRIPTION
45 The Distributed Checksum Clearinghouse or
46 .Nm
47 is a cooperative, distributed
48 system intended to detect "bulk" mail or mail sent to many people.
49 It allows individuals receiving a single mail message to determine
50 that many
51 other people have received essentially identical copies of the message
52 and so reject or discard the message.
53 .Pp
54 Source for the server, client, and utilities
55 is available at Rhyolite Software, LLC, http://www.rhyolite.com/dcc/
56 It is free for organizations that do not sell spam or virus filtering
57 services.
58 .Ss How the DCC Is Used
59 The DCC can be viewed as a tool for end users to enforce their
60 right to "opt-in" to streams of bulk mail
61 by refusing bulk mail except from sources in a "whitelist."
62 Whitelists are the responsibility of DCC clients,
63 since only they know which bulk mail they solicited.
64 .Pp
65 False positives, or mail that a DCC server marks as bulk although it
66 is not bulk, occur only when a recipient of a message reports it
67 to a DCC server as having been received many times
68 or when the "fuzzy" checksums of differing messages are the same.
69 The fuzzy checksums ignore aspects of messages in order to compute
70 identical checksums for substantially identical messages.
71 The fuzzy checksums are designed to ignore only
72 differences that do not affect meanings.
73 So in practice, you do not need to worry about DCC false positive indications
74 of "bulk," but not all bulk mail is unsolicited bulk mail or spam.
75 You must either use whitelists to distinguish solicited from unsolicited bulk
76 mail
77 or only use DCC indications of "bulk" as part of a scoring system such
78 as SpamAssassin.
79 Besides unsolicited bulk email or spam,
80 bulk messages include legitimate mail such as
81 order confirmations from merchants,
82 legitimate mailing lists,
83 and empty or test messages.
84 .Pp
85 A DCC server estimates the number of copies of a
86 message by counting checksums reported by DCC clients.
87 Each client must decide which
88 bulk messages are unsolicited and what degree of "bulkiness" is objectionable.
89 Client DCC software marks, rejects, or discards mail that is bulk
90 according to local thresholds on target addresses from DCC servers
91 and unsolicited according to local whitelists.
92 .Pp
93 DCC servers are usually configured to receive reports from as many targets
94 as possible, including sources that cannot be trusted to not exaggerate the
95 number of copies of a message they see.
96 A user of a DCC client angry about receiving a message could report it with
97 1,000,000 separate DCC reports
98 or with a single report claiming 1,000,000 targets.
99 An unprincipled user could subscribe a "spam trap" to mailing lists
100 such as those of the IETF or CERT.
101 Such abuses of the system are not problems,
102 because much legitimate mail is "bulk."
103 You cannot reject bulk mail unless you have a whitelist of sources
104 of legitimate bulk mail.
105 .Pp
106 DCC can also be used by an Internet service provider to detect bulk
107 mail coming from its own customers.
108 In such circumstances, the DCC client might be configured to only log
109 bulk mail from unexpected (not whitelisted) customers.
110 .Ss What the DCC Is
111 A DCC server accumulates counts of cryptographic checksums of
112 messages but not the messages themselves.
113 It exchanges reports of frequently seen checksums with other servers.
114 DCC clients send reports of checksums related to incoming mail to
115 a nearby DCC server running
116 .Xr dccd 8 .
117 Each report from a client includes the number of recipients for the message.
118 A DCC server accumulates the reports and responds to clients with
119 the current total number of recipients for each checksum.
120 The client adds an SMTP header to incoming mail containing the total
121 counts.
122 It then discards or rejects mail that is not whitelisted and has
123 counts that exceed local thresholds.
124 .Pp
125 A special value of the number of addressees is "MANY" and means
126 it is certain that this message was bulk and might be unsolicited,
127 perhaps because it came from a locally blacklisted source or was
128 addressed to an invalid address or "spam trap."
129 The special value "MANY" is merely the largest value
130 that fits in the fixed-size field containing the count of addressees.
131 That "infinity" accumulated total can be reached with millions of
132 independent reports as well as with one or two.
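The accumulation described above can be sketched in Python.
This is a minimal illustration and not the dccd implementation:
the class name, the method name, the checksum strings, and the numeric value
chosen for MANY are assumptions made for the example.
The text only says that MANY is the largest value that fits in a
fixed-size count field.

```python
from collections import defaultdict

# Hypothetical saturation value; the real field width is not stated here.
MANY = 2**24 - 1


class ChecksumCounter:
    """Toy model of a server that counts recipients per checksum."""

    def __init__(self):
        self.totals = defaultdict(int)

    def report(self, checksums, targets):
        """Add `targets` recipients to each checksum and return the new totals."""
        answer = {}
        for ck in checksums:
            # Saturate at MANY instead of overflowing the fixed-size field.
            self.totals[ck] = min(MANY, self.totals[ck] + targets)
            answer[ck] = self.totals[ck]
        return answer


server = ChecksumCounter()
server.report(["body:ab12", "fuz1:cd34"], targets=3)
totals = server.report(["body:ab12"], targets=2)
print(totals)
single = server.report(["fuz1:cd34"], targets=MANY)
print(single["fuz1:cd34"] == MANY)
```

In this sketch a single report that claims MANY targets reaches the same
saturated total as millions of small reports would.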
133 .Pp
134 DCC servers
135 .Em flood
136 or send
137 reports of checksums of bulk mail to neighboring servers.
138 .Pp
139 To keep a server's database of checksums from growing without bound,
140 checksums are forgotten when they become old.
141 Checksums of bulk mail are kept longer.
142 See
143 .Xr dbclean 8 .
144 .Pp
145 DCC clients pick the nearest working DCC server using a small shared
146 or memory mapped file,
147 .Pa @prefix@/map .
148 It contains server names, port numbers, passwords, recent performance
149 measures, and so forth.
150 This file allows clients to use quick retransmission timeouts
151 and to waste little time on servers that have temporarily
152 stopped working or become unreachable.
153 The utility program
154 .Xr cdcc 8
155 is used to maintain this file as well as to check the health of servers.
156 .Ss X-DCC Headers
157 The DCC software includes several programs used by clients.
158 .Xr Dccm 8
159 uses the sendmail "milter" interface to query a DCC server,
160 add header lines to incoming mail,
161 and reject mail whose total checksum counts are high.
162 Dccm is intended to be run with SMTP servers using sendmail.
163 .Pp
164 .Xr Dccproc 8
165 adds header lines to mail presented by file name or
166 .Pa stdin ,
167 but relies on other programs
168 such as procmail to deal with mail with large counts.
169 .Xr Dccsight 8
170 is similar but deals with previously computed checksums.
171 .Pp
172 .Xr Dccifd 8
173 is similar to dccproc but is not run separately for each mail message
174 and so is far more efficient.
175 It receives mail messages via a socket somewhat like dccm,
176 but with a simpler protocol that can be used by Perl scripts
177 or other programs.
178 .Pp
179 DCC SMTP header lines are of one of the forms:
180 .Bd -literal -offset 2n
181 X-DCC-brand-Metrics: client server-ID; bulk cknm1=count cknm2=count ...
182 X-DCC-brand-Metrics: client; whitelist
183 .Ed
184 where
185 .Bl -hang -offset 3n -compact
186 .It Em whitelist
187 appears if the global or per-user
188 .Pa whiteclnt
189 file marks the message as good.
190 .It Em brand
191 is the "brand name" of the DCC server, such as "RHYOLITE".
192 .It Em client
193 is the name or IP address of the DCC client that added the
194 header line to the SMTP message.
195 .It Em server-ID
196 is the numeric ID of the DCC server that the DCC client contacted.
197 .It Em bulk
198 is present if one or more checksum counts exceeded the DCC client's
199 thresholds to make the message "bulky."
200 .It Em bulk rep
201 is present if the DCC reputation of the IP address of the sender is bad.
202 .It Em cknm1 , Ns Em cknm2 , Ns ...
203 are types of checksums:
204 .Bl -hang -offset 2n -width "Message-IDx" -compact
205 .It Em IP
206 address of SMTP client
207 .It Em env_From
208 SMTP envelope value
209 .It Em From
210 SMTP header line
211 .It Em Message-ID
212 SMTP header line
213 .It Em Received
214 last Received: header line in the SMTP message
215 .It Em substitute
216 SMTP header line chosen by the DCC client, prefixed with the name of
217 the header
218 .It Em Body
219 SMTP body ignoring white-space
220 .It Em Fuz1
221 filtered or "fuzzy" body checksum
222 .It Em Fuz2
223 another filtered or "fuzzy" body checksum
224 .It Em rep
225 DCC reputation of the mail sender or the estimated
226 probability that the message is bulk.
227 .El
228 Counts for
229 .Em IP , env_From , From ,
230 .Em Message-Id , Received ,
231 and
232 .Em substitute
233 checksums are omitted by the DCC client if the server
234 says it has no information.
235 Counts for
236 .Em Fuz1
237 and
238 .Em Fuz2
239 are omitted if the message body is empty or
240 contains too little of the right kind of information
241 for the checksum to be computed.
242 .It Em count
243 is the total number of recipients of messages with that
244 checksum reported directly or indirectly to the DCC server.
245 The special count "MANY" means that DCC clients have claimed that
246 the message is directed at millions of recipients.
247 "MANY" implies the message is definitely bulk, but not necessarily unsolicited.
248 The special counts "OK" and "OK2" mean the checksum has been
249 marked "good" or "half-good" by DCC servers.
250 .El
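A program that reads these header lines might split them as in the
following Python sketch.
It is an illustration under stated assumptions, not code from the DCC
distribution: the function name and the sample header are made up, and the
simple whitespace tokenizing assumes that no checksum name or count
contains a space, which may not hold for
substitute
checksums.

```python
def parse_dcc_header(line):
    """Split one X-DCC header line into its documented parts."""
    name, sep, rest = line.strip().partition(":")
    if not sep or not name.startswith("X-DCC-") or not name.endswith("-Metrics"):
        raise ValueError("not an X-DCC metrics header")
    # "client server-ID" comes before the semicolon, flags and counts after it.
    left, _, right = rest.partition(";")
    ident = left.split()
    result = {
        "brand": name[len("X-DCC-"):-len("-Metrics")],
        "client": ident[0] if ident else None,
        "server_id": int(ident[1]) if len(ident) > 1 else None,
        "flags": [],
        "counts": {},
    }
    for token in right.split():
        ckname, eq, count = token.partition("=")
        if eq:
            # Numeric counts become ints; MANY, OK, and OK2 stay strings.
            result["counts"][ckname] = int(count) if count.isdigit() else count
        else:
            result["flags"].append(token)
    return result


# Made-up sample lines, for illustration only.
sample = "X-DCC-EXAMPLE-Metrics: mx.example.com 1234; bulk Body=5 Fuz1=MANY Fuz2=MANY"
parsed = parse_dcc_header(sample)
white = parse_dcc_header("X-DCC-EXAMPLE-Metrics: mx.example.com; whitelist")
print(parsed["flags"], parsed["counts"])
print(white["flags"])
```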
251 .Pp
252 .Ss Mailing lists
253 Legitimate mailing list traffic differs from spam only in being solicited
254 by recipients.
255 Each client should have a private whitelist.
256 .Pp
257 DCC whitelists can also mark mail as unsolicited bulk using
258 blacklist entries for commonly forged values such as "From: user@public.com".
259 .Ss White and Blacklists
260 DCC server and client whitelist files share a common format.
261 Server files are always named
262 .Pa whitelist
263 and one is required to be in the DCC home directory
264 with the other server files.
265 Client whitelist files are
266 named
267 .Pa whiteclnt
268 in the DCC home directory or a subdirectory specified with the
269 .Fl U
270 option for
271 .Xr dccm 8 .
272 They specify mail that should not be reported to a DCC server or that is
273 always unsolicited and almost certainly bulk.
274 .Pp
275 A DCC whitelist file contains blank lines, comments starting
276 with "#",
277 and lines of the following forms:
278 .Bl -tag -offset 2n -width 4n -compact
279 .It Ar include file
280 Copies the contents of
281 .Ar file
282 into the whitelist.
283 It can occur only in the main whitelist or whiteclnt file and not in an
284 included file.
285 The file name should be absolute or relative to the DCC home directory.
286 .Pp
287 .It Ar count Em value
288 lines specify checksums that should be white- or blacklisted.
289 .Bl -inset -offset 2n -compact
290 .It Ar count Em env_From Ar 821-path
291 .It Ar count Em env_To Ar dest-mailbox
292 .It Ar count Em From Ar 822-mailbox
293 .It Ar count Em Message-ID Ar <string>
294 .It Ar count Em Received Ar string
295 .It Ar count Em Substitute Ar header string
296 .It Ar count Ar Hex ctype cksum
297 .It Ar count Em ip Ar IP-address
298 .El
299 .Pp
300 .Bl -tag -offset 2n -width 4n -compact
301 .It Ar MANY Em value
302 indicates that millions of targets have received messages with
303 the header, IP address, or checksum
304 .Em value .
305 .It Ar OK Em value
306 .It Ar OK2 Em value
307 say that messages with
308 the header, IP address, or checksum
309 .Em value
310 are OK and should not be reported to DCC servers
311 or be greylisted.
312 .Ar OK2
313 says that the message is "half OK."
314 Two
315 .Ar OK2
316 checksums associated with a message are equivalent to one
317 .Ar OK .
318 .br
319 A DCC server never shares or
320 .Em floods
321 reports containing checksums
322 marked in its whitelist with OK or OK2 to other servers.
323 A DCC client does not report or ask its server about messages
324 with a checksum marked OK or OK2 in the client whitelist.
325 This is intended to allow a DCC client to keep private mail
326 so private that even its checksums are not disclosed.
327 .It Ar MX Em IP-address-or-hostname
328 .It Ar MXDCC Em IP-address-or-hostname
329 mark an address or block of addresses of trusted mail relays including
330 MX servers, smart hosts, and bastion or DMZ relays.
331 The DCC clients
332 .Xr dccm 8 ,
333 .Xr dccifd 8 ,
334 and
335 .Xr dccproc 8
336 parse and skip initial Received: headers added by listed MX servers to
337 determine the external sources of mail messages.
338 Unsolicited bulk mail that has been forwarded through listed addresses
339 is discarded by
340 .Xr dccm 8
341 and
342 .Xr dccifd 8
343 as if with
344 .Fl a Ar DISCARD
345 instead of rejected.
346 .Ar MXDCC
347 marks addresses that are MX servers that run DCC clients.
348 The checksums for a mail message that has been forwarded through
349 an address listed as MXDCC
350 are queried instead of reported.
351 .It Ar SUBMIT Em IP-address-or-hostname
352 marks an IP address or block of addresses of SMTP submission clients
353 such as web browsers
354 that cannot tolerate 4yz temporary rejections
355 but that cannot be trusted to not send spam.
356 Since they are local addresses, DCC Reputations are not computed for them.
357 .El
358 .Pp
359 .Ar value
360 in
361 .Ar count Em value
362 lines can be
363 .Bl -tag -offset 2n -width 4n -compact
364 .It Ar dest-mailbox
365 is an RFC\ 821 address or a local user name.
366 .It Ar 821-path
367 is an RFC\ 821 address.
368 .It Ar 822-mailbox
369 is an RFC\ 822 address with optional name.
370 .It Em Substitute Ar header
371 is the name of an SMTP header such as "Sender" or
372 the name of one of two SMTP envelope values, "HELO," or
373 "Mail_Host" for the resolved host name from the
374 .Ar 821-path
375 in
376 the message.
377 .It Ar Hex ctype cksum
378 starts with the string
379 .Em Hex
380 followed by a checksum type, and
381 a string of four hexadecimal numbers obtained from a DCC log file
382 or the
383 .Xr dccproc 8
384 command using
385 .Fl CQ .
386 The checksum type is
387 .Em body , Fuz1 ,
388 or
389 .Em Fuz2
390 or one of the preceding checksum types such as
391 .Em env_From .
392 .It Ar IP-address
393 is a host name, IPv4 or IPv6 address, or a block
394 of IP addresses in the standard xxx/mm form with
395 mm limited for server whitelists to 16 for IPv4 or 112 for IPv6.
396 There can be at most 64 CIDR blocks in a client
397 .Pa whiteclnt
398 file.
399 A host name is converted to IP addresses with DNS,
400 .Pa /etc/hosts
401 or other mechanisms
402 and one checksum for each address is added to the whitelist.
403 .El
404 .Pp
405 .It Ar option setting
406 can only be in a DCC client
407 .Pa whiteclnt
408 file used by
409 .Xr dccifd 8 ,
410 .Xr dccm 8
411 or
412 .Xr dccproc 8 .
413 Settings in per-user whiteclnt files override settings
414 in the global file.
415 .Ar Setting
416 can be any of the following:
417 .Bl -tag -offset 2n -width 2n -compact
418 .It Ar option log-all
419 to log all mail messages.
420 .It Ar option log-normal
421 to log only messages that meet the logging thresholds.
422 .It Ar option log-subdirectory-day
423 .It Ar option log-subdirectory-hour
424 .It Ar option log-subdirectory-minute
425 creates log files containing mail messages in subdirectories
426 of the form
427 .Ar JJJ ,
428 .Ar JJJ/HH ,
429 or
430 .Ar JJJ/HH/MM
431 where
432 .Ar JJJ
433 is the current julian day,
434 .Ar HH
435 is the current hour, and
436 .Ar MM
437 is the current minute.
438 See also the
439 .Fl l Ar logdir
440 option for
441 .Xr dccm 8 ,
442 .Xr dccifd 8 ,
443 and
444 .Xr dccproc 8 .
445 .It Ar option dcc-on
446 .It Ar option dcc-off
447 Control DCC filtering.
448 See the discussion of
449 .Fl W
450 for
451 .Xr dccm 8
452 and
453 .Xr dccifd 8 .
454 .It Ar option greylist-on
455 .It Ar option greylist-off
456 to control greylisting.
457 Greylisting for other recipients in the same SMTP transaction
458 can still cause greylist temporary rejections.
459 .Ar greylist-off
460 in the main whiteclnt file.
461 .It Ar option greylist-log-on
462 .It Ar option greylist-log-off
463 to control logging of greylisted mail messages.
464 .It Ar option DCC-rep-off
465 .It Ar option DCC-rep-on
466 to ignore or honor DCC Reputations computed by the DCC server.
467 .It Ar option DNSBL1-off
468 .It Ar option DNSBL1-on
469 .It Ar option DNSBL2-off
470 .It Ar option DNSBL2-on
471 .It Ar option DNSBL3-off
472 .It Ar option DNSBL3-on
473 ignore or honor results of DNS blacklist checks configured with
474 .Fl B
475 for
476 .Xr dccm 8 ,
477 .Xr dccifd 8 ,
478 and
479 .Xr dccproc 8 .
480 .It Ar option MTA-first
481 .It Ar option MTA-last
482 consider MTA determinations of spam or not-spam first so they can be overridden
483 by
484 .Pa whiteclnt
485 files, or last so that they can override
486 .Pa whiteclnt files.
487 .It Ar option forced-discard-ok
488 .It Ar option no-forced-discard
489 control whether
490 .Xr dccm 8
491 and
492 .Xr dccifd 8
493 are allowed to discard a message for one mailbox for which
494 it is spam when it is not spam and must be delivered to another mailbox.
495 This can happen if a mail message is addressed to two or more mailboxes with
496 differing whitelists.
497 Discarding can be undesirable because false positives are not communicated
498 to mail senders.
499 To avoid discarding,
500 .Xr dccm 8
501 and
502 .Xr dccifd 8
503 running in proxy mode temporarily reject SMTP envelope
504 .Em Rcpt To
505 values that involve differing
506 .Pa whiteclnt
507 files.
508 .It Ar option threshold type,rej-thold
509 has the same effects as
510 .Fl c Ar type,rej-thold
511 for
512 .Xr dccproc 8
513 or
514 .Fl t Ar type,rej-thold
515 for
516 .Xr dccm 8
517 and
518 .Xr dccifd 8 .
519 It is useful only in per-user whiteclnt files to override the global
520 DCC checksum thresholds.
521 .It Ar option spam-trap-accept
522 .It Ar option spam-trap-reject
523 say that mail should be reported to the DCC server as extremely
524 bulk or with target counts of
525 .Ar MANY .
526 Greylisting, DNS blacklist (DNSBL), and other checks are turned off.
527 .Ar Spam-trap-accept
528 tells the MTA to accept the message while
529 .Ar spam-trap-reject
530 tells the MTA to reject the message.
531 Use
532 .Ar Spam-trap-accept
533 for spam traps that should not be disclosed.
534 .Ar Spam-trap-reject
535 can be used on
536 .Em catch-all
537 mailboxes that might receive legitimate mail by typographical errors
538 and that senders should be told about.
539 .El
540 .Pp
541 In the absence of explicit settings,
542 the default in the main whiteclnt file is equivalent to
543 .Bl -hang -offset 4n -width 4n -compact
544 .It Ar option log-normal
545 .It Ar option dcc-on
546 .It Ar option greylist-on
547 .It Ar option greylist-log-on
548 .It Ar option DCC-rep-off
549 .It Ar option DNSBL1-off
550 .It Ar option DNSBL2-off
551 .It Ar option DNSBL3-off
552 .It Ar option MTA-last
553 .It Ar option no-forced-discard
554 .El
555 The defaults for individual recipient
556 .Pa whiteclnt
557 files are the same except as changed by explicit settings
558 in the main file.
559 .El
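The line forms listed above can be told apart with a small classifier.
The Python sketch below is only an illustration of the file layout:
the function name and the returned tuple shape are hypothetical, the sample
addresses are made up, and treating keywords as case-insensitive is an
assumption rather than documented behavior.

```python
def classify_whitelist_line(line):
    """Classify one line of a whitelist file into a (kind, fields) tuple."""
    text = line.strip()
    if not text or text.startswith("#"):
        return ("ignored", ())
    words = text.split(None, 1)
    keyword = words[0]
    rest = words[1] if len(words) > 1 else ""
    if keyword.lower() == "include":
        return ("include", (rest,))
    if keyword.lower() == "option":
        return ("option", (rest,))
    if keyword.upper() in ("MX", "MXDCC", "SUBMIT"):
        return ("relay", (keyword.upper(), rest))
    # Anything else is read as "count type value", such as "OK ip 192.0.2.1".
    parts = rest.split(None, 1)
    ctype = parts[0] if parts else ""
    value = parts[1] if len(parts) > 1 else ""
    return ("checksum", (keyword.upper(), ctype, value))


sample = """
# local mailing lists
include whitecommon
OK env_From <list-owner@example.org>
MANY From spammer@example.com
MX 192.0.2.10
option log-all
"""
entries = [classify_whitelist_line(line) for line in sample.splitlines()]
for entry in entries:
    print(entry)
```

A real parser would also have to enforce the rules stated above, such as
refusing
include
lines inside included files and accepting
option
lines only in client files.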
560 .Pp
561 Checksums of the IP address of the SMTP client sending a mail message
562 are practically unforgeable, because it is impractical for
563 an SMTP client to "spoof" its address or pretend to use some other IP address.
564 That would make the IP address of the sender useful for whitelisting,
565 except that the IP address of the SMTP client
566 is often not available to users of
567 .Xr dccproc 8 .
568 In addition, legitimate mail relays make whitelist entries for IP
569 addresses of little use.
570 For example,
571 the IP address from which a message arrived might be that of a
572 local relay instead of the home address of a whitelisted mailing list.
573 .Pp
574 Envelope and header
575 .Ar From
576 values can be forged,
577 so whitelist entries for their checksums are not entirely reliable.
578 .Pp
579 Checksums of
580 .Ar env_To
581 values are never sent to DCC servers.
582 They are valid in only
583 .Pa whiteclnt
584 files
585 and used only by
586 .Xr dccm 8 ,
587 .Xr dccifd 8 ,
588 and
589 .Xr dccproc 8
590 when the envelope
591 .Em Rcpt To
592 value is known.
593 .Ss Greylists
594 The DCC server,
595 .Xr dccd 8 ,
596 can be used to maintain a greylist database for some DCC clients
597 including
598 .Xr dccm 8
599 and
600 .Xr dccifd 8 .
601 Greylisting involves temporarily refusing mail from unfamiliar
602 SMTP clients and is unrelated to filtering with a
603 Distributed Checksum Clearinghouse.
604 .br
605 See http://projects.puremagic.com/greylisting/
606 .Ss Privacy
607 Because sending mail is a less private act than receiving it,
608 and because sending bulk mail is usually not private at all
609 and cannot be very private,
610 the DCC tries first to protect the privacy of mail recipients,
611 and second the privacy of senders of mail that is not bulk.
612 .Pp
613 DCC clients necessarily disclose some information about mail they have
614 received.
615 The DCC database contains checksums of mail bodies,
616 header lines, and source addresses.
617 While it contains significantly less information than is
618 available by "snooping" on Internet links,
619 it is important that the DCC database be treated as containing
620 sensitive information and to not put the most private information
621 in the DCC database.
622 Given the contents of a message, one might determine
623 whether that message has been received
624 by a system that subscribes to the DCC.
625 Guesses about the sender and addressee of a message can also be
626 validated if the checksums of the message have been sent to a DCC server.
627 .Pp
628 Because the DCC is distributed,
629 organizations can operate their own DCC servers, and configure
630 them to share or "flood" only the checksums of bulk mail that is not
631 in local whitelists.
632 .Pp
633 DCC clients should not report the checksums of messages known to be
634 private to a DCC server.
635 For example, checksums of messages local to
636 a system or that are otherwise known a priori to not be unsolicited bulk
637 should not be sent to a remote DCC server.
638 This can be accomplished by adding entries for the sender to the
639 client's local whitelist file.
640 Client whitelist files can also include entries for email recipients
641 whose mail should not be reported to a DCC server.
642 .Ss Security
643 Whenever considering security,
644 one must first consider the risks.
645 The worst DCC security problems are
646 unauthorized commands to a DCC service,
647 denial of the DCC service,
648 and corruption of DCC data.
649 The worst that can be done with remote commands to a DCC server is
650 to turn it off or otherwise cause it to stop responding.
651 The DCC is designed to fail gracefully,
652 so that a denial of service attack
653 would at worst allow delivery of mail that would otherwise be rejected.
654 Corruption of DCC data might at worst cause mail that is already
655 somewhat "bulk" by virtue of being received by two or more people
656 to appear to have higher recipient numbers.
657 Since DCC users
658 .Em must
659 whitelist all sources of legitimate bulk mail,
660 this is also not a concern.
661 Such security risks should be addressed,
662 but only with defenses that don't cost more than the possible damage from
663 an attack.
664 .Pp
665 The DCC must contend with senders of unsolicited bulk mail who
666 resort to unlawful actions
667 to express their displeasure at having their advertising blocked.
668 Because the DCC protocol is based
669 on UDP, an unhappy advertiser could try to
670 flood a DCC server with
671 packets supposedly from subscribers or non-subscribers.
672 DCC servers defend against that attack by rate-limiting requests
673 from anonymous users.
674 .Pp
675 Also because of the use of UDP, clients must be protected
676 against forged answers to their queries.
677 Otherwise an unsolicited bulk mail advertiser could send
678 a stream of "not spam" answers to an SMTP
679 client while simultaneously sending mail that would otherwise be
680 rejected.
681 This is not a problem for authenticated clients of the
682 DCC because they share a secret with the DCC.
683 Unauthenticated, anonymous DCC
684 clients do not share any secrets with the DCC, except for unique and
685 unpredictable bits in each query or report sent to the DCC.
686 Therefore, DCC servers cryptographically sign answers to
687 unauthenticated clients with bits from the corresponding queries.
688 This protects against attackers that do not
689 have access to the stream of packets from the DCC client.
690 .Pp
691 The passwords or shared secrets used in the DCC client and server programs
692 are "cleartext" for several reasons.
693 In any shared secret authentication system,
694 at least one party must know the secret or keep the secret in cleartext.
695 You could encrypt the secrets in a file, but because they are used
696 by programs, you would need a cleartext copy of the key to decrypt
697 the file somewhere in the system, making such a scheme more expensive
698 but no more secure than a file of cleartext passwords.
699 Asymmetric systems such as that used in UNIX allow one party to not
700 know the secrets, but they must be and are
701 designed to be computationally expensive when used in applications
702 like the DCC that involve thousands or more authentication checks per second.
703 Moreover, because of "dictionary attacks,"
704 asymmetric systems are now little more secure than
705 keeping passwords in cleartext.
706 An adversary can compare the hash values of combinations of common words
707 with /etc/passwd hash values to look for bad passwords.
708 Worse, by the nature of a client/server protocol like that used in
709 the DCC, clients must have the cleartext password.
710 Since it is among the more numerous and much less secure clients
711 that adversaries would seek files of DCC passwords,
712 it would be a waste to complicate the DCC server with an asymmetric
713 system.
714 .Pp
715 The DCC protocol is vulnerable to dictionary attacks to recover passwords.
716 An adversary could capture some DCC packets, and then check to see
717 if any of the 100,000 to 1,000,000 passwords in so called
718 "cracker dictionaries"
719 applied to a packet generated the same signature.
720 This is a concern only if DCC passwords are poorly chosen, such
721 as any combination of words in an English dictionary.
722 There are ways to prevent this vulnerability regardless of
723 how badly passwords are chosen, but they are computationally expensive
724 and require additional network round trips.
725 Since DCC passwords are created and typed into files once
726 and do not need to be remembered by people,
727 it is cheaper and quite easy to simply choose good passwords
728 that are not in dictionaries.
729 .Ss Reliability
730 It is better to fail to filter unsolicited bulk mail than to fail
731 to deliver legitimate mail, so DCC clients fail in the direction of
732 assuming that mail is legitimate or even whitelisted.
733 .Pp
734 A DCC client sends a report or other request and waits for an answer.
735 If no answer arrives within a reasonable time,
736 the client retransmits.
737 There are many things that
738 might result in the client not receiving an answer,
739 but the most important is packet loss.
740 If the client's request does not reach the server,
741 it is easy and harmless for the client to retransmit.
742 If the client's request reached the server but the server's response was lost,
743 a retransmission to the same server would be misunderstood as
744 a new report of another copy of the same message unless it is detected
745 as a retransmission by the server.
746 The DCC protocol includes transaction identifiers for this purpose.
747 If the client retransmitted to a second server,
748 the retransmission would be misunderstood by the second server as
749 a new report of the same message.
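The effect of transaction identifiers on retransmissions can be sketched
in Python.
This is a toy model under assumptions: the class, its fields, and the use of
a (client ID, transaction ID) pair as the lookup key are hypothetical and do
not describe the actual packet layout.

```python
class DedupServer:
    """Toy server that counts a report once even if it is retransmitted."""

    def __init__(self):
        self.count = 0
        self.answers = {}  # (client_id, transaction_id) -> cached answer

    def handle(self, client_id, transaction_id, targets):
        key = (client_id, transaction_id)
        if key in self.answers:
            # Retransmission: repeat the earlier answer without recounting.
            return self.answers[key]
        self.count += targets
        self.answers[key] = self.count
        return self.count


first = DedupServer()
second = DedupServer()
a1 = first.handle(client_id=7, transaction_id=1001, targets=1)
a2 = first.handle(client_id=7, transaction_id=1001, targets=1)  # answer was lost
b1 = second.handle(client_id=7, transaction_id=1001, targets=1)  # other server
print(a1, a2, first.count, second.count)
```

The first server recognizes the repeated request and counts it once.
The second server has never seen the transaction and counts it as a new
report, which is the case described above.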
750 .Pp
751 Each request from a client includes a timestamp to aid the client in
752 measuring the round trip time to the server and to let the client pick
753 the closest server.
754 Clients monitor the speed of all of the servers they know including
755 those they are not currently using,
756 and use the quickest.
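Choosing the quickest server from measured round trip times can be
sketched as follows.
The function name, the dictionary layout, and the host names are assumptions
for illustration; the real clients keep richer performance data in the
shared map file.

```python
def pick_server(rtts):
    """Return the name of the working server with the smallest round trip time.

    `rtts` maps a server name to its latest measured round trip in seconds,
    or to None when the server did not answer.
    """
    working = {name: rtt for name, rtt in rtts.items() if rtt is not None}
    if not working:
        return None
    return min(working, key=working.get)


measured = {
    "dcc1.example.net": 0.120,
    "dcc2.example.net": 0.045,
    "dcc3.example.net": None,  # currently unreachable
}
best = pick_server(measured)
print(best)
```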
757 .Ss Client and Server-IDs
758 Servers and clients use numbers or IDs to identify themselves.
759 ID 1 is reserved for anonymous, unauthenticated clients.
760 All other IDs are associated with a pair of passwords in the
761 .Pa ids
762 file, the
763 current and next or previous and current passwords.
764 Clients include their client IDs in their messages.
765 When they are not using the anonymous ID,
766 they sign their messages to servers with the first password
767 associated with their client-ID.
768 Servers treat messages with signatures that match neither of the passwords
769 for the client-ID in their own
770 .Pa ids
771 file as if the client had used the anonymous ID.
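The fallback to the anonymous ID can be sketched in Python.
This is a minimal sketch under loud assumptions: HMAC-SHA256 stands in for
whatever signature the DCC protocol actually uses, and the function names,
the client ID 32768, and the passwords are made up for the example.

```python
import hashlib
import hmac

ANONYMOUS_ID = 1  # the text reserves ID 1 for anonymous clients


def sign(password, message):
    """Stand-in signature; HMAC-SHA256 is an assumption, not the DCC wire format."""
    return hmac.new(password.encode(), message, hashlib.sha256).digest()


def effective_client_id(ids, client_id, message, signature):
    """Return the ID the server should treat the message as coming from."""
    if client_id == ANONYMOUS_ID:
        return ANONYMOUS_ID
    for password in ids.get(client_id, ()):
        if hmac.compare_digest(sign(password, message), signature):
            return client_id
    # Neither password matched: treat the client as anonymous.
    return ANONYMOUS_ID


ids = {32768: ("current-secret", "next-secret")}  # made-up ID and passwords
msg = b"report body:ab12 targets=1"
good = effective_client_id(ids, 32768, msg, sign("next-secret", msg))
bad = effective_client_id(ids, 32768, msg, sign("wrong", msg))
print(good, bad)
```

Accepting either password of the pair lets a password be changed on the
server and its clients at different times without locking anyone out.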
772 .Pp
773 Each server has a unique
774 .Em server-ID
775 less than 32768.
776 Servers use their IDs to identify checksums that they
777 .Em flood
778 to other servers.
779 Each server expects local clients sending administrative
780 commands to use the server's ID and sign administrative commands
781 with the associated password.
782 .Pp
783 Server-IDs must be unique among all systems that share reports
784 by "flooding."
785 All servers must be told of the IDs of all other servers whose
786 reports can be received in the local
787 .Pa @prefix@/flod
788 file described in
789 .Xr dccd 8 .
790 However, server-IDs can be mapped during flooding between
791 independent DCC organizations.
792 .Pp
793 .Em Passwd-IDs
794 are server-IDs that should not be assigned to servers.
795 They appear in the often publicly readable
796 .Pa @prefix@/flod
797 and specify passwords in the private
798 .Pa @prefix@/ids
799 file for the inter-server flooding protocol.
800 .Pp
801 The client identified by a
802 .Em client-ID
803 might be a single computer with a
804 single IP address, a single but multi-homed computer, or many computers.
805 Client-IDs are not used to identify checksum reports, but
806 the organization operating the client.
807 A client-ID need only be unique among clients using a single server.
808 A single client can use different client-IDs for different servers,
809 each client-ID authenticated with a separate password.
810 .Pp
811 An obscure but important part of all of this is that the
812 inter-server flooding algorithm
813 depends on server-IDs and timestamps attached to reports of checksums.
814 The inter-server flooding mechanism
815 requires cooperating DCC servers to maintain reasonable clocks
816 ticking in UTC.
817 Clients include timestamps in their requests, but as long as their
818 timestamps are unlikely to be repeated, they need not be very accurate.
819 .Ss Installation Considerations
820 DCC clients on a computer share information about which servers
821 are currently working and their speeds in a shared memory segment.
822 This segment also contains server host names, IP addresses, and
823 the passwords needed to authenticate known clients to servers.
824 That generally requires that
825 .Xr dccm 8 ,
826 .Xr dccproc 8 ,
827 .Xr dccifd 8 ,
828 and
829 .Xr cdcc 8
830 execute with a UID that
831 can write to the DCC home directory and its files.
832 The sendmail interface, dccm,
833 is a daemon that can be started by an "rc" or other script already
834 running with the correct UID.
835 Of the others, dccproc and cdcc need to be set-UID because they are
836 used by end users.
837 They relinquish set-UID privileges when not needed.
838 .Pp
839 Files that contain cleartext passwords, including the shared file used by clients,
840 must be readable only by "owner."
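.Pp
As a minimal sketch, assuming a host that holds both the server
.Pa ids
file and the client
.Pa map
file under their usual names in the DCC home directory,
access could be restricted to the owner with commands such as:
.Bd -literal -offset indent
# allow only the owning UID to read or write the password files
chmod 600 @prefix@/ids @prefix@/map
# confirm the resulting modes and ownership
ls -l @prefix@/ids @prefix@/map
.Ed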
841 .Pp
842 The data files required by a DCC can be in a single "home" directory,
843 .Pa @prefix@ .
844 Distinct DCC servers can run on a single computer, provided they use
845 distinct UDP port numbers and home directories.
846 It is possible and convenient for the DCC clients using a server
847 on the same computer to use the same home directory as the server.
848 .Pp
849 The DCC source distribution includes sample control files.
850 They should be modified appropriately and then copied to the DCC
851 home directory.
852 Files that contain cleartext passwords must not be publicly readable.
853 .Pp
854 The DCC source includes "feature" m4 files to configure
855 sendmail to use
856 .Xr dccm 8
857 to query a DCC server about incoming mail.
858 .Pp
859 See also the INSTALL.html file.
860 .Ss Client Installation
861 Installing a DCC client starts with obtaining or compiling program binaries
862 for the client server data control tool,
863 .Xr cdcc 8 .
864 Installing the sendmail DCC interface,
865 .Xr dccm 8 ,
866 or
867 .Xr dccproc 8 ,
868 the general or
869 .Xr procmail 1
870 interface,
871 is the main part of the client installation.
872 Connecting the DCC to sendmail with dccm is the most powerful option,
873 but requires administrative control of the system running sendmail.
874 .Pp
875 As noted above, cdcc and dccproc should be
876 set-UID to a suitable UID.
877 Root (UID 0) is thought to be safe for both, because they are
878 careful to release privileges except when they need them to
879 read or write files in the DCC home directory.
880 A DCC home directory,
881 .Pa @prefix@ ,
882 should be created.
883 It must be owned and writable by the UID to which cdcc is set.
884 .Pp
885 After the DCC client programs have been obtained,
886 contact the operator(s) of the chosen DCC server(s)
887 to obtain
888 each server's
889 hostname,
890 port number,
891 and a
892 .Em client-ID
893 and corresponding password.
894 No client-IDs or passwords are needed to use
895 DCC servers that allow anonymous clients.
896 Use the
897 .Em load
898 or
899 .Em add
900 commands
901 of cdcc to create a
902 .Pa map
903 file in the DCC home directory.
904 It is usually necessary to create a client whitelist file of
905 the format described above.
906 To accommodate users sharing a computer but not ideas about what
907 is solicited bulk mail,
908 the client whitelist file can be any valid path name
909 and need not be in the DCC home directory.
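.Pp
For illustration only, a server might be added to the
.Pa map
file and the result inspected with commands along the following lines.
The host name, client-ID, and password are placeholders,
and the argument order of the
.Em add
command is an assumption that should be checked against
.Xr cdcc 8 .
.Bd -literal -offset indent
# hypothetical server, client-ID, and password
cdcc "add dcc.example.com 32768 secret"
# display what the clients on this computer now know
cdcc info
.Ed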
910 .Pp
911 If dccm is chosen,
912 arrange to start it with suitable arguments
913 before sendmail is started.
914 See the
915 .Pa homedir/dcc_conf
916 file and the
917 .Pa misc/rcDCC
918 script in the DCC source.
919 The procmail DCC interface,
920 .Xr dccproc 8 ,
921 can be run manually or by a
922 .Xr procmailrc 5
923 rule.
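.Pp
A minimal
.Xr procmailrc 5
sketch follows.
It assumes that dccproc is found on the PATH and is used as a filter
that reads the message on standard input and writes it,
with an added X-DCC header, on standard output.
Any options, such as a whitelist file, are left out here and should
be taken from
.Xr dccproc 8 .
.Bd -literal -offset indent
# filter every message through dccproc and wait for it to finish
:0 fw
| dccproc
.Ed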
924 .Ss Server Installation
925 The DCC server,
926 .Xr dccd 8 ,
927 also requires that the DCC home directory exist.
928 It does not use the client shared or memory mapped file of server
929 addresses,
930 but it requires other files.
931 One is the
932 .Pa @prefix@/ids
933 file of client-IDs, server-IDs, and corresponding passwords.
934 Another is a
935 .Pa flod
936 file of peers that send and receive floods of reports of checksums
937 with large counts.
938 Both files are described
939 in
940 .Xr dccd 8 .
941 .Pp
942 The server daemon should be started when the system is rebooted,
943 probably before sendmail.
944 See the
945 .Pa misc/rcDCC
946 and
947 .Pa misc/start-dccd
948 files in the DCC source.
949 .Pp
950 The database should be cleaned regularly with
951 .Xr dbclean 8 ,
952 such as by running the cron job in the misc directory of the DCC source.
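.Pp
A crontab entry along these lines would run the cleaning job nightly.
The schedule, the script name, and its installed path are assumptions
made for illustration; the job shipped in the misc directory of the
DCC source is the authoritative version.
.Bd -literal -offset indent
# min hour day-of-month month day-of-week command
15 3 * * * @prefix@/libexec/cron-dccd
.Ed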
953 .Sh SEE ALSO
954 .Xr cdcc 8 ,
955 .Xr dbclean 8 ,
956 .Xr dcc 8 ,
957 .Xr dccd 8 ,
958 .Xr dccifd 8 ,
959 .Xr dccm 8 ,
960 .Xr dccproc 8 ,
961 .Xr dblist 8 ,
962 .Xr dccsight 8 ,
963 .Xr sendmail 8 .
964 .Sh HISTORY
965 Distributed Checksum Clearinghouses are based on an idea of Paul Vixie
966 with code designed and written at Rhyolite Software starting in 2000.
967 This document describes version 1.3.103.