view dcc.0 @ 4:d329bb5c36d0
Changes making it compile the new upstream release
author | Peter Gervai <grin@grin.hu> |
---|---|
date | Tue, 10 Mar 2009 14:57:12 +0100 |
parents | c7f6b056b673 |
children |
DCC(8) Distributed Checksum Clearinghouse DCC(8)

NAME
DCC -- Distributed Checksum Clearinghouse

DESCRIPTION
The Distributed Checksum Clearinghouse or DCC is a cooperative, distributed system intended to detect "bulk" mail or mail sent to many people. It allows individuals receiving a single mail message to determine that many other people have received essentially identical copies of the message and so reject or discard the message.

Source for the server, client, and utilities is available at Rhyolite Software, LLC, http://www.rhyolite.com/dcc/ It is free for organizations that do not sell spam or virus filtering services.

How the DCC Is Used
The DCC can be viewed as a tool for end users to enforce their right to "opt-in" to streams of bulk mail by refusing bulk mail except from sources in a "whitelist." Whitelists are the responsibility of DCC clients, since only they know which bulk mail they solicited.

False positives or mail marked as bulk by a DCC server that is not bulk occur only when a recipient of a message reports it to a DCC server as having been received many times or when the "fuzzy" checksums of differing messages are the same. The fuzzy checksums ignore aspects of messages in order to compute identical checksums for substantially identical messages. The fuzzy checksums are designed to ignore only differences that do not affect meanings. So in practice, you do not need to worry about DCC false positive indications of "bulk," but not all bulk mail is unsolicited bulk mail or spam. You must either use whitelists to distinguish solicited from unsolicited bulk mail or only use DCC indications of "bulk" as part of a scoring system such as SpamAssassin. Besides unsolicited bulk email or spam, bulk messages include legitimate mail such as order confirmations from merchants, legitimate mailing lists, and empty or test messages.

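The idea behind a fuzzy checksum can be illustrated with a toy sketch. This is not the DCC's Fuz1 or Fuz2 algorithm, which this page does not specify. The normalization rules (lowercasing and keeping only ASCII letters), the use of SHA-256, and the name toy_fuzzy_checksum are all assumptions chosen only to show how differing copies of a message can map to one checksum.

```python
import hashlib
import re


def toy_fuzzy_checksum(body: str) -> str:
    """Return a hex digest that ignores case, digits, punctuation and spacing.

    Hypothetical illustration only; the real Fuz1/Fuz2 filters differ.
    """
    # Keep only ASCII letters so per-recipient numbers and layout changes vanish.
    letters = re.sub(r"[^a-z]", "", body.lower())
    return hashlib.sha256(letters.encode("ascii")).hexdigest()


# Two copies of one mailing that differ in case, spacing and a customer number.
copy_a = "Dear customer 1234,\n  BUY NOW!!!"
copy_b = "dear customer 9876, buy now"
# An unrelated message.
other = "Minutes of the working group meeting"
```

Under these assumptions copy_a and copy_b share a checksum while other does not. A real filter has to be far more careful, since dropping digits can change the meaning of a message.
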
A DCC server estimates the number of copies of a message by counting checksums reported by DCC clients. Each client must decide which bulk messages are unsolicited and what degree of "bulkiness" is objectionable. Client DCC software marks, rejects, or discards mail that is bulk according to local thresholds on target addresses from DCC servers and unsolicited according to local whitelists.

DCC servers are usually configured to receive reports from as many targets as possible, including sources that cannot be trusted to not exaggerate the number of copies of a message they see. A user of a DCC client angry about receiving a message could report it with 1,000,000 separate DCC reports or with a single report claiming 1,000,000 targets. An unprincipled user could subscribe a "spam trap" to mailing lists such as those of the IETF or CERT. Such abuses of the system are not problems, because much legitimate mail is "bulk." You cannot reject bulk mail unless you have a whitelist of sources of legitimate bulk mail.

DCC can also be used by an Internet service provider to detect bulk mail coming from its own customers. In such circumstances, the DCC client might be configured to only log bulk mail from unexpected (not whitelisted) customers.

What the DCC Is
A DCC server accumulates counts of cryptographic checksums of messages but not the messages themselves. It exchanges reports of frequently seen checksums with other servers. DCC clients send reports of checksums related to incoming mail to a nearby DCC server running dccd(8). Each report from a client includes the number of recipients for the message. A DCC server accumulates the reports and responds to clients with the current total number of recipients for each checksum. The client adds an SMTP header to incoming mail containing the total counts. It then discards or rejects mail that is not whitelisted and has counts that exceed local thresholds.

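The report-and-threshold cycle described above can be sketched as follows. This is a minimal in-memory illustration, not the DCC wire protocol or the dccd(8) database. The class ToyServer, the functions is_bulk and client_verdict, and the threshold values are hypothetical names and numbers introduced only for this example.

```python
from collections import Counter


class ToyServer:
    """Hypothetical in-memory stand-in for a DCC server's checksum counts."""

    def __init__(self) -> None:
        self.totals = Counter()

    def report(self, checksums: dict, recipients: int) -> dict:
        """Add `recipients` to each reported checksum and return the new totals."""
        answer = {}
        for ck_type, value in checksums.items():
            self.totals[(ck_type, value)] += recipients
            answer[ck_type] = self.totals[(ck_type, value)]
        return answer


def is_bulk(totals: dict, thresholds: dict) -> bool:
    """True if any total exceeds the local threshold for its checksum type."""
    return any(totals.get(t, 0) > limit for t, limit in thresholds.items())


def client_verdict(totals: dict, thresholds: dict, whitelisted: bool) -> str:
    """Decide what a client does with one message (assumed policy: reject)."""
    # Whitelisted mail is accepted no matter how "bulky" it is.
    if whitelisted:
        return "accept"
    return "reject" if is_bulk(totals, thresholds) else "accept"


server = ToyServer()
first = server.report({"Body": "abc"}, 3)    # one client saw 3 recipients
second = server.report({"Body": "abc"}, 40)  # another client saw 40 more
```

With a local threshold of 20 for the Body checksum, the first report leaves the message below the threshold and the second pushes the total to 43, so an unwhitelisted copy would then be rejected while a whitelisted copy is still accepted.
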
A special value of the number of addressees is "MANY" and means it is certain that this message was bulk and might be unsolicited, perhaps because it came from a locally blacklisted source or was addressed to an invalid address or "spam trap." The special value "MANY" is merely the largest value that fits in the fixed sized field containing the count of addressees. That "infinity" accumulated total can be reached with millions of independent reports as well as with one or two.

DCC servers flood or send reports of checksums of bulk mail to neighboring servers.

To keep a server's database of checksums from growing without bound, checksums are forgotten when they become old. Checksums of bulk mail are kept longer. See dbclean(8).

DCC clients pick the nearest working DCC server using a small shared or memory mapped file, /var/dcc/map. It contains server names, port numbers, passwords, recent performance measures, and so forth. This file allows clients to use quick retransmission timeouts and to waste little time on servers that have temporarily stopped working or become unreachable. The utility program cdcc(8) is used to maintain this file as well as to check the health of servers.

X-DCC Headers
The DCC software includes several programs used by clients.

Dccm(8) uses the sendmail "milter" interface to query a DCC server, add header lines to incoming mail, and reject mail whose total checksum counts are high. Dccm is intended to be run with SMTP servers using sendmail.

Dccproc(8) adds header lines to mail presented by file name or stdin, but relies on other programs such as procmail to deal with mail with large counts. Dccsight(8) is similar but deals with previously computed checksums.

Dccifd(8) is similar to dccproc but is not run separately for each mail message and so is far more efficient.
It receives mail messages via a socket somewhat like dccm, but with a simpler protocol that can be used by Perl scripts or other programs.

DCC SMTP header lines are of one of the forms:

    X-DCC-brand-Metrics: client server-ID; bulk cknm1=count cknm2=count ...
    X-DCC-brand-Metrics: client; whitelist

where

whitelist
    appears if the global or per-user whiteclnt file marks the message as good.
brand
    is the "brand name" of the DCC server, such as "RHYOLITE".
client
    is the name or IP address of the DCC client that added the header line to the SMTP message.
server-ID
    is the numeric ID of the DCC server that the DCC client contacted.
bulk
    is present if one or more checksum counts exceeded the DCC client's thresholds to make the message "bulky."
bulk rep
    is present if the DCC reputation of the IP address of the sender is bad.
cknm1,cknm2,...
    are types of checksums:
        IP          address of SMTP client
        env_From    SMTP envelope value
        From        SMTP header line
        Message-ID  SMTP header line
        Received    last Received: header line in the SMTP message
        substitute  SMTP header line chosen by the DCC client, prefixed with the name of the header
        Body        SMTP body ignoring white-space
        Fuz1        filtered or "fuzzy" body checksum
        Fuz2        another filtered or "fuzzy" body checksum
        rep         DCC reputation of the mail sender or the estimated probability that the message is bulk.
    Counts for IP, env_From, From, Message-Id, Received, and substitute checksums are omitted by the DCC client if the server says it has no information. Counts for Fuz1 and Fuz2 are omitted if the message body is empty or contains too little of the right kind of information for the checksum to be computed.
count
    is the total number of recipients of messages with that checksum reported directly or indirectly to the DCC server.
    The special count "MANY" means that DCC clients have claimed that the message is directed at millions of recipients. "MANY" implies the message is definitely bulk, but not necessarily unsolicited. The special counts "OK" and "OK2" mean the checksum has been marked "good" or "half-good" by DCC servers.

Mailing lists
Legitimate mailing list traffic differs from spam only in being solicited by recipients. Each client should have a private whitelist.

DCC whitelists can also mark mail as unsolicited bulk using blacklist entries for commonly forged values such as "From: user@public.com".

White and Blacklists
DCC server and client whitelist files share a common format. Server files are always named whitelist and one is required to be in the DCC home directory with the other server files. Client whitelist files are named whiteclnt in the DCC home directory or a subdirectory specified with the -U option for dccm(8). They specify mail that should not be reported to a DCC server or that is always unsolicited and almost certainly bulk.

A DCC whitelist file contains blank lines, comments starting with "#", and lines of the following forms:

include file
    Copies the contents of file into the whitelist. It can occur only in the main whitelist or whiteclnt file and not in an included file. The file name should be absolute or relative to the DCC home directory.
count value
    lines specify checksums that should be white- or blacklisted.
        count env_From 821-path
        count env_To dest-mailbox
        count From 822-mailbox
        count Message-ID <string>
        count Received string
        count Substitute header string
        count Hex ctype cksum
        count ip IP-address

    MANY value
        indicates that millions of targets have received messages with the header, IP address, or checksum value.
    OK value
    OK2 value
        say that messages with the header, IP address, or checksum value are OK and should not be reported to DCC servers or be greylisted. OK2 says that the message is "half OK." Two OK2 checksums associated with a message are equivalent to one OK.

    A DCC server never shares or floods reports containing checksums marked in its whitelist with OK or OK2 to other servers. A DCC client does not report or ask its server about messages with a checksum marked OK or OK2 in the client whitelist. This is intended to allow a DCC client to keep private mail so private that even its checksums are not disclosed.
MX IP-address-or-hostname
MXDCC IP-address-or-hostname
    mark an address or block of addresses of trusted mail relays including MX servers, smart hosts, and bastion or DMZ relays. The DCC clients dccm(8), dccifd(8), and dccproc(8) parse and skip initial Received: headers added by listed MX servers to determine the external sources of mail messages. Unsolicited bulk mail that has been forwarded through listed addresses is discarded by dccm(8) and dccifd(8) as if with -a DISCARD instead of rejected. MXDCC marks addresses that are MX servers that run DCC clients. The checksums for a mail message that has been forwarded through an address listed as MXDCC are queried instead of reported.
SUBMIT IP-address-or-hostname
    marks an IP address or block of addresses of SMTP submission clients such as web browsers that cannot tolerate 4yz temporary rejections but that cannot be trusted to not send spam. Since they are local addresses, DCC Reputations are not computed for them.

value in count value lines can be

    dest-mailbox
        is an RFC 821 address or a local user name.
    821-path
        is an RFC 821 address.
    822-mailbox
        is an RFC 822 address with optional name.
    Substitute header
        is the name of an SMTP header such as "Sender" or the name of one of two SMTP envelope values, "HELO," or "Mail_Host" for the resolved host name from the 821-path in the message.
    Hex ctype cksum
        starts with the string Hex followed by a checksum type, and a string of four hexadecimal numbers obtained from a DCC log file or the dccproc(8) command using -CQ. The checksum type is body, Fuz1, or Fuz2 or one of the preceding checksum types such as env_From.
    IP-address
        is a host name, IPv4 or IPv6 address, or a block of IP addresses in the standard xxx/mm form with mm limited for server whitelists to 16 for IPv4 or 112 for IPv6. There can be at most 64 CIDR blocks in a client whiteclnt file. A host name is converted to IP addresses with DNS, /etc/hosts or other mechanisms and one checksum for each address added to the whitelist.

option setting
    can only be in a DCC client whiteclnt file used by dccifd(8), dccm(8) or dccproc(8). Settings in per-user whiteclnt files override settings in the global file. Setting can be any of the following:

    option log-all
        to log all mail messages.
    option log-normal
        to log only messages that meet the logging thresholds.
    option log-subdirectory-day
    option log-subdirectory-hour
    option log-subdirectory-minute
        creates log files containing mail messages in subdirectories of the form JJJ, JJJ/HH, or JJJ/HH/MM where JJJ is the current julian day, HH is the current hour, and MM is the current minute. See also the -l logdir option for dccm(8), dccifd(8), and dccproc(8).
    option dcc-on
    option dcc-off
        Control DCC filtering. See the discussion of -W for dccm(8) and dccifd(8).
    option greylist-on
    option greylist-off
        to control greylisting. Greylisting for other recipients in the same SMTP transaction can still cause greylist temporary rejections. greylist-off in the main whiteclnt file.
    option greylist-log-on
    option greylist-log-off
        to control logging of greylisted mail messages.
    option DCC-rep-off
    option DCC-rep-on
        to honor or ignore DCC Reputations computed by the DCC server.
    option DNSBL1-off
    option DNSBL1-on
    option DNSBL2-off
    option DNSBL2-on
    option DNSBL3-off
    option DNSBL3-on
        honor or ignore results of DNS blacklist checks configured with -B for dccm(8), dccifd(8), and dccproc(8).
    option MTA-first
    option MTA-last
        consider MTA determinations of spam or not-spam first so they can be overridden by whiteclnt files, or last so that they can override whiteclnt files.
    option forced-discard-ok
    option no-forced-discard
        control whether dccm(8) and dccifd(8) are allowed to discard a message for one mailbox for which it is spam when it is not spam and must be delivered to another mailbox.
        This can happen if a mail message is addressed to two or more mailboxes with differing whitelists. Discarding can be undesirable because false positives are not communicated to mail senders. To avoid discarding, dccm(8) and dccifd(8) running in proxy mode temporarily reject SMTP envelope Rcpt To values that involve differing whiteclnt files.
    option threshold type,rej-thold
        has the same effects as -c type,rej-thold for dccproc(8) or -t type,rej-thold for dccm(8) and dccifd(8). It is useful only in per-user whiteclnt files to override the global DCC checksum thresholds.
    option spam-trap-accept
    option spam-trap-reject
        say that mail should be reported to the DCC server as extremely bulk or with target counts of MANY. Greylisting, DNS blacklist (DNSBL), and other checks are turned off. Spam-trap-accept tells the MTA to accept the message while spam-trap-reject tells the MTA to reject the message. Use Spam-trap-accept for spam traps that should not be disclosed. Spam-trap-reject can be used on catch-all mailboxes that might receive legitimate mail by typographical errors and that senders should be told about.

    In the absence of explicit settings, the default in the main whiteclnt file is equivalent to

        option log-normal
        option dcc-on
        option greylist-on
        option greylist-log-on
        option DCC-rep-off
        option DNSBL1-off
        option DNSBL2-off
        option DNSBL3-off
        option MTA-last
        option no-forced-discard

    The defaults for individual recipient whiteclnt files are the same except as changed by explicit settings in the main file.

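The whitelist line forms described above can be sketched with a small reader. This is a simplified illustration and not the parser used by the DCC programs. The function names parse_whitelist and is_ok, the returned dictionary layout, the case-insensitive keyword matching, and the sample file contents (including the included file name) are assumptions made for this example. Only whole-line "#" comments are handled.

```python
RELAY_KEYWORDS = {"mx", "mxdcc", "submit"}


def parse_whitelist(text: str) -> dict:
    """Sort whitelist lines into include, option, relay and count entries."""
    parsed = {"include": [], "option": [], "relay": [], "count": []}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        parts = line.split(None, 1)
        keyword = parts[0]
        rest = parts[1] if len(parts) > 1 else ""
        key = keyword.lower()  # assumption: keywords match case-insensitively
        if key in ("include", "option"):
            parsed[key].append(rest)
        elif key in RELAY_KEYWORDS:
            parsed["relay"].append((keyword.upper(), rest))
        else:
            # "count type value" line, e.g. "OK env_From someone@example.org"
            fields = rest.split(None, 1)
            ck_type = fields[0] if fields else ""
            value = fields[1] if len(fields) > 1 else ""
            parsed["count"].append((keyword.upper(), ck_type, value))
    return parsed


def is_ok(counts) -> bool:
    """Apply the rule that two OK2 marks are equivalent to one OK."""
    points = sum({"OK": 2, "OK2": 1}.get(count, 0) for count in counts)
    return points >= 2


sample = """\
# hypothetical global whiteclnt
include whitecommon
option log-all
ok      env_From  list-owner@example.org
OK2     From      news@example.com
MANY    ip        192.0.2.7
MX      198.51.100.0/24
"""
result = parse_whitelist(sample)
```

Here is_ok(["OK2"]) is false while is_ok(["OK2", "OK2"]) and is_ok(["OK"]) are true, matching the "half OK" description. Resolving host names, expanding include files, and computing checksums are deliberately left out.
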
Checksums of the IP address of the SMTP client sending a mail message are practically unforgeable, because it is impractical for an SMTP client to "spoof" its address or pretend to use some other IP address. That would make the IP address of the sender useful for whitelisting, except that the IP address of the SMTP client is often not available to users of dccproc(8). In addition, legitimate mail relays make whitelist entries for IP addresses of little use. For example, the IP address from which a message arrived might be that of a local relay instead of the home address of a whitelisted mailing list.

Envelope and header From values can be forged, so whitelist entries for their checksums are not entirely reliable.

Checksums of env_To values are never sent to DCC servers. They are valid only in whiteclnt files and used only by dccm(8), dccifd(8), and dccproc(8) when the envelope Rcpt To value is known.

Greylists
The DCC server, dccd(8), can be used to maintain a greylist database for some DCC clients including dccm(8) and dccifd(8). Greylisting involves temporarily refusing mail from unfamiliar SMTP clients and is unrelated to filtering with a Distributed Checksum Clearinghouse. See http://projects.puremagic.com/greylisting/

Privacy
Because sending mail is a less private act than receiving it, and because sending bulk mail is usually not private at all and cannot be very private, the DCC tries first to protect the privacy of mail recipients, and second the privacy of senders of mail that is not bulk.

DCC clients necessarily disclose some information about mail they have received. The DCC database contains checksums of mail bodies, header lines, and source addresses. While it contains significantly less information than is available by "snooping" on Internet links, it is important that the DCC database be treated as containing sensitive information and to not put the most private information in the DCC database.

Given the contents of a message, one might determine whether that message has been received by a system that subscribes to the DCC. Guesses about the sender and addressee of a message can also be validated if the checksums of the message have been sent to a DCC server.

Because the DCC is distributed, organizations can operate their own DCC servers, and configure them to share or "flood" only the checksums of bulk mail that is not in local whitelists.

DCC clients should not report the checksums of messages known to be private to a DCC server. For example, checksums of messages local to a system or that are otherwise known a priori to not be unsolicited bulk should not be sent to a remote DCC server. This can be accomplished by adding entries for the sender to the client's local whitelist file. Client whitelist files can also include entries for email recipients whose mail should not be reported to a DCC server.

Security
Whenever considering security, one must first consider the risks. The worst DCC security problems are unauthorized commands to a DCC service, denial of the DCC service, and corruption of DCC data. The worst that can be done with remote commands to a DCC server is to turn it off or otherwise cause it to stop responding. The DCC is designed to fail gracefully, so that a denial of service attack would at worst allow delivery of mail that would otherwise be rejected. Corruption of DCC data might at worst cause mail that is already somewhat "bulk" by virtue of being received by two or more people to appear to have higher recipient numbers. Since DCC users must whitelist all sources of legitimate bulk mail, this is also not a concern. Such security risks should be addressed, but only with defenses that don't cost more than the possible damage from an attack.

The DCC must contend with senders of unsolicited bulk mail who resort to unlawful actions to express their displeasure at having their advertising blocked.
Because the DCC protocol is based on UDP, an unhappy advertiser could try to flood a DCC server with packets supposedly from subscribers or non-subscribers. DCC servers defend against that attack by rate-limiting requests from anonymous users.

Also because of the use of UDP, clients must be protected against forged answers to their queries. Otherwise an unsolicited bulk mail advertiser could send a stream of "not spam" answers to an SMTP client while simultaneously sending mail that would otherwise be rejected. This is not a problem for authenticated clients of the DCC because they share a secret with the DCC. Unauthenticated, anonymous DCC clients do not share any secrets with the DCC, except for unique and unpredictable bits in each query or report sent to the DCC. Therefore, DCC servers cryptographically sign answers to unauthenticated clients with bits from the corresponding queries. This protects against attackers that do not have access to the stream of packets from the DCC client.

The passwords or shared secrets used in the DCC client and server programs are "cleartext" for several reasons. In any shared secret authentication system, at least one party must know the secret or keep the secret in cleartext. You could encrypt the secrets in a file, but because they are used by programs, you would need a cleartext copy of the key to decrypt the file somewhere in the system, making such a scheme more expensive but no more secure than a file of cleartext passwords. Asymmetric systems such as that used in UNIX allow one party to not know the secrets, but they must be and are designed to be computationally expensive when used in applications like the DCC that involve thousands or more authentication checks per second. Moreover, because of "dictionary attacks," asymmetric systems are now little more secure than keeping passwords in cleartext.
An adversary can compare the hash values of combinations of common words with /etc/passwd hash values to look for bad passwords. Worse, by the nature of a client/server protocol like that used in the DCC, clients must have the cleartext password. Since it is among the more numerous and much less secure clients that adversaries would seek files of DCC passwords, it would be a waste to complicate the DCC server with an asymmetric system.

The DCC protocol is vulnerable to dictionary attacks to recover passwords. An adversary could capture some DCC packets, and then check to see if any of the 100,000 to 1,000,000 passwords in so-called "cracker dictionaries" applied to a packet generated the same signature. This is a concern only if DCC passwords are poorly chosen, such as any combination of words in an English dictionary. There are ways to prevent this vulnerability regardless of how badly passwords are chosen, but they are computationally expensive and require additional network round trips. Since DCC passwords are created and typed into files once and do not need to be remembered by people, it is cheaper and quite easy to simply choose good passwords that are not in dictionaries.

Reliability
It is better to fail to filter unsolicited bulk mail than to fail to deliver legitimate mail, so DCC clients fail in the direction of assuming that mail is legitimate or even whitelisted.

A DCC client sends a report or other request and waits for an answer. If no answer arrives within a reasonable time, the client retransmits. There are many things that might result in the client not receiving an answer, but the most important is packet loss. If the client's request does not reach the server, it is easy and harmless for the client to retransmit.
If the client's request reached the server but the server's response was lost, a retransmission to the same server would be misunderstood as a new report of another copy of the same message unless it is detected as a retransmission by the server. The DCC protocol includes transaction identifiers for this purpose. If the client retransmitted to a second server, the retransmission would be misunderstood by the second server as a new report of the same message.

Each request from a client includes a timestamp to aid the client in measuring the round trip time to the server and to let the client pick the closest server. Clients monitor the speed of all of the servers they know including those they are not currently using, and use the quickest.

Client and Server-IDs
Servers and clients use numbers or IDs to identify themselves. ID 1 is reserved for anonymous, unauthenticated clients. All other IDs are associated with a pair of passwords in the ids file, the current and next or previous and current passwords. Clients include their client IDs in their messages. When they are not using the anonymous ID, they sign their messages to servers with the first password associated with their client-ID. Servers treat messages with signatures that match neither of the passwords for the client-ID in their own ids file as if the client had used the anonymous ID.

Each server has a unique server-ID less than 32768. Servers use their IDs to identify checksums that they flood to other servers. Each server expects local clients sending administrative commands to use the server's ID and sign administrative commands with the associated password.

Server-IDs must be unique among all systems that share reports by "flooding." All servers must be told of the IDs of all other servers whose reports can be received in the local /var/dcc/flod file described in dccd(8).
However, server-IDs can be mapped during flooding between independent DCC organizations.

Passwd-IDs are server-IDs that should not be assigned to servers. They appear in the often publicly readable /var/dcc/flod and specify passwords in the private /var/dcc/ids file for the inter-server flooding protocol.

The client identified by a client-ID might be a single computer with a single IP address, a single but multi-homed computer, or many computers. Client-IDs are not used to identify checksum reports, but the organization operating the client. A client-ID need only be unique among clients using a single server. A single client can use different client-IDs for different servers, each client-ID authenticated with a separate password.

An obscure but important part of all of this is that the inter-server flooding algorithm depends on server-IDs and timestamps attached to reports of checksums. The inter-server flooding mechanism requires cooperating DCC servers to maintain reasonable clocks ticking in UTC. Clients include timestamps in their requests, but as long as their timestamps are unlikely to be repeated, they need not be very accurate.

Installation Considerations
DCC clients on a computer share information about which servers are currently working and their speeds in a shared memory segment. This segment also contains server host names, IP addresses, and the passwords needed to authenticate known clients to servers. That generally requires that dccm(8), dccproc(8), dccifd(8), and cdcc(8) execute with a UID that can write to the DCC home directory and its files. The sendmail interface, dccm, is a daemon that can be started by an "rc" or other script already running with the correct UID. The other two, dccproc and cdcc need to be set-UID because they are used by end users. They relinquish set-UID privileges when not needed.
Files that contain cleartext passwords including the shared file used by clients must be readable only by "owner."

The data files required by a DCC can be in a single "home" directory, /var/dcc. Distinct DCC servers can run on a single computer, provided they use distinct UDP port numbers and home directories. It is possible and convenient for the DCC clients using a server on the same computer to use the same home directory as the server.

The DCC source distribution includes sample control files. They should be modified appropriately and then copied to the DCC home directory. Files that contain cleartext passwords must not be publicly readable.

The DCC source includes "feature" m4 files to configure sendmail to use dccm(8) to check a DCC server about incoming mail.

See also the INSTALL.html file.

Client Installation
Installing a DCC client starts with obtaining or compiling program binaries for the client server data control tool, cdcc(8). Installing the sendmail DCC interface, dccm(8), or dccproc(8), the general or procmail(1) interface is the main part of the client installation. Connecting the DCC to sendmail with dccm is most powerful, but requires administrative control of the system running sendmail.

As noted above, cdcc and dccproc should be set-UID to a suitable UID. Root or 0 is thought to be safe for both, because they are careful to release privileges except when they need them to read or write files in the DCC home directory. A DCC home directory, /var/dcc, should be created. It must be owned and writable by the UID to which cdcc is set.

After the DCC client programs have been obtained, contact the operator(s) of the chosen DCC server(s) to obtain each server's hostname, port number, and a client-ID and corresponding password. No client-IDs or passwords are needed to use DCC servers that allow anonymous clients.
Use the load or add commands of cdcc to create a map file in the DCC home directory. It is usually necessary to create a client whitelist file of the format described above. To accommodate users sharing a computer but not ideas about what is solicited bulk mail, the client whitelist file can be any valid path name and need not be in the DCC home directory.

If dccm is chosen, arrange to start it with suitable arguments before sendmail is started. See the homedir/dcc_conf file and the misc/rcDCC script in the DCC source. The procmail DCCM interface, dccproc(8), can be run manually or by a procmailrc(5) rule.

Server Installation
The DCC server, dccd(8), also requires that the DCC home directory exist. It does not use the client shared or memory mapped file of server addresses, but it requires other files. One is the /var/dcc/ids file of client-IDs, server-IDs, and corresponding passwords. Another is a flod file of peers that send and receive floods of reports of checksums with large counts. Both files are described in dccd(8).

The server daemon should be started when the system is rebooted, probably before sendmail. See the misc/rcDCC and misc/start-dccd files in the DCC source.

The database should be cleaned regularly with dbclean(8) such as by running the crontab job that is in the misc directory.

SEE ALSO
cdcc(8), dbclean(8), dcc(8), dccd(8), dccifd(8), dccm(8), dccproc(8), dblist(8), dccsight(8), sendmail(8).

HISTORY
Distributed Checksum Clearinghouses are based on an idea of Paul Vixie with code designed and written at Rhyolite Software starting in 2000. This document describes version 1.3.103.

February 26, 2009