comparison dcc.0 @ 0:c7f6b056b673

First import of vendor version
author Peter Gervai <grin@grin.hu>
date Tue, 10 Mar 2009 13:49:58 +0100
1 DCC(8) Distributed Checksum Clearinghouse DCC(8)
2
3 NAME
4 DCC -- Distributed Checksum Clearinghouse
5
6 DESCRIPTION
7 The Distributed Checksum Clearinghouse or DCC is a cooperative,
8 distributed system intended to detect "bulk" mail or mail sent to many people.
9 It allows individuals receiving a single mail message to determine that
10 many other people have received essentially identical copies of the mes-
11 sage and so reject or discard the message.
12
13 Source for the server, client, and utilities is available at Rhyolite
14 Software, LLC, http://www.rhyolite.com/dcc/. It is free for organizations
15 that do not sell spam or virus filtering services.
16
17 How the DCC Is Used
18 The DCC can be viewed as a tool for end users to enforce their right to
19 "opt-in" to streams of bulk mail by refusing bulk mail except from
20 sources in a "whitelist." Whitelists are the responsibility of DCC
21 clients, since only they know which bulk mail they solicited.
22
23 False positives or mail marked as bulk by a DCC server that is not bulk
24 occur only when a recipient of a message reports it to a DCC server as
25 having been received many times or when the "fuzzy" checksums of differ-
26 ing messages are the same. The fuzzy checksums ignore aspects of mes-
27 sages in order to compute identical checksums for substantially identical
28 messages. The fuzzy checksums are designed to ignore only differences
29 that do not affect meanings. So in practice, you do not need to worry
30 about DCC false positive indications of "bulk," but not all bulk mail is
31 unsolicited bulk mail or spam. You must either use whitelists to distin-
32 guish solicited from unsolicited bulk mail or only use DCC indications of
33 "bulk" as part of a scoring system such as SpamAssassin. Besides unso-
34 licited bulk email or spam, bulk messages include legitimate mail such as
35 order confirmations from merchants, legitimate mailing lists, and empty
36 or test messages.
37
38 A DCC server estimates the number of copies of a message by counting
39 checksums reported by DCC clients. Each client must decide which bulk
40 messages are unsolicited and what degree of "bulkiness" is objectionable.
41 Client DCC software marks, rejects, or discards mail that is bulk accord-
42 ing to local thresholds on target addresses from DCC servers and unso-
43 licited according to local whitelists.
44
45 DCC servers are usually configured to receive reports from as many tar-
46 gets as possible, including sources that cannot be trusted to not exag-
47 gerate the number of copies of a message they see. A user of a DCC
48 client angry about receiving a message could report it with 1,000,000
49 separate DCC reports or with a single report claiming 1,000,000 targets.
50 An unprincipled user could subscribe a "spam trap" to mailing lists such
51 as those of the IETF or CERT. Such abuses of the system are not
52 problems, because much legitimate mail is "bulk." You cannot reject bulk
53 mail unless you have a whitelist of sources of legitimate bulk mail.
54
55 DCC can also be used by an Internet service provider to detect bulk mail
56 coming from its own customers. In such circumstances, the DCC client
57 might be configured to only log bulk mail from unexpected (not
58 whitelisted) customers.
59
60 What the DCC Is
61 A DCC server accumulates counts of cryptographic checksums of messages
62 but not the messages themselves. It exchanges reports of frequently seen
63 checksums with other servers. DCC clients send reports of checksums
64 related to incoming mail to a nearby DCC server running dccd(8). Each
65 report from a client includes the number of recipients for the message.
66 A DCC server accumulates the reports and responds to clients with the
67 current total number of recipients for each checksum. The client adds an
68 SMTP header to incoming mail containing the total counts. It then dis-
69 cards or rejects mail that is not whitelisted and has counts that exceed
70 local thresholds.
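
The client-side decision described above can be sketched in Python. This is only an illustration, not the DCC client's code: the function name, the shape of the threshold table, and the modelling of "MANY" as infinity are assumptions made for the example.

```python
import math

MANY = math.inf  # assumption: model the special count "MANY" as infinity


def should_reject(counts, thresholds, whitelisted):
    """Return True if mail is bulk by local thresholds and not whitelisted.

    counts:      checksum type -> total recipients reported by the server
    thresholds:  checksum type -> local threshold (hypothetical values)
    whitelisted: True if a local whitelist entry marks the message as good
    """
    if whitelisted:
        return False
    # A message is "bulky" if any count exceeds its local threshold.
    return any(
        counts.get(cktype, 0) > limit for cktype, limit in thresholds.items()
    )
```

For instance, `should_reject({"Body": 500}, {"Body": 100}, False)` is true, while the same counts with `whitelisted=True` are always accepted.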
71
72 A special value of the number of addressees is "MANY" and means it is
73 certain that this message was bulk and might be unsolicited, perhaps
74 because it came from a locally blacklisted source or was addressed to an
75 invalid address or "spam trap." The special value "MANY" is merely the
76 largest value that fits in the fixed sized field containing the count of
77 addressees. That "infinity" accumulated total can be reached with mil-
78 lions of independent reports as well as with one or two.
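
Because "MANY" is simply the ceiling of a fixed-size field, accumulating reports behaves like saturating addition. The sketch below assumes a 24-bit field purely for illustration; the actual field width is not stated in this page, and the function names are hypothetical.

```python
MANY = 0xFFFFFF  # hypothetical: largest value of an assumed 24-bit count field


def add_report(total, targets):
    """Add one report's target count to a running total, saturating at MANY."""
    return min(MANY, total + min(MANY, targets))


def show(total):
    """Render a total as text, using "MANY" once the ceiling is reached."""
    return "MANY" if total >= MANY else str(total)
```

A single report claiming MANY targets and a long series of smaller reports both end at the same saturated total.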
79
80 DCC servers flood or send reports of checksums of bulk mail to
81 neighboring servers.
82
83 To keep a server's database of checksums from growing without bound,
84 checksums are forgotten when they become old. Checksums of bulk mail are
85 kept longer. See dbclean(8).
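
The expiry rule above can be illustrated as follows. The retention periods and the count at which a checksum is treated as bulk are invented for the example; dbclean(8) defines the real behavior.

```python
DAY = 86400
NORMAL_TTL = 1 * DAY   # hypothetical retention for ordinary checksums
BULK_TTL = 30 * DAY    # hypothetical, longer retention for bulk checksums
BULK_THRESHOLD = 100   # hypothetical count at which a checksum is "bulk"


def expire(db, now):
    """Return a copy of db without records older than their retention period.

    db maps a checksum to a (total, last_seen) tuple, where last_seen is the
    time of the most recent report in seconds since the epoch.
    """
    kept = {}
    for cksum, (total, last_seen) in db.items():
        ttl = BULK_TTL if total >= BULK_THRESHOLD else NORMAL_TTL
        if now - last_seen <= ttl:
            kept[cksum] = (total, last_seen)
    return kept
```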
86
87 DCC clients pick the nearest working DCC server using a small shared or
88 memory mapped file, /var/dcc/map. It contains server names, port
89 numbers, passwords, recent performance measures, and so forth. This file
90 allows clients to use quick retransmission timeouts and to waste little
91 time on servers that have temporarily stopped working or become unreach-
92 able. The utility program cdcc(8) is used to maintain this file as well
93 as to check the health of servers.
94
95 X-DCC Headers
96 The DCC software includes several programs used by clients. Dccm(8) uses
97 the sendmail "milter" interface to query a DCC server, add header lines
98 to incoming mail, and reject mail whose total checksum counts are high.
99 Dccm is intended to be run with SMTP servers using sendmail.
100
101 Dccproc(8) adds header lines to mail presented by file name or stdin, but
102 relies on other programs such as procmail to deal with mail with large
103 counts. Dccsight(8) is similar but deals with previously computed check-
104 sums.
105
106 Dccifd(8) is similar to dccproc but is not run separately for each mail
107 message and so is far more efficient. It receives mail messages via a
108 socket somewhat like dccm, but with a simpler protocol that can be used
109 by Perl scripts or other programs.
110
111 DCC SMTP header lines are of one of the forms:
112
113 X-DCC-brand-Metrics: client server-ID; bulk cknm1=count cknm2=count ...
114 X-DCC-brand-Metrics: client; whitelist
115 where
116 whitelist appears if the global or per-user whiteclnt file marks the
117 message as good.
118 brand is the "brand name" of the DCC server, such as "RHYOLITE".
119 client is the name or IP address of the DCC client that added the
120 header line to the SMTP message.
121 server-ID is the numeric ID of the DCC server that the DCC client
122 contacted.
123 bulk is present if one or more checksum counts exceeded the DCC
124 client's thresholds to make the message "bulky."
125 bulk rep is present if the DCC reputation of the IP address of the
126 sender is bad.
127 cknm1,cknm2,... are types of checksums:
128 IP          address of SMTP client
129 env_From    SMTP envelope value
130 From        SMTP header line
131 Message-ID  SMTP header line
132 Received    last Received: header line in the SMTP message
133 substitute  SMTP header line chosen by the DCC client,
134             prefixed with the name of the header
135 Body        SMTP body ignoring white-space
136 Fuz1        filtered or "fuzzy" body checksum
137 Fuz2        another filtered or "fuzzy" body checksum
138 rep         DCC reputation of the mail sender or the
139             estimated probability that the message is bulk.
140 Counts for IP, env_From, From, Message-ID, Received, and
141 substitute checksums are omitted by the DCC client if the
142 server says it has no information. Counts for Fuz1 and Fuz2
143 are omitted if the message body is empty or contains too
144 little of the right kind of information for the checksum to be
145 computed.
146 count is the total number of recipients of messages with that
147 checksum reported directly or indirectly to the DCC server. The
148 special count "MANY" means that DCC clients have claimed that
149 the message is directed at millions of recipients. "MANY"
150 implies the message is definitely bulk, but not necessarily
151 unsolicited. The special counts "OK" and "OK2" mean the
152 checksum has been marked "good" or "half-good" by DCC servers.
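
A program that post-processes mail, such as a procmail helper, might pick these header lines apart along the following lines. This is a minimal sketch based only on the two forms shown above; the function name and the result layout are assumptions, and substitute checksums (which are prefixed with a header name) are not handled specially.

```python
import re

HEADER_RE = re.compile(r"^X-DCC-(?P<brand>[^:]*)-Metrics:\s*(?P<rest>.*)$")


def parse_metrics(line):
    """Split one X-DCC header line into brand, client, server ID, flags, counts."""
    m = HEADER_RE.match(line.strip())
    if m is None:
        raise ValueError("not an X-DCC-brand-Metrics header")
    # Text before ";" names the client and server; text after it holds
    # flags such as "bulk" or "whitelist" and the name=count pairs.
    who, _, what = m.group("rest").partition(";")
    who_parts = who.split()
    result = {
        "brand": m.group("brand"),
        "client": who_parts[0] if who_parts else None,
        "server_id": int(who_parts[1]) if len(who_parts) > 1 else None,
        "flags": [],
        "counts": {},
    }
    for token in what.split():
        name, eq, value = token.partition("=")
        if eq:
            # Counts are numbers or the special words MANY, OK, and OK2.
            result["counts"][name] = int(value) if value.isdigit() else value
        else:
            result["flags"].append(token)
    return result
```

Given the made-up line `X-DCC-RHYOLITE-Metrics: mail.example.com 101; bulk Body=3 Fuz1=MANY`, the result has `flags == ["bulk"]` and `counts == {"Body": 3, "Fuz1": "MANY"}`.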
153
154 Mailing lists
155 Legitimate mailing list traffic differs from spam only in being solicited
156 by recipients. Each client should have a private whitelist.
157
158 DCC whitelists can also mark mail as unsolicited bulk using blacklist
159 entries for commonly forged values such as "From: user@public.com".
160
161 White and Blacklists
162 DCC server and client whitelist files share a common format. Server
163 files are always named whitelist and one is required to be in the DCC
164 home directory with the other server files. Client whitelist files are
165 named whiteclnt in the DCC home directory or a subdirectory specified
166 with the -U option for dccm(8). They specify mail that should not be
167 reported to a DCC server or that is always unsolicited and almost
168 certainly bulk.
169
170 A DCC whitelist file contains blank lines, comments starting with "#",
171 and lines of the following forms:
172 include file
173 Copies the contents of file into the whitelist. It can occur
174 only in the main whitelist or whiteclnt file and not in an
175 included file. The file name should be absolute or relative to
176 the DCC home directory.
177
178 count value
179 lines specify checksums that should be white- or blacklisted.
180 count env_From 821-path
181 count env_To dest-mailbox
182 count From 822-mailbox
183 count Message-ID <string>
184 count Received string
185 count Substitute header string
186 count Hex ctype cksum
187 count ip IP-address
188
189 MANY value
190 indicates that millions of targets have received messages
191 with the header, IP address, or checksum value.
192 OK value
193 OK2 value
194 say that messages with the header, IP address, or
195 checksum value are OK and should not be reported to DCC servers
196 or be greylisted. OK2 says that the message is "half
197 OK." Two OK2 checksums associated with a message are
198 equivalent to one OK.
199 A DCC server never shares or floods reports containing
200 checksums marked in its whitelist with OK or OK2 to other
201 servers. A DCC client does not report or ask its server
202 about messages with a checksum marked OK or OK2 in the
203 client whitelist. This is intended to allow a DCC client
204 to keep private mail so private that even its checksums
205 are not disclosed.
206 MX IP-address-or-hostname
207 MXDCC IP-address-or-hostname
208 mark an address or block of addresses of trusted mail
209 relays including MX servers, smart hosts, and bastion or
210 DMZ relays. The DCC clients dccm(8), dccifd(8), and
211 dccproc(8) parse and skip initial Received: headers added
212 by listed MX servers to determine the external sources of
213 mail messages. Unsolicited bulk mail that has been
214 forwarded through listed addresses is discarded by dccm(8)
215 and dccifd(8) as if with -a DISCARD instead of rejected.
216 MXDCC marks addresses that are MX servers that run DCC
217 clients. The checksums for a mail message that has been
218 forwarded through an address listed as MXDCC are queried
219 instead of reported.
220 SUBMIT IP-address-or-hostname
221 marks an IP address or block of addresses of SMTP submission
222 clients such as web browsers that cannot tolerate 4yz
223 temporary rejections but that cannot be trusted to not
224 send spam. Since they are local addresses, DCC
225 Reputations are not computed for them.
226
227 value in count value lines can be
228 dest-mailbox
229 is an RFC 821 address or a local user name.
230 821-path
231 is an RFC 821 address.
232 822-mailbox
233 is an RFC 822 address with optional name.
234 Substitute header
235 is the name of an SMTP header such as "Sender" or the
236 name of one of two SMTP envelope values, "HELO," or
237 "Mail_Host" for the resolved host name from the 821-path
238 in the message.
239 Hex ctype cksum
240 starts with the string Hex followed by a checksum type, and
241 a string of four hexadecimal numbers obtained from a DCC
242 log file or the dccproc(8) command using -CQ. The
243 checksum type is body, Fuz1, or Fuz2 or one of the preceding
244 checksum types such as env_From.
245 IP-address
246 is a host name, IPv4 or IPv6 address, or a block of IP
247 addresses in the standard xxx/mm form with mm limited for
248 server whitelists to 16 for IPv4 or 112 for IPv6. There
249 can be at most 64 CIDR blocks in a client whiteclnt file.
250 A host name is converted to IP addresses with DNS,
251 /etc/hosts or other mechanisms and one checksum for each
252 address is added to the whitelist.
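
The rule that two OK2 marks equal one OK can be written as a small counting function. This is a sketch of that one rule only; the function name is hypothetical, and how OK and MANY entries interact when both match a message is not modelled.

```python
def is_whitelisted(marks):
    """Decide whether a message's whitelist marks add up to a full "OK".

    marks: list of whitelist values ("OK", "OK2", "MANY", ...) matched by
    the message's checksums. Each OK2 counts as half of an OK.
    """
    halves = 2 * marks.count("OK") + marks.count("OK2")
    return halves >= 2
```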
253
254 option setting
255 can only be in a DCC client whiteclnt file used by dccifd(8),
256 dccm(8) or dccproc(8). Settings in per-user whiteclnt files
257 override settings in the global file. Setting can be any of the
258 following:
259 option log-all
260 to log all mail messages.
261 option log-normal
262 to log only messages that meet the logging thresholds.
263 option log-subdirectory-day
264 option log-subdirectory-hour
265 option log-subdirectory-minute
266 creates log files containing mail messages in
267 subdirectories of the form JJJ, JJJ/HH, or JJJ/HH/MM where JJJ is the
268 current julian day, HH is the current hour, and MM is the
269 current minute. See also the -l logdir option for dccm(8),
270 dccifd(8), and dccproc(8).
271 option dcc-on
272 option dcc-off
273 Control DCC filtering. See the discussion of -W for
274 dccm(8) and dccifd(8).
275 option greylist-on
276 option greylist-off
277 to control greylisting. Greylisting for other recipients
278 in the same SMTP transaction can still cause greylist
279 temporary rejections. greylist-off in the main whiteclnt
280 file.
281 option greylist-log-on
282 option greylist-log-off
283 to control logging of greylisted mail messages.
284 option DCC-rep-off
285 option DCC-rep-on
286 to honor or ignore DCC Reputations computed by the DCC
287 server.
288 option DNSBL1-off
289 option DNSBL1-on
290 option DNSBL2-off
291 option DNSBL2-on
292 option DNSBL3-off
293 option DNSBL3-on
294 honor or ignore results of DNS blacklist checks configured
295 with -B for dccm(8), dccifd(8), and dccproc(8).
296 option MTA-first
297 option MTA-last
298 consider MTA determinations of spam or not-spam first so
299 they can be overridden by whiteclnt files, or last so that
300 they can override whiteclnt files.
301 option forced-discard-ok
302 option no-forced-discard
303 control whether dccm(8) and dccifd(8) are allowed to
304 discard a message for one mailbox for which it is spam when it
305 is not spam and must be delivered to another mailbox. This
306 can happen if a mail message is addressed to two or more
307 mailboxes with differing whitelists. Discarding can be
308 undesirable because false positives are not communicated to
309 mail senders. To avoid discarding, dccm(8) and dccifd(8)
310 running in proxy mode temporarily reject SMTP envelope Rcpt
311 To values that involve differing whiteclnt files.
312 option threshold type,rej-thold
313 has the same effects as -c type,rej-thold for dccproc(8) or
314 -t type,rej-thold for dccm(8) and dccifd(8). It is useful
315 only in per-user whiteclnt files to override the global DCC
316 checksum thresholds.
317 option spam-trap-accept
318 option spam-trap-reject
319 say that mail should be reported to the DCC server as
320 extremely bulk or with target counts of MANY. Greylisting,
321 DNS blacklist (DNSBL), and other checks are turned off.
322 Spam-trap-accept tells the MTA to accept the message while
323 spam-trap-reject tells the MTA to reject the message. Use
324 Spam-trap-accept for spam traps that should not be
325 disclosed. Spam-trap-reject can be used on catch-all
326 mailboxes that might receive legitimate mail by typographical
327 errors and that senders should be told about.
328
329 In the absence of explicit settings, the default in the main
330 whiteclnt file is equivalent to
331 option log-normal
332 option dcc-on
333 option greylist-on
334 option greylist-log-on
335 option DCC-rep-off
336 option DNSBL1-off
337 option DNSBL2-off
338 option DNSBL3-off
339 option MTA-last
340 option no-forced-discard
341 The defaults for individual recipient whiteclnt files are the
342 same except as changed by explicit settings in the main file.
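
Putting the line forms above together, a small client whiteclnt file might look like the fragment below. It is a hypothetical example: every address, host, and choice of option is made up, and the column spacing is only for readability.

```
# hypothetical whiteclnt fragment; all addresses are examples

# a solicited mailing list, identified by its envelope sender
OK      env_From    list-owner@lists.example.org

# two half-OK marks; a message matching both counts as one OK
OK2     From        newsletter@shop.example.com
OK2     ip          192.0.2.25

# a commonly forged sender treated as bulk
MANY    From        user@public.com

# a trusted local relay whose Received: header should be skipped
MX      192.0.2.1

# settings for this file
option  log-all
option  greylist-off
```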
343
344 Checksums of the IP address of the SMTP client sending a mail message are
345 practically unforgeable, because it is impractical for an SMTP client to
346 "spoof" its address or pretend to use some other IP address. That would
347 make the IP address of the sender useful for whitelisting, except that
348 the IP address of the SMTP client is often not available to users of
349 dccproc(8). In addition, legitimate mail relays make whitelist entries
350 for IP addresses of little use. For example, the IP address from which a
351 message arrived might be that of a local relay instead of the home
352 address of a whitelisted mailing list.
353
354 Envelope and header From values can be forged, so whitelist entries for
355 their checksums are not entirely reliable.
356
357 Checksums of env_To values are never sent to DCC servers. They are valid
358 in only whiteclnt files and used only by dccm(8), dccifd(8), and
359 dccproc(8) when the envelope Rcpt To value is known.
360
361 Greylists
362 The DCC server, dccd(8), can be used to maintain a greylist database for
363 some DCC clients including dccm(8) and dccifd(8). Greylisting involves
364 temporarily refusing mail from unfamiliar SMTP clients and is unrelated
365 to filtering with a Distributed Checksum Clearinghouse.
366 See http://projects.puremagic.com/greylisting/
367
368 Privacy
369 Because sending mail is a less private act than receiving it, and because
370 sending bulk mail is usually not private at all and cannot be very pri-
371 vate, the DCC tries first to protect the privacy of mail recipients, and
372 second the privacy of senders of mail that is not bulk.
373
374 DCC clients necessarily disclose some information about mail they have
375 received. The DCC database contains checksums of mail bodies, header
376 lines, and source addresses. While it contains significantly less infor-
377 mation than is available by "snooping" on Internet links, it is important
378 that the DCC database be treated as containing sensitive information and
379 to not put the most private information in the DCC database. Given the
380 contents of a message, one might determine whether that message has been
381 received by a system that subscribes to the DCC. Guesses about the
382 sender and addressee of a message can also be validated if the checksums
383 of the message have been sent to a DCC server.
384
385 Because the DCC is distributed, organizations can operate their own DCC
386 servers, and configure them to share or "flood" only the checksums of
387 bulk mail that is not in local whitelists.
388
389 DCC clients should not report the checksums of messages known to be pri-
390 vate to a DCC server. For example, checksums of messages local to a sys-
391 tem or that are otherwise known a priori to not be unsolicited bulk
392 should not be sent to a remote DCC server. This can be accomplished by
393 adding entries for the sender to the client's local whitelist file.
394 Client whitelist files can also include entries for email recipients
395 whose mail should not be reported to a DCC server.
396
397 Security
398 Whenever considering security, one must first consider the risks. The
399 worst DCC security problems are unauthorized commands to a DCC service,
400 denial of the DCC service, and corruption of DCC data. The worst that
401 can be done with remote commands to a DCC server is to turn it off or
402 otherwise cause it to stop responding. The DCC is designed to fail
403 gracefully, so that a denial of service attack would at worst allow
404 delivery of mail that would otherwise be rejected. Corruption of DCC
405 data might at worst cause mail that is already somewhat "bulk" by virtue
406 of being received by two or more people to appear to have higher recipient
407 numbers. Since DCC users must whitelist all sources of legitimate bulk
408 mail, this is also not a concern. Such security risks should be
409 addressed, but only with defenses that don't cost more than the possible
410 damage from an attack.
411
412 The DCC must contend with senders of unsolicited bulk mail who resort to
413 unlawful actions to express their displeasure at having their advertising
414 blocked. Because the DCC protocol is based on UDP, an unhappy advertiser
415 could try to flood a DCC server with packets supposedly from subscribers
416 or non-subscribers. DCC servers defend against that attack by rate-lim-
417 iting requests from anonymous users.
418
419 Also because of the use of UDP, clients must be protected against forged
420 answers to their queries. Otherwise an unsolicited bulk mail advertiser
421 could send a stream of "not spam" answers to an SMTP client while simul-
422 taneously sending mail that would otherwise be rejected. This is not a
423 problem for authenticated clients of the DCC because they share a secret
424 with the DCC. Unauthenticated, anonymous DCC clients do not share any
425 secrets with the DCC, except for unique and unpredictable bits in each
426 query or report sent to the DCC. Therefore, DCC servers cryptographi-
427 cally sign answers to unauthenticated clients with bits from the corre-
428 sponding queries. This protects against attackers that do not have
429 access to the stream of packets from the DCC client.
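
The idea of signing an answer with unpredictable bits from the query can be sketched as follows. HMAC-SHA256 and the 16-byte nonce are stand-ins chosen for the example; this page does not describe the signature algorithm the DCC protocol actually uses, and all function names are hypothetical.

```python
import hashlib
import hmac
import secrets


def make_query(payload):
    """Client side: attach unpredictable bits to an anonymous query."""
    nonce = secrets.token_bytes(16)
    return nonce, payload


def sign_answer(nonce, answer):
    """Server side: sign the answer with the bits taken from the query."""
    return hmac.new(nonce, answer, hashlib.sha256).digest()


def answer_is_genuine(nonce, answer, signature):
    """Client side: accept only answers signed with this query's bits."""
    expected = hmac.new(nonce, answer, hashlib.sha256).digest()
    return hmac.compare_digest(expected, signature)
```

An attacker who cannot see the client's packets does not know the nonce and so cannot produce a signature that the client accepts.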
430
431 The passwords or shared secrets used in the DCC client and server pro-
432 grams are "cleartext" for several reasons. In any shared secret authen-
433 tication system, at least one party must know the secret or keep the
434 secret in cleartext. You could encrypt the secrets in a file, but
435 because they are used by programs, you would need a cleartext copy of the
436 key to decrypt the file somewhere in the system, making such a scheme
437 more expensive but no more secure than a file of cleartext passwords.
438 Asymmetric systems such as that used in UNIX allow one party to not know
439 the secrets, but they must be and are designed to be computationally
440 expensive when used in applications like the DCC that involve thousands
441 or more authentication checks per second. Moreover, because of "dictio-
442 nary attacks," asymmetric systems are now little more secure than keeping
443 passwords in cleartext. An adversary can compare the hash values of com-
444 binations of common words with /etc/passwd hash values to look for bad
445 passwords. Worse, by the nature of a client/server protocol like that
446 used in the DCC, clients must have the cleartext password. Since it is
447 among the more numerous and much less secure clients that adversaries
448 would seek files of DCC passwords, it would be a waste to complicate the
449 DCC server with an asymmetric system.
450
451 The DCC protocol is vulnerable to dictionary attacks to recover pass-
452 words. An adversary could capture some DCC packets, and then check to
453 see if any of the 100,000 to 1,000,000 passwords in so called "cracker
454 dictionaries" applied to a packet generated the same signature. This is
455 a concern only if DCC passwords are poorly chosen, such as any combina-
456 tion of words in an English dictionary. There are ways to prevent this
457 vulnerability regardless of how badly passwords are chosen, but they are
458 computationally expensive and require additional network round trips.
459 Since DCC passwords are created and typed into files once and do not need
460 to be remembered by people, it is cheaper and quite easy to simply choose
461 good passwords that are not in dictionaries.
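
Since the passwords are typed into files once and never memorized, they can simply be generated at random. The helper below is a sketch using Python's standard `secrets` module; the function name is hypothetical, and any limits the DCC places on password length or characters are not covered here and should be checked before use.

```python
import secrets


def new_dcc_password(nbytes=16):
    """Return a random URL-safe string built from nbytes of randomness."""
    return secrets.token_urlsafe(nbytes)
```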
462
463 Reliability
464 It is better to fail to filter unsolicited bulk mail than to fail to
465 deliver legitimate mail, so DCC clients fail in the direction of assuming
466 that mail is legitimate or even whitelisted.
467
468 A DCC client sends a report or other request and waits for an answer. If
469 no answer arrives within a reasonable time, the client retransmits.
470 There are many things that might result in the client not receiving an
471 answer, but the most important is packet loss. If the client's request
472 does not reach the server, it is easy and harmless for the client to
473 retransmit. If the client's request reached the server but the server's
474 response was lost, a retransmission to the same server would be misunder-
475 stood as a new report of another copy of the same message unless it is
476 detected as a retransmission by the server. The DCC protocol includes
477 transaction identifiers for this purpose. If the client retransmitted
478 to a second server, the retransmission would be misunderstood by the sec-
479 ond server as a new report of the same message.
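
The role of transaction identifiers can be shown with a toy server that remembers which requests it has already answered. The class and its fields are hypothetical and ignore everything else a real server does.

```python
class ReportCounter:
    """Toy server that counts reports once per transaction identifier."""

    def __init__(self):
        self.totals = {}    # checksum -> accumulated recipient count
        self.answered = {}  # (client, transaction id) -> cached answer

    def report(self, client, xid, checksum, targets):
        key = (client, xid)
        if key in self.answered:
            # Retransmission: repeat the earlier answer without recounting.
            return self.answered[key]
        total = self.totals.get(checksum, 0) + targets
        self.totals[checksum] = total
        self.answered[key] = total
        return total
```

Repeating a request to the same `ReportCounter` leaves the total unchanged, while sending it to a second instance counts the message again, which matches the two cases described above.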
480
481 Each request from a client includes a timestamp to aid the client in mea-
482 suring the round trip time to the server and to let the client pick the
483 closest server. Clients monitor the speed of all of the servers they
484 know including those they are not currently using, and use the quickest.
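
Choosing the quickest server from measured round trip times is a simple minimum over the servers that are answering. In this sketch the table layout, with `None` marking a server that is not responding, is an assumption made for the example.

```python
def pick_server(rtts):
    """Return the name of the quickest working server, or None.

    rtts: server name -> measured round trip time in seconds, or None for
    a server that is not currently answering.
    """
    working = {name: rtt for name, rtt in rtts.items() if rtt is not None}
    if not working:
        return None
    return min(working, key=working.get)
```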
485
486 Client and Server-IDs
487 Servers and clients use numbers or IDs to identify themselves. ID 1 is
488 reserved for anonymous, unauthenticated clients. All other IDs are
489 associated with a pair of passwords in the ids file, the current and next or
490 previous and current passwords. Clients include their client IDs in
491 their messages. When they are not using the anonymous ID, they sign
492 their messages to servers with the first password associated with their
493 client-ID. Servers treat messages with signatures that match neither of
494 the passwords for the client-ID in their own ids file as if the client
495 had used the anonymous ID.
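
The fallback to the anonymous ID can be sketched like this. HMAC-SHA256 stands in for the real signature, which is not specified in this page, and the in-memory `ids` table is a hypothetical substitute for the ids file.

```python
import hashlib
import hmac

ANONYMOUS_ID = 1  # reserved for anonymous, unauthenticated clients


def sign(password, message):
    """Stand-in signature; the real DCC signature is not specified here."""
    return hmac.new(password, message, hashlib.sha256).digest()


def effective_client_id(ids, client_id, message, signature):
    """Return client_id if either stored password verifies, else anonymous.

    ids: client ID -> pair of passwords (bytes) known to the server.
    """
    for password in ids.get(client_id, ()):
        if hmac.compare_digest(sign(password, message), signature):
            return client_id
    return ANONYMOUS_ID
```

Accepting either password of the pair lets a password be changed on clients and servers at different times without locking anyone out.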
496
497 Each server has a unique server-ID less than 32768. Servers use their
498 IDs to identify checksums that they flood to other servers. Each server
499 expects local clients sending administrative commands to use the server's
500 ID and sign administrative commands with the associated password.
501
502 Server-IDs must be unique among all systems that share reports by
503 "flooding." All servers must be told of the IDs of all other servers whose
504 reports can be received in the local /var/dcc/flod file described in
505 dccd(8). However, server-IDs can be mapped during flooding between inde-
506 pendent DCC organizations.
507
508 Passwd-IDs are server-IDs that should not be assigned to servers. They
509 appear in the often publicly readable /var/dcc/flod and specify passwords
510 in the private /var/dcc/ids file for the inter-server flooding protocol.
511
512 The client identified by a client-ID might be a single computer with a
513 single IP address, a single but multi-homed computer, or many computers.
514 Client-IDs are not used to identify checksum reports, but the organiza-
515 tion operating the client. A client-ID need only be unique among clients
516 using a single server. A single client can use different client-IDs for
517 different servers, each client-ID authenticated with a separate password.
518
519 An obscure but important part of all of this is that the inter-server
520 flooding algorithm depends on server-IDs and timestamps attached to
521 reports of checksums. The inter-server flooding mechanism requires coop-
522 erating DCC servers to maintain reasonable clocks ticking in UTC.
523 Clients include timestamps in their requests, but as long as their time-
524 stamps are unlikely to be repeated, they need not be very accurate.
525
526 Installation Considerations
527 DCC clients on a computer share information about which servers are cur-
528 rently working and their speeds in a shared memory segment. This segment
529 also contains server host names, IP addresses, and the passwords needed
530 to authenticate known clients to servers. That generally requires that
531 dccm(8), dccproc(8), dccifd(8), and cdcc(8) execute with a UID that can
532 write to the DCC home directory and its files. The sendmail interface,
533 dccm, is a daemon that can be started by an "rc" or other script already
534 running with the correct UID. The other two, dccproc and cdcc need to be
535 set-UID because they are used by end users. They relinquish set-UID
536 privileges when not needed.
537
538 Files that contain cleartext passwords including the shared file used by
539 clients must be readable only by "owner."
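
On a POSIX system, a quick check that a password file grants nothing to group or others could look like the sketch below. The function name is hypothetical, and the paths to check depend on the installation.

```python
import os
import stat


def readable_only_by_owner(path):
    """Return True if group and others have no access bits on path."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
```

A file with mode 0600 passes this check; one with mode 0644 does not.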
540
541 The data files required by a DCC can be in a single "home" directory,
542 /var/dcc. Distinct DCC servers can run on a single computer, provided
543 they use distinct UDP port numbers and home directories. It is possible
544 and convenient for the DCC clients using a server on the same computer to
545 use the same home directory as the server.
546
547 The DCC source distribution includes sample control files. They should
548 be modified appropriately and then copied to the DCC home directory.
549 Files that contain cleartext passwords must not be publicly readable.
550
551 The DCC source includes "feature" m4 files to configure sendmail to use
552 dccm(8) to check a DCC server about incoming mail.
553
554 See also the INSTALL.html file.
555
556 Client Installation
557 Installing a DCC client starts with obtaining or compiling program bina-
558 ries for the client server data control tool, cdcc(8). Installing the
559 sendmail DCC interface, dccm(8), or dccproc(8), the general or
560 procmail(1) interface is the main part of the client installation. Con-
561 necting the DCC to sendmail with dccm is most powerful, but requires
562 administrative control of the system running sendmail.
563
564 As noted above, cdcc and dccproc should be set-UID to a suitable UID.
565 Root or 0 is thought to be safe for both, because they are careful to
566 release privileges except when they need them to read or write files in
567 the DCC home directory. A DCC home directory, /var/dcc, should be
568 created. It must be owned and writable by the UID to which cdcc is set.
569
570 After the DCC client programs have been obtained, contact the operator(s)
571 of the chosen DCC server(s) to obtain each server's hostname, port
572 number, and a client-ID and corresponding password. No client-IDs or
573 passwords are needed to use DCC servers that allow anonymous clients. Use the
574 load or add commands of cdcc to create a map file in the DCC home
575 directory. It is usually necessary to create a client whitelist file of the
576 format described above. To accommodate users sharing a computer but not
577 ideas about what is solicited bulk mail, the client whitelist file can be
578 any valid path name and need not be in the DCC home directory.
579
580 If dccm is chosen, arrange to start it with suitable arguments before
581 sendmail is started. See the homedir/dcc_conf file and the misc/rcDCC
582 script in the DCC source. The procmail DCC interface, dccproc(8), can
583 be run manually or by a procmailrc(5) rule.
584
585 Server Installation
586 The DCC server, dccd(8), also requires that the DCC home directory exist.
587 It does not use the client shared or memory mapped file of server
588 addresses, but it requires other files. One is the /var/dcc/ids file of
589 client-IDs, server-IDs, and corresponding passwords. Another is a flod
590 file of peers that send and receive floods of reports of checksums with
591 large counts. Both files are described in dccd(8).
592
593 The server daemon should be started when the system is rebooted, probably
594 before sendmail. See the misc/rcDCC and misc/start-dccd files in the DCC
595 source.
596
597 The database should be cleaned regularly with dbclean(8) such as by run-
598 ning the crontab job that is in the misc directory.
599
600 SEE ALSO
601 cdcc(8), dbclean(8), dcc(8), dccd(8), dccifd(8), dccm(8), dccproc(8),
602 dblist(8), dccsight(8), sendmail(8).
603
604 HISTORY
605 Distributed Checksum Clearinghouses are based on an idea of Paul Vixie
606 with code designed and written at Rhyolite Software starting in 2000.
607 This document describes version 1.3.103.
608
609 February 26, 2009