comparison dcc.0 @ 0:c7f6b056b673
First import of vendor version
author | Peter Gervai <grin@grin.hu> |
---|---|
date | Tue, 10 Mar 2009 13:49:58 +0100 |
-1:000000000000 | 0:c7f6b056b673 |
---|---|
1 DCC(8) Distributed Checksum Clearinghouse DCC(8) | |
2 | |
3 NAME | |
4 DCC -- Distributed Checksum Clearinghouse | |
5 | |
6 DESCRIPTION | |
7 The Distributed Checksum Clearinghouse or DCC is a cooperative, | |
8 distributed system intended to detect "bulk" mail or mail sent to many people. | |
9 It allows individuals receiving a single mail message to determine that | |
10 many other people have received essentially identical copies of the mes- | |
11 sage and so reject or discard the message. | |
12 | |
13 Source for the server, client, and utilities is available at Rhyolite | |
14 Software, LLC (http://www.rhyolite.com/dcc/). It is free for organizations | |
15 that do not sell spam or virus filtering services. | |
16 | |
17 How the DCC Is Used | |
18 The DCC can be viewed as a tool for end users to enforce their right to | |
19 "opt-in" to streams of bulk mail by refusing bulk mail except from | |
20 sources in a "whitelist." Whitelists are the responsibility of DCC | |
21 clients, since only they know which bulk mail they solicited. | |
22 | |
23 False positives, or mail marked as bulk by a DCC server that is not bulk, | |
24 occur only when a recipient of a message reports it to a DCC server as | |
25 having been received many times or when the "fuzzy" checksums of differ- | |
26 ing messages are the same. The fuzzy checksums ignore aspects of mes- | |
27 sages in order to compute identical checksums for substantially identical | |
28 messages. The fuzzy checksums are designed to ignore only differences | |
29 that do not affect meanings. So in practice, you do not need to worry | |
30 about DCC false positive indications of "bulk," but not all bulk mail is | |
31 unsolicited bulk mail or spam. You must either use whitelists to distin- | |
32 guish solicited from unsolicited bulk mail or only use DCC indications of | |
33 "bulk" as part of a scoring system such as SpamAssassin. Besides unso- | |
34 licited bulk email or spam, bulk messages include legitimate mail such as | |
35 order confirmations from merchants, legitimate mailing lists, and empty | |
36 or test messages. | |
37 | |
38 A DCC server estimates the number of copies of a message by counting | |
39 checksums reported by DCC clients. Each client must decide which bulk | |
40 messages are unsolicited and what degree of "bulkiness" is objectionable. | |
41 Client DCC software marks, rejects, or discards mail that is bulk accord- | |
42 ing to local thresholds on target addresses from DCC servers and unso- | |
43 licited according to local whitelists. | |
44 | |
45 DCC servers are usually configured to receive reports from as many tar- | |
46 gets as possible, including sources that cannot be trusted to not exag- | |
47 gerate the number of copies of a message they see. A user of a DCC | |
48 client angry about receiving a message could report it with 1,000,000 | |
49 separate DCC reports or with a single report claiming 1,000,000 targets. | |
50 An unprincipled user could subscribe a "spam trap" to mailing lists such | |
51 as those of the IETF or CERT. Such abuses of the system are not | |
52 problems, because much legitimate mail is "bulk." You cannot reject bulk | |
53 mail unless you have a whitelist of sources of legitimate bulk mail. | |
54 | |
55 DCC can also be used by an Internet service provider to detect bulk mail | |
56 coming from its own customers. In such circumstances, the DCC client | |
57 might be configured to only log bulk mail from unexpected (not | |
58 whitelisted) customers. | |
59 | |
60 What the DCC Is | |
61 A DCC server accumulates counts of cryptographic checksums of messages | |
62 but not the messages themselves. It exchanges reports of frequently seen | |
63 checksums with other servers. DCC clients send reports of checksums | |
64 related to incoming mail to a nearby DCC server running dccd(8). Each | |
65 report from a client includes the number of recipients for the message. | |
66 A DCC server accumulates the reports and responds to clients with the | |
67 current total number of recipients for each checksum. The client adds an | |
68 SMTP header to incoming mail containing the total counts. It then dis- | |
69 cards or rejects mail that is not whitelisted and has counts that exceed | |
70 local thresholds. | |
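The client-side decision described above can be sketched in a few lines of Python. This is only an illustration, not DCC's implementation: the function name, the dictionary layout, and the threshold values are assumptions made for the example.

```python
def classify(counts, thresholds, whitelisted):
    """Decide what to do with one message.

    counts      -- checksum type -> total recipient count from the server
    thresholds  -- checksum type -> local limit (hypothetical values)
    whitelisted -- True if a local whitelist entry matched the message
    """
    if whitelisted:
        # Solicited bulk mail is accepted no matter how large its counts.
        return "accept"
    for ctype, limit in thresholds.items():
        # A checksum the server knows nothing about counts as zero.
        if counts.get(ctype, 0) > limit:
            return "reject"
    return "accept"


server_counts = {"Body": 500, "Fuz1": 12}
local_limits = {"Body": 100, "Fuz1": 100}
print(classify(server_counts, local_limits, whitelisted=False))
print(classify(server_counts, local_limits, whitelisted=True))
```

The whitelist test comes first so that a large count can never override a local decision that the mail was solicited.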
71 | |
72 A special value of the number of addressees is "MANY" and means it is | |
73 certain that this message was bulk and might be unsolicited, perhaps | |
74 because it came from a locally blacklisted source or was addressed to an | |
75 invalid address or "spam trap." The special value "MANY" is merely the | |
76 largest value that fits in the fixed sized field containing the count of | |
77 addressees. That "infinity" accumulated total can be reached with mil- | |
78 lions of independent reports as well as with one or two. | |
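A saturating counter is one way to picture how "MANY" behaves as an accumulated total. The sketch below is a Python illustration under a stated assumption: the text only says the field is "fixed sized", so the 24-bit width used here is a guess chosen for the example, and the names are hypothetical.

```python
FIELD_BITS = 24                   # assumed width; the text only says "fixed sized"
MANY = (1 << FIELD_BITS) - 1      # largest value such a field can hold


def add_report(total, targets):
    """Add one report's target count to a running total, saturating at MANY."""
    return min(MANY, total + targets)


# One report claiming MANY targets reaches the ceiling immediately.
print(add_report(0, MANY) == MANY)

# A few very large reports reach the same ceiling and stay there.
total = 0
for _ in range(3):
    total = add_report(total, 10_000_000)
print(total == MANY)
```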
79 | |
80 DCC servers flood or send reports of checksums of bulk mail to | |
81 neighboring servers. | |
82 | |
83 To keep a server's database of checksums from growing without bound, | |
84 checksums are forgotten when they become old. Checksums of bulk mail are | |
85 kept longer. See dbclean(8). | |
86 | |
87 DCC clients pick the nearest working DCC server using a small shared or | |
88 memory mapped file, /var/dcc/map. It contains server names, port | |
89 numbers, passwords, recent performance measures, and so forth. This file | |
90 allows clients to use quick retransmission timeouts and to waste little | |
91 time on servers that have temporarily stopped working or become unreach- | |
92 able. The utility program cdcc(8) is used to maintain this file as well | |
93 as to check the health of servers. | |
94 | |
95 X-DCC Headers | |
96 The DCC software includes several programs used by clients. Dccm(8) uses | |
97 the sendmail "milter" interface to query a DCC server, add header lines | |
98 to incoming mail, and reject mail whose total checksum counts are high. | |
99 Dccm is intended to be run with SMTP servers using sendmail. | |
100 | |
101 Dccproc(8) adds header lines to mail presented by file name or stdin, but | |
102 relies on other programs such as procmail to deal with mail with large | |
103 counts. Dccsight(8) is similar but deals with previously computed check- | |
104 sums. | |
105 | |
106 Dccifd(8) is similar to dccproc but is not run separately for each mail | |
107 message and so is far more efficient. It receives mail messages via a | |
108 socket somewhat like dccm, but with a simpler protocol that can be used | |
109 by Perl scripts or other programs. | |
110 | |
111 DCC SMTP header lines are of one of the forms: | |
112 | |
113 X-DCC-brand-Metrics: client server-ID; bulk cknm1=count cknm2=count ... | |
114 X-DCC-brand-Metrics: client; whitelist | |
115 where | |
116 whitelist appears if the global or per-user whiteclnt file marks the | |
117 message as good. | |
118 brand is the "brand name" of the DCC server, such as "RHYOLITE". | |
119 client is the name or IP address of the DCC client that added the | |
120 header line to the SMTP message. | |
121 server-ID is the numeric ID of the DCC server that the DCC client | |
122 contacted. | |
123 bulk is present if one or more checksum counts exceeded the DCC | |
124 client's thresholds to make the message "bulky." | |
125 bulk rep is present if the DCC reputation of the IP address of the | |
126 sender is bad. | |
127 cknm1,cknm2,... are types of checksums: | |
128 IP address of SMTP client | |
129 env_From SMTP envelope value | |
130 From SMTP header line | |
131 Message-ID SMTP header line | |
132 Received last Received: header line in the SMTP message | |
133 substitute SMTP header line chosen by the DCC client, | |
134 prefixed with the name of the header | |
135 Body SMTP body ignoring white-space | |
136 Fuz1 filtered or "fuzzy" body checksum | |
137 Fuz2 another filtered or "fuzzy" body checksum | |
138 rep DCC reputation of the mail sender or the | |
139 estimated probability that the message is bulk. | |
140 Counts for IP, env_From, From, Message-ID, Received, and | |
141 substitute checksums are omitted by the DCC client if the | |
142 server says it has no information. Counts for Fuz1 and Fuz2 | |
143 are omitted if the message body is empty or contains too | |
144 little of the right kind of information for the checksum to be | |
145 computed. | |
146 count is the total number of recipients of messages with that | |
147 checksum reported directly or indirectly to the DCC server. The | |
148 special count "MANY" means that DCC clients have claimed that | |
149 the message is directed at millions of recipients. "MANY" | |
150 implies the message is definitely bulk, but not necessarily | |
151 unsolicited. The special counts "OK" and "OK2" mean the | |
152 checksum has been marked "good" or "half-good" by DCC servers. | |
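A downstream filter that wants to act on these header lines has to split them into their parts. The Python sketch below parses only the two forms shown above. It is not part of the DCC distribution; the function name, the returned dictionary layout, and the sample host name are assumptions for illustration.

```python
def parse_dcc_header(value):
    """Parse the value of an X-DCC-brand-Metrics header line."""
    ident, _, rest = value.partition(";")
    ident_parts = ident.split()
    result = {
        "client": ident_parts[0] if ident_parts else None,
        "server_id": ident_parts[1] if len(ident_parts) > 1 else None,
        "whitelist": False,
        "bulk": False,
        "bulk_rep": False,
        "counts": {},
    }
    previous = None
    for token in rest.split():
        if "=" in token:
            name, _, count = token.partition("=")
            # Counts are numbers or the special values MANY, OK, and OK2.
            result["counts"][name] = int(count) if count.isdigit() else count
        elif token == "whitelist":
            result["whitelist"] = True
        elif token == "bulk":
            result["bulk"] = True
        elif token == "rep" and previous == "bulk":
            result["bulk_rep"] = True
        previous = token
    return result


parsed = parse_dcc_header("mail.example.com 1234; bulk Body=MANY Fuz1=42 Fuz2=OK")
print(parsed["bulk"], parsed["counts"])
```

Special counts are kept as strings so that a caller cannot mistake "MANY" or "OK" for an ordinary number.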
153 | |
154 Mailing lists | |
155 Legitimate mailing list traffic differs from spam only in being solicited | |
156 by recipients. Each client should have a private whitelist. | |
157 | |
158 DCC whitelists can also mark mail as unsolicited bulk using blacklist | |
159 entries for commonly forged values such as "From: user@public.com". | |
160 | |
161 White and Blacklists | |
162 DCC server and client whitelist files share a common format. Server | |
163 files are always named whitelist and one is required to be in the DCC | |
164 home directory with the other server files. Client whitelist files are | |
165 named whiteclnt in the DCC home directory or a subdirectory specified | |
166 with the -U option for dccm(8). They specify mail that should not be | |
167 reported to a DCC server or that is always unsolicited and almost | |
168 certainly bulk. | |
169 | |
170 A DCC whitelist file contains blank lines, comments starting with "#", | |
171 and lines of the following forms: | |
172 include file | |
173 Copies the contents of file into the whitelist. It can occur | |
174 only in the main whitelist or whiteclnt file and not in an | |
175 included file. The file name should be absolute or relative to | |
176 the DCC home directory. | |
177 | |
178 count value | |
179 lines specify checksums that should be white- or blacklisted. | |
180 count env_From 821-path | |
181 count env_To dest-mailbox | |
182 count From 822-mailbox | |
183 count Message-ID <string> | |
184 count Received string | |
185 count Substitute header string | |
186 count Hex ctype cksum | |
187 count ip IP-address | |
188 | |
189 MANY value | |
190 indicates that millions of targets have received messages | |
191 with the header, IP address, or checksum value. | |
192 OK value | |
193 OK2 value | |
194 say that messages with the header, IP address, or | |
195 checksum value are OK and should not be reported to DCC servers | |
196 or be greylisted. OK2 says that the message is "half | |
197 OK." Two OK2 checksums associated with a message are | |
198 equivalent to one OK. | |
199 A DCC server never shares or floods reports containing | |
200 checksums marked in its whitelist with OK or OK2 to other | |
201 servers. A DCC client does not report or ask its server | |
202 about messages with a checksum marked OK or OK2 in the | |
203 client whitelist. This is intended to allow a DCC client | |
204 to keep private mail so private that even its checksums | |
205 are not disclosed. | |
206 MX IP-address-or-hostname | |
207 MXDCC IP-address-or-hostname | |
208 mark an address or block of addresses of trusted mail | |
209 relays including MX servers, smart hosts, and bastion or | |
210 DMZ relays. The DCC clients dccm(8), dccifd(8), and | |
211 dccproc(8) parse and skip initial Received: headers added | |
212 by listed MX servers to determine the external sources of | |
213 mail messages. Unsolicited bulk mail that has been | |
214 forwarded through listed addresses is discarded by dccm(8) | |
215 and dccifd(8) as if with -a DISCARD instead of rejected. | |
216 MXDCC marks addresses that are MX servers that run DCC | |
217 clients. The checksums for a mail message that has been | |
218 forwarded through an address listed as MXDCC are queried | |
219 instead of reported. | |
220 SUBMIT IP-address-or-hostname | |
221 marks an IP address or block of addresses of SMTP submission | |
222 clients such as web browsers that cannot tolerate 4yz | |
223 temporary rejections but that cannot be trusted to not | |
224 send spam. Since they are local addresses, DCC | |
225 Reputations are not computed for them. | |
226 | |
227 value in count value lines can be | |
228 dest-mailbox | |
229 is an RFC 821 address or a local user name. | |
230 821-path | |
231 is an RFC 821 address. | |
232 822-mailbox | |
233 is an RFC 822 address with optional name. | |
234 Substitute header | |
235 is the name of an SMTP header such as "Sender" or the | |
236 name of one of two SMTP envelope values, "HELO," or | |
237 "Mail_Host" for the resolved host name from the 821-path | |
238 in the message. | |
239 Hex ctype cksum | |
240 starts with the string Hex followed by a checksum type, and | |
241 a string of four hexadecimal numbers obtained from a DCC | |
242 log file or the dccproc(8) command using -CQ. The | |
243 checksum type is body, Fuz1, or Fuz2 or one of the preceding | |
244 checksum types such as env_From. | |
245 IP-address | |
246 is a host name, IPv4 or IPv6 address, or a block of IP | |
247 addresses in the standard xxx/mm form with mm limited for | |
248 server whitelists to 16 for IPv4 or 112 for IPv6. There | |
249 can be at most 64 CIDR blocks in a client whiteclnt file. | |
250 A host name is converted to IP addresses with DNS, | |
251 /etc/hosts or other mechanisms and one checksum for each | |
252 address is added to the whitelist. | |
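To make the "count value" line format concrete, here is a small Python sketch that reads a few such lines and applies the rule that two OK2 matches equal one OK. The sample entries, addresses, and function names are hypothetical, and the parser handles only blank lines, whole-line comments, and the OK, OK2, and MANY counts. It is a reading aid, not the parser used by the DCC programs.

```python
SAMPLE = """\
# hypothetical whiteclnt entries, for illustration only
OK    env_From  list-owner@lists.example.org
OK2   From      newsletter@example.com
MANY  From      user@public.com

OK    ip        192.0.2.10
"""

COUNTS = {"OK", "OK2", "MANY"}


def parse_whitelist(text):
    """Return (count, checksum type, value) tuples from 'count value' lines."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                      # blank line or comment
        fields = line.split(None, 2)      # count, type, rest of the line
        if len(fields) == 3 and fields[0] in COUNTS:
            entries.append(tuple(fields))
    return entries


def is_ok(matching_counts):
    """Two OK2 matches for a message are equivalent to one OK."""
    return "OK" in matching_counts or matching_counts.count("OK2") >= 2


for entry in parse_whitelist(SAMPLE):
    print(entry)
```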
253 | |
254 option setting | |
255 can only be in a DCC client whiteclnt file used by dccifd(8), | |
256 dccm(8) or dccproc(8). Settings in per-user whiteclnt files | |
257 override settings in the global file. Setting can be any of the | |
258 following: | |
259 option log-all | |
260 to log all mail messages. | |
261 option log-normal | |
262 to log only messages that meet the logging thresholds. | |
263 option log-subdirectory-day | |
264 option log-subdirectory-hour | |
265 option log-subdirectory-minute | |
266 creates log files containing mail messages in | |
267 subdirectories of the form JJJ, JJJ/HH, or JJJ/HH/MM where JJJ is the | |
268 current julian day, HH is the current hour, and MM is the | |
269 current minute. See also the -l logdir option for dccm(8), | |
270 dccifd(8), and dccproc(8). | |
271 option dcc-on | |
272 option dcc-off | |
273 to control DCC filtering. See the discussion of -W for | |
274 dccm(8) and dccifd(8). | |
275 option greylist-on | |
276 option greylist-off | |
277 to control greylisting. Greylisting for other recipients | |
278 in the same SMTP transaction can still cause greylist | |
279 temporary rejections. greylist-off in the main whiteclnt | |
280 file. | |
281 option greylist-log-on | |
282 option greylist-log-off | |
283 to control logging of greylisted mail messages. | |
284 option DCC-rep-off | |
285 option DCC-rep-on | |
286 to honor or ignore DCC Reputations computed by the DCC | |
287 server. | |
288 option DNSBL1-off | |
289 option DNSBL1-on | |
290 option DNSBL2-off | |
291 option DNSBL2-on | |
292 option DNSBL3-off | |
293 option DNSBL3-on | |
294 honor or ignore results of DNS blacklist checks configured | |
295 with -B for dccm(8), dccifd(8), and dccproc(8). | |
296 option MTA-first | |
297 option MTA-last | |
298 consider MTA determinations of spam or not-spam first so | |
299 they can be overridden by whiteclnt files, or last so that | |
300 they can override whiteclnt files. | |
301 option forced-discard-ok | |
302 option no-forced-discard | |
303 control whether dccm(8) and dccifd(8) are allowed to | |
304 discard a message for one mailbox for which it is spam when it | |
305 is not spam and must be delivered to another mailbox. This | |
306 can happen if a mail message is addressed to two or more | |
307 mailboxes with differing whitelists. Discarding can be | |
308 undesirable because false positives are not communicated to | |
309 mail senders. To avoid discarding, dccm(8) and dccifd(8) | |
310 running in proxy mode temporarily reject SMTP envelope Rcpt | |
311 To values that involve differing whiteclnt files. | |
312 option threshold type,rej-thold | |
313 has the same effects as -c type,rej-thold for dccproc(8) or | |
314 -t type,rej-thold for dccm(8) and dccifd(8). It is useful | |
315 only in per-user whiteclnt files to override the global DCC | |
316 checksum thresholds. | |
317 option spam-trap-accept | |
318 option spam-trap-reject | |
319 say that mail should be reported to the DCC server as | |
320 extremely bulk or with target counts of MANY. Greylisting, | |
321 DNS blacklist (DNSBL), and other checks are turned off. | |
322 Spam-trap-accept tells the MTA to accept the message while | |
323 spam-trap-reject tells the MTA to reject the message. Use | |
324 Spam-trap-accept for spam traps that should not be | |
325 disclosed. Spam-trap-reject can be used on catch-all | |
326 mailboxes that might receive legitimate mail by typographical | |
327 errors and that senders should be told about. | |
328 | |
329 In the absence of explicit settings, the default in the main | |
330 whiteclnt file is equivalent to | |
331 option log-normal | |
332 option dcc-on | |
333 option greylist-on | |
334 option greylist-log-on | |
335 option DCC-rep-off | |
336 option DNSBL1-off | |
337 option DNSBL2-off | |
338 option DNSBL3-off | |
339 option MTA-last | |
340 option no-forced-discard | |
341 The defaults for individual recipient whiteclnt files are the | |
342 same except as changed by explicit settings in the main file. | |
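The layering of built-in defaults, the main whiteclnt file, and per-user whiteclnt files can be pictured as three dictionaries merged in order. The Python sketch below is only a model of that precedence. The key names such as "greylist" and "MTA" are shorthand invented for the example, not identifiers used by the DCC programs.

```python
# Built-in defaults, transcribed from the list above into shorthand keys.
DEFAULTS = {
    "log": "normal",
    "dcc": "on",
    "greylist": "on",
    "greylist-log": "on",
    "DCC-rep": "off",
    "DNSBL1": "off",
    "DNSBL2": "off",
    "DNSBL3": "off",
    "MTA": "last",
    "forced-discard": "no",
}


def effective_options(main_file, per_user_file):
    """Later layers win: defaults, then the main file, then the per-user file."""
    merged = dict(DEFAULTS)
    merged.update(main_file)
    merged.update(per_user_file)
    return merged


options = effective_options({"greylist": "off"}, {"DCC-rep": "on"})
print(options["greylist"], options["DCC-rep"], options["MTA"])
```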
343 | |
344 Checksums of the IP address of the SMTP client sending a mail message are | |
345 practically unforgeable, because it is impractical for an SMTP client to | |
346 "spoof" its address or pretend to use some other IP address. That would | |
347 make the IP address of the sender useful for whitelisting, except that | |
348 the IP address of the SMTP client is often not available to users of | |
349 dccproc(8). In addition, legitimate mail relays make whitelist entries | |
350 for IP addresses of little use. For example, the IP address from which a | |
351 message arrived might be that of a local relay instead of the home | |
352 address of a whitelisted mailing list. | |
353 | |
354 Envelope and header From values can be forged, so whitelist entries for | |
355 their checksums are not entirely reliable. | |
356 | |
357 Checksums of env_To values are never sent to DCC servers. They are valid | |
358 only in whiteclnt files and used only by dccm(8), dccifd(8), and | |
359 dccproc(8) when the envelope Rcpt To value is known. | |
360 | |
361 Greylists | |
362 The DCC server, dccd(8), can be used to maintain a greylist database for | |
363 some DCC clients including dccm(8) and dccifd(8). Greylisting involves | |
364 temporarily refusing mail from unfamiliar SMTP clients and is unrelated | |
365 to filtering with a Distributed Checksum Clearinghouse. | |
366 See http://projects.puremagic.com/greylisting/ | |
367 | |
368 Privacy | |
369 Because sending mail is a less private act than receiving it, and because | |
370 sending bulk mail is usually not private at all and cannot be very pri- | |
371 vate, the DCC tries first to protect the privacy of mail recipients, and | |
372 second the privacy of senders of mail that is not bulk. | |
373 | |
374 DCC clients necessarily disclose some information about mail they have | |
375 received. The DCC database contains checksums of mail bodies, header | |
376 lines, and source addresses. While it contains significantly less infor- | |
377 mation than is available by "snooping" on Internet links, it is important | |
378 that the DCC database be treated as containing sensitive information and | |
379 to not put the most private information in the DCC database. Given the | |
380 contents of a message, one might determine whether that message has been | |
381 received by a system that subscribes to the DCC. Guesses about the | |
382 sender and addressee of a message can also be validated if the checksums | |
383 of the message have been sent to a DCC server. | |
384 | |
385 Because the DCC is distributed, organizations can operate their own DCC | |
386 servers, and configure them to share or "flood" only the checksums of | |
387 bulk mail that is not in local whitelists. | |
388 | |
389 DCC clients should not report the checksums of messages known to be pri- | |
390 vate to a DCC server. For example, checksums of messages local to a sys- | |
391 tem or that are otherwise known a priori to not be unsolicited bulk | |
392 should not be sent to a remote DCC server. This can be accomplished by | |
393 adding entries for the sender to the client's local whitelist file. | |
394 Client whitelist files can also include entries for email recipients | |
395 whose mail should not be reported to a DCC server. | |
396 | |
397 Security | |
398 Whenever considering security, one must first consider the risks. The | |
399 worst DCC security problems are unauthorized commands to a DCC service, | |
400 denial of the DCC service, and corruption of DCC data. The worst that | |
401 can be done with remote commands to a DCC server is to turn it off or | |
402 otherwise cause it to stop responding. The DCC is designed to fail | |
403 gracefully, so that a denial of service attack would at worst allow | |
404 delivery of mail that would otherwise be rejected. Corruption of DCC | |
405 data might at worst cause mail that is already somewhat "bulk" by virtue | |
406 of being received by two or more people to appear to have higher recipient | |
407 numbers. Since DCC users must whitelist all sources of legitimate bulk | |
408 mail, this is also not a concern. Such security risks should be | |
409 addressed, but only with defenses that don't cost more than the possible | |
410 damage from an attack. | |
411 | |
412 The DCC must contend with senders of unsolicited bulk mail who resort to | |
413 unlawful actions to express their displeasure at having their advertising | |
414 blocked. Because the DCC protocol is based on UDP, an unhappy advertiser | |
415 could try to flood a DCC server with packets supposedly from subscribers | |
416 or non-subscribers. DCC servers defend against that attack by rate-lim- | |
417 iting requests from anonymous users. | |
418 | |
419 Also because of the use of UDP, clients must be protected against forged | |
420 answers to their queries. Otherwise an unsolicited bulk mail advertiser | |
421 could send a stream of "not spam" answers to an SMTP client while simul- | |
422 taneously sending mail that would otherwise be rejected. This is not a | |
423 problem for authenticated clients of the DCC because they share a secret | |
424 with the DCC. Unauthenticated, anonymous DCC clients do not share any | |
425 secrets with the DCC, except for unique and unpredictable bits in each | |
426 query or report sent to the DCC. Therefore, DCC servers cryptographi- | |
427 cally sign answers to unauthenticated clients with bits from the corre- | |
428 sponding queries. This protects against attackers that do not have | |
429 access to the stream of packets from the DCC client. | |
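The idea of signing an answer with unpredictable bits taken from the query can be illustrated with a keyed hash. The Python sketch below uses HMAC-SHA256 from the standard library purely as a stand-in. It does not reproduce the DCC packet format or its actual signature algorithm, and all names and field sizes are assumptions.

```python
import hashlib
import hmac
import secrets

TAG_LEN = 32  # length of an HMAC-SHA256 digest


def make_query(payload):
    """Client side: attach fresh, unpredictable bits to each query."""
    nonce = secrets.token_bytes(16)
    return nonce, nonce + payload


def sign_answer(query_nonce, answer):
    """Server side: key the signature with the bits taken from the query."""
    tag = hmac.new(query_nonce, answer, hashlib.sha256).digest()
    return answer + tag


def verify_answer(query_nonce, packet):
    """Client side: accept an answer only if it matches our own query bits."""
    answer, tag = packet[:-TAG_LEN], packet[-TAG_LEN:]
    expected = hmac.new(query_nonce, answer, hashlib.sha256).digest()
    return answer if hmac.compare_digest(tag, expected) else None


nonce, query = make_query(b"checksums...")
packet = sign_answer(nonce, b"count=3")
print(verify_answer(nonce, packet))
```

As the text notes, this helps only against an attacker who cannot see the query. Anyone who can read the client's packets also learns the bits used as the key.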
430 | |
431 The passwords or shared secrets used in the DCC client and server pro- | |
432 grams are "cleartext" for several reasons. In any shared secret authen- | |
433 tication system, at least one party must know the secret or keep the | |
434 secret in cleartext. You could encrypt the secrets in a file, but | |
435 because they are used by programs, you would need a cleartext copy of the | |
436 key to decrypt the file somewhere in the system, making such a scheme | |
437 more expensive but no more secure than a file of cleartext passwords. | |
438 Asymmetric systems such as that used in UNIX allow one party to not know | |
439 the secrets, but they must be and are designed to be computationally | |
440 expensive when used in applications like the DCC that involve thousands | |
441 or more authentication checks per second. Moreover, because of "dictio- | |
442 nary attacks," asymmetric systems are now little more secure than keeping | |
443 passwords in cleartext. An adversary can compare the hash values of com- | |
444 binations of common words with /etc/passwd hash values to look for bad | |
445 passwords. Worse, by the nature of a client/server protocol like that | |
446 used in the DCC, clients must have the cleartext password. Since it is | |
447 among the more numerous and much less secure clients that adversaries | |
448 would seek files of DCC passwords, it would be a waste to complicate the | |
449 DCC server with an asymmetric system. | |
450 | |
451 The DCC protocol is vulnerable to dictionary attacks to recover pass- | |
452 words. An adversary could capture some DCC packets, and then check to | |
453 see if any of the 100,000 to 1,000,000 passwords in so called "cracker | |
454 dictionaries" applied to a packet generated the same signature. This is | |
455 a concern only if DCC passwords are poorly chosen, such as any combina- | |
456 tion of words in an English dictionary. There are ways to prevent this | |
457 vulnerability regardless of how badly passwords are chosen, but they are | |
458 computationally expensive and require additional network round trips. | |
459 Since DCC passwords are created and typed into files once and do not need | |
460 to be remembered by people, it is cheaper and quite easy to simply choose | |
461 good passwords that are not in dictionaries. | |
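Since DCC passwords never need to be memorized, they can simply be generated at random. The following Python sketch uses the standard `secrets` module. The length and alphabet are arbitrary choices for the example, so check them against whatever limits your DCC version places on passwords in its files.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits


def make_password(length=24):
    """Return a random password that will not appear in a word list."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


print(make_password())
```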
462 | |
463 Reliability | |
464 It is better to fail to filter unsolicited bulk mail than to fail to | |
465 deliver legitimate mail, so DCC clients fail in the direction of assuming | |
466 that mail is legitimate or even whitelisted. | |
467 | |
468 A DCC client sends a report or other request and waits for an answer. If | |
469 no answer arrives within a reasonable time, the client retransmits. | |
470 There are many things that might result in the client not receiving an | |
471 answer, but the most important is packet loss. If the client's request | |
472 does not reach the server, it is easy and harmless for the client to | |
473 retransmit. If the client's request reached the server but the server's | |
474 response was lost, a retransmission to the same server would be misunder- | |
475 stood as a new report of another copy of the same message unless it is | |
476 detected as a retransmission by the server. The DCC protocol includes | |
477 transaction identifiers for this purpose. If the client retransmitted | |
478 to a second server, the retransmission would be misunderstood by the sec- | |
479 ond server as a new report of the same message. | |
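The effect of transaction identifiers on retransmissions can be modelled with a small table of answers already given. This Python sketch is a toy model with hypothetical names. It ignores expiry of old entries and everything else a real server must handle.

```python
class ReportCounter:
    """Toy server that counts reports and recognizes retransmissions."""

    def __init__(self):
        self.totals = {}     # checksum -> accumulated target count
        self.answered = {}   # (client, transaction id) -> answer already sent

    def handle(self, client, txn_id, checksum, targets):
        key = (client, txn_id)
        if key in self.answered:
            # Same transaction seen again: repeat the answer, do not recount.
            return self.answered[key]
        total = self.totals.get(checksum, 0) + targets
        self.totals[checksum] = total
        self.answered[key] = total
        return total


first = ReportCounter()
print(first.handle("client-a", 1, "abc123", 1))   # new report
print(first.handle("client-a", 1, "abc123", 1))   # retransmission, same answer

second = ReportCounter()
print(second.handle("client-a", 1, "abc123", 1))  # a second server counts it anew
```

The second server has no record of the transaction, which is why a retransmission sent there is counted as a fresh report.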
480 | |
481 Each request from a client includes a timestamp to aid the client in mea- | |
482 suring the round trip time to the server and to let the client pick the | |
483 closest server. Clients monitor the speed of all of the servers they | |
484 know including those they are not currently using, and use the quickest. | |
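Server selection by measured round trip time might look like the following Python sketch. The smoothing formula and its weight are assumptions borrowed from common practice, not taken from the DCC source, and the server names are placeholders.

```python
def update_rtt(old, sample, weight=0.125):
    """Blend a new round trip sample into a smoothed estimate."""
    if old is None:
        return sample
    return (1 - weight) * old + weight * sample


def pick_server(rtts):
    """rtts maps server name -> smoothed RTT in seconds, or None if not answering."""
    working = {name: rtt for name, rtt in rtts.items() if rtt is not None}
    if not working:
        return None
    return min(working, key=working.get)


measurements = {"dcc1.example.net": 0.120, "dcc2.example.net": 0.045,
                "dcc3.example.net": None}
print(pick_server(measurements))
```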
485 | |
486 Client and Server-IDs | |
487 Servers and clients use numbers or IDs to identify themselves. ID 1 is | |
488 reserved for anonymous, unauthenticated clients. All other IDs are | |
489 associated with a pair of passwords in the ids file, the current and next or | |
490 previous and current passwords. Clients include their client IDs in | |
491 their messages. When they are not using the anonymous ID, they sign | |
492 their messages to servers with the first password associated with their | |
493 client-ID. Servers treat messages with signatures that match neither of | |
494 the passwords for the client-ID in their own ids file as if the client | |
495 had used the anonymous ID. | |
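The fallback to the anonymous ID can be sketched as a lookup that tries both passwords for a client-ID. In this Python illustration the `ids` dictionary stands in for the ids file and HMAC-SHA256 stands in for the real signature scheme. Both are assumptions made so that the example can run.

```python
import hashlib
import hmac

ANONYMOUS_ID = 1


def sign(password, message):
    """Stand-in signature; the real DCC signature algorithm is not shown here."""
    return hmac.new(password.encode(), message, hashlib.sha256).digest()


def effective_client_id(ids, client_id, message, signature):
    """Return the ID the server should treat this message as coming from."""
    if client_id == ANONYMOUS_ID:
        return ANONYMOUS_ID
    for password in ids.get(client_id, ()):
        # Either password of the pair is acceptable.
        if hmac.compare_digest(signature, sign(password, message)):
            return client_id
    # No password matched: treat the client as anonymous.
    return ANONYMOUS_ID


ids = {1001: ("current-secret", "next-secret")}
msg = b"report"
print(effective_client_id(ids, 1001, msg, sign("next-secret", msg)))
print(effective_client_id(ids, 1001, msg, sign("wrong", msg)))
```

Accepting either password of the pair lets an operator change passwords on clients and servers at different times without locking anyone out.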
496 | |
497 Each server has a unique server-ID less than 32768. Servers use their | |
498 IDs to identify checksums that they flood to other servers. Each server | |
499 expects local clients sending administrative commands to use the server's | |
500 ID and sign administrative commands with the associated password. | |
501 | |
502 Server-IDs must be unique among all systems that share reports by | |
503 "flooding." All servers must be told of the IDs of all other servers whose | |
504 reports can be received in the local /var/dcc/flod file described in | |
505 dccd(8). However, server-IDs can be mapped during flooding between | |
506 independent DCC organizations. | |
507 | |
508 Passwd-IDs are server-IDs that should not be assigned to servers. They | |
509 appear in the often publicly readable /var/dcc/flod and specify passwords | |
510 in the private /var/dcc/ids file for the inter-server flooding protocol. | |
511 | |
512 The client identified by a client-ID might be a single computer with a | |
513 single IP address, a single but multi-homed computer, or many computers. | |
514 Client-IDs are not used to identify checksum reports, but the organiza- | |
515 tion operating the client. A client-ID need only be unique among clients | |
516 using a single server. A single client can use different client-IDs for | |
517 different servers, each client-ID authenticated with a separate password. | |
518 | |
519 An obscure but important part of all of this is that the inter-server | |
520 flooding algorithm depends on server-IDs and timestamps attached to | |
521 reports of checksums. The inter-server flooding mechanism requires coop- | |
522 erating DCC servers to maintain reasonable clocks ticking in UTC. | |
523 Clients include timestamps in their requests, but as long as their time- | |
524 stamps are unlikely to be repeated, they need not be very accurate. | |
525 | |
526 Installation Considerations | |
527 DCC clients on a computer share information about which servers are cur- | |
528 rently working and their speeds in a shared memory segment. This segment | |
529 also contains server host names, IP addresses, and the passwords needed | |
530 to authenticate known clients to servers. That generally requires that | |
531 dccm(8), dccproc(8), dccifd(8), and cdcc(8) execute with a UID that can | |
532 write to the DCC home directory and its files. The sendmail interface, | |
533 dccm, is a daemon that can be started by an "rc" or other script already | |
534 running with the correct UID. The other two, dccproc and cdcc, need to be | |
535 set-UID because they are used by end users. They relinquish set-UID | |
536 privileges when not needed. | |
537 | |
538 Files that contain cleartext passwords including the shared file used by | |
539 clients must be readable only by "owner." | |
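The permission rule can be verified with a short script. This is a minimal sketch, assuming a POSIX system; the function name is hypothetical, and the check is stricter than the text requires, because it rejects any group or other permission bit and not only the read bits.

```python
import os
import stat


def only_owner_has_access(path):
    """Return True if no group or other permission bits are set on `path`.

    Stricter than "readable only by owner": write and execute bits for
    group and others are rejected as well.
    """
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
```

A file with mode 0600 passes this check, while a file with mode 0644 fails it.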
540 | |
541 The data files required by a DCC can be in a single "home" directory, | |
542 /var/dcc. Distinct DCC servers can run on a single computer, provided | |
543 they use distinct UDP port numbers and home directories. It is possible | |
544 and convenient for the DCC clients using a server on the same computer to | |
545 use the same home directory as the server. | |
546 | |
547 The DCC source distribution includes sample control files. They should | |
548 be modified appropriately and then copied to the DCC home directory. | |
549 Files that contain cleartext passwords must not be publicly readable. | |
550 | |
551 The DCC source includes "feature" m4 files to configure sendmail to use | |
552 dccm(8) to check a DCC server about incoming mail. | |
553 | |
554 See also the INSTALL.html file. | |
555 | |
556 Client Installation | |
557 Installing a DCC client starts with obtaining or compiling program | |
558 binaries for the client server data control tool, cdcc(8). The main part | |
559 of the client installation is installing either the sendmail DCC | |
560 interface, dccm(8), or the general or procmail(1) interface, dccproc(8). | |
561 Connecting the DCC to sendmail with dccm is the most powerful option, but | |
562 it requires administrative control of the system running sendmail. | |
563 | |
564 As noted above, cdcc and dccproc should be set-UID to a suitable UID. | |
565 Root or 0 is thought to be safe for both, because they are careful to | |
566 release privileges except when they need them to read or write files in | |
567 the DCC home directory. A DCC home directory, /var/dcc, should be | |
568 created. It must be owned and writable by the UID to which cdcc is set. | |
569 | |
570 After the DCC client programs have been obtained, contact the operator(s) | |
571 of the chosen DCC server(s) to obtain each server's hostname, port | |
572 number, and a client-ID and corresponding password. No client-IDs or | |
573 passwords are needed to use DCC servers that allow anonymous clients. Use | |
574 the load or add commands of cdcc to create a map file in the DCC home | |
575 directory. It is usually necessary to create a client whitelist file of the | |
576 format described above. To accommodate users sharing a computer but not | |
577 ideas about what is solicited bulk mail, the client whitelist file can be | |
578 any valid path name and need not be in the DCC home directory. | |
579 | |
580 If dccm is chosen, arrange to start it with suitable arguments before | |
581 sendmail is started. See the homedir/dcc_conf file and the misc/rcDCC | |
582 script in the DCC source. The procmail DCC interface, dccproc(8), can | |
583 be run manually or by a procmailrc(5) rule. | |
584 | |
585 Server Installation | |
586 The DCC server, dccd(8), also requires that the DCC home directory exist. | |
587 It does not use the client shared or memory mapped file of server | |
588 addresses, but it requires other files. One is the /var/dcc/ids file of | |
589 client-IDs, server-IDs, and corresponding passwords. Another is a flod | |
590 file of peers that send and receive floods of reports of checksums with | |
591 large counts. Both files are described in dccd(8). | |
592 | |
593 The server daemon should be started when the system is rebooted, probably | |
594 before sendmail. See the misc/rcDCC and misc/start-dccd files in the DCC | |
595 source. | |
596 | |
597 The database should be cleaned regularly with dbclean(8), for example by | |
598 running the crontab job that is in the misc directory. | |
599 | |
600 SEE ALSO | |
601 cdcc(8), dbclean(8), dcc(8), dccd(8), dccifd(8), dccm(8), dccproc(8), | |
602 dblist(8), dccsight(8), sendmail(8). | |
603 | |
604 HISTORY | |
605 Distributed Checksum Clearinghouses are based on an idea of Paul Vixie | |
606 with code designed and written at Rhyolite Software starting in 2000. | |
607 This document describes version 1.3.103. | |
608 | |
609 February 26, 2009 |