-
Notifications
You must be signed in to change notification settings - Fork 1
Expand file tree
/
Copy pathatom.xml
More file actions
436 lines (334 loc) · 35.3 KB
/
atom.xml
File metadata and controls
436 lines (334 loc) · 35.3 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<title><![CDATA[What Now]]></title>
<link href="http://danielzhou82.github.io/atom.xml" rel="self"/>
<link href="http://danielzhou82.github.io/"/>
<updated>2014-10-08T16:35:25+08:00</updated>
<id>http://danielzhou82.github.io/</id>
<author>
<name><![CDATA[Daniel Zhou]]></name>
</author>
<generator uri="http://octopress.org/">Octopress</generator>
<entry>
<title type="html"><![CDATA[Containerize Syslog-ng]]></title>
<link href="http://danielzhou82.github.io/blog/2014/09/30/containerize-syslog-ng/"/>
<updated>2014-09-30T14:22:55+08:00</updated>
<id>http://danielzhou82.github.io/blog/2014/09/30/containerize-syslog-ng</id>
<content type="html"><![CDATA[<p>I have a pretty old LXC configuration, with <code>kernel 2.6.32</code> and <code>LXC 0.7</code>. The problem is I wasn't able to make all syslog-ng instances in both host and containers work. The syslog-ng daemon is running, however isn't writing anything to log files.</p>
<p>The culprit is the <code>/dev/log</code> UNIX socket.</p>
<p>The syslog in UNIX system is running in client/server mode:</p>
<ul>
<li>client is the <code>syslog()</code> interface in glibc. <code>syslog()</code> connects to socket <code>/dev/log</code> and send messages via the socket.</li>
<li>and server is the listener of socket <code>/dev/log</code>, such as syslog-ng, rsyslog.</li>
</ul>
<!--more-->
<p>This is how syslog-ng use <code>/dev/log</code>:</p>
<ul>
<li>check if <code>/dev/log</code> is already there, if so unlink it;</li>
<li>create <code>/dev/log</code> with socket/bind.</li>
</ul>
<p>And this is how I mount <code>/dev</code> in container config:</p>
<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>lxc.mount.entry=/dev /var/lxc/ct1/rootfs/dev none defaults,bind 0 0</span></code></pre></td></tr></table></div></figure>
<p>Because <code>/dev</code> is bind-mounted, <code>/dev/log</code> is shared among all containers, each time when syslog-ng instance starts, the old <code>/dev/log</code> is unlinked. The removal apparently will break the current syslog-ng instance listening on it. So at last only one syslog-ng instance (the last one) can work without problem.</p>
<p>Finally I made it work.</p>
<p>First I changed syslog-ng configuration files for both host and containers:</p>
<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
</pre></td><td class='code'><pre><code class=''><span class='line'># cat /etc/syslog-ng/syslog-ng.conf
</span><span class='line'>...
</span><span class='line'>source s_sys {
</span><span class='line'> file ("/proc/kmsg" program_override("kernel: "));
</span><span class='line'> unix-stream ("/var/run/syslog-ng.sock");
</span><span class='line'> internal();
</span><span class='line'>};
</span><span class='line'>...</span></code></pre></td></tr></table></div></figure>
<p>It means syslog-ng will listen to an alternative socket at <code>/var/run/syslog-ng.sock</code> instead of <code>/dev/log</code>. Since <code>/var</code> is a container private directoy, each syslog-ng can listen without interfering with each other.</p>
<p>At the client end, glibc is still using <code>/dev/log</code> to write log message, so I made <code>/dev/log</code> a symbol link:</p>
<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>ln -sf /var/run/syslog-ng.sock /dev/log</span></code></pre></td></tr></table></div></figure>
<p>Now when application calls <code>syslog()</code>, it connects and send messages to <code>/var/run/syslog-ng.sock</code>. Syslog-ng who is listening on the socket will handle the message correctly.</p>
<p>To make the change permanent, I changed the start routine in <code>/etc/init.d/syslog-ng</code>:</p>
<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
</pre></td><td class='code'><pre><code class='bash'><span class='line'>start<span class="o">()</span> <span class="o">{</span>
</span><span class='line'> <span class="nb">echo</span> -n <span class="s2">"Starting syslog-ng: "</span>
</span><span class='line'> <span class="k">if</span> <span class="o">[</span> ! -h /dev/log <span class="o">]</span><span class="p">;</span> <span class="k">then</span>
</span><span class='line'> rm -f /dev/log 2>/dev/null
</span><span class='line'> ln -sf /var/run/syslog-ng.sock /dev/log
</span><span class='line'> <span class="k">fi</span>
</span><span class='line'> <span class="nb">ulimit</span> -u 600000
</span><span class='line'> daemon <span class="nv">$binary</span>
</span><span class='line'> <span class="nv">RETVAL</span><span class="o">=</span><span class="nv">$?</span>
</span><span class='line'> <span class="nb">echo</span>
</span><span class='line'> <span class="o">[</span> <span class="nv">$RETVAL</span> -eq <span class="m">0</span> <span class="o">]</span> <span class="o">&&</span> touch /var/lock/subsys/syslog-ng
</span><span class='line'><span class="o">}</span>
</span></code></pre></td></tr></table></div></figure>
<p>It makes sure <code>/dev/log</code> is a correct symbol link when syslog-ng starts.</p>
]]></content>
</entry>
<entry>
<title type="html"><![CDATA[On the Confusing Ulimit]]></title>
<link href="http://danielzhou82.github.io/blog/2014/09/30/on-the-confusing-ulimit/"/>
<updated>2014-09-30T13:01:14+08:00</updated>
<id>http://danielzhou82.github.io/blog/2014/09/30/on-the-confusing-ulimit</id>
<content type="html"><![CDATA[<h2>1. ulimit and rlimit</h2>
<p>Literally, ulimit is per user resource limit, such like the total process number allowed to spawn, and the total number of file allowed to open etc. In Linux, user delegates process to apply system resource. To understand ulimit, one must get clear idea of rlimit of process.</p>
<p>The definition of rlimit:</p>
<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
</pre></td><td class='code'><pre><code class='cpp'><span class='line'><span class="k">struct</span> <span class="n">rlimit</span> <span class="p">{</span>
</span><span class='line'> <span class="kt">rlim_t</span> <span class="n">rlim_cur</span><span class="p">;</span> <span class="cm">/* Soft limit */</span>
</span><span class='line'> <span class="kt">rlim_t</span> <span class="n">rlim_max</span><span class="p">;</span> <span class="cm">/* Hard limit (ceiling for rlim_cur) */</span>
</span><span class='line'><span class="p">};</span>
</span></code></pre></td></tr></table></div></figure>
<p>rlimit structure is used to describe resource limit of process (not accurate actually, will explain later). <code>rlim_cur</code> is current active limit and <code>rlim_max</code> is the ceiling value of <code>rlim_cur</code>.</p>
<!--more-->
<p>There are various types of resource in Linux OS, hence an array <code>rlim[RLIM_NLIMITS]</code> is included (indirectly) in <code>task_struct</code> (the process descriptor). Each different type of resource limit is referenced by the array index. For example, <code>rlim[RLIMIT_NPROC]</code> is the maximum number of process and <code>rlim[RLIMIT_NOFILE]</code> is the maximum number of files.</p>
<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
</pre></td><td class='code'><pre><code class='cpp'><span class='line'><span class="k">struct</span> <span class="n">task_struct</span> <span class="o">-></span>
</span><span class='line'> <span class="k">struct</span> <span class="n">signal_struct</span> <span class="o">-></span>
</span><span class='line'> <span class="k">struct</span> <span class="n">rlimit</span> <span class="n">rlim</span><span class="p">[</span><span class="n">RLIM_NLIMITS</span><span class="p">]</span>
</span></code></pre></td></tr></table></div></figure>
<p>Linux kernel provides <code>setrlimit()/getrlimit()</code> syscalls for process to write/read its rlimit of specific resource.</p>
<p>Now we know the rlimit is attached to process, then where are the resource counters? In Linux, they are in <code>user_struct</code> (the user descriptor).</p>
<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
</pre></td><td class='code'><pre><code class='cpp'><span class='line'><span class="k">struct</span> <span class="n">task_struct</span> <span class="o">-></span>
</span><span class='line'> <span class="k">struct</span> <span class="n">cred</span> <span class="o">*</span><span class="n">real_cred</span> <span class="o">-></span>
</span><span class='line'> <span class="k">struct</span> <span class="n">user_struct</span> <span class="o">*</span><span class="n">user</span> <span class="o">-></span>
</span><span class='line'> <span class="n">process</span> <span class="nl">counter</span><span class="p">:</span> <span class="kt">atomic_t</span> <span class="n">processes</span>
</span><span class='line'> <span class="n">file</span> <span class="nl">counter</span><span class="p">:</span> <span class="kt">atomic_t</span> <span class="n">files</span>
</span></code></pre></td></tr></table></div></figure>
<p>So how the limitation works? Take <code>RLIMIT_NPROC</code> as an example, in function <code>fork()</code> in kernel, we have below snippet:</p>
<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
</pre></td><td class='code'><pre><code class='cpp'><span class='line'> <span class="n">retval</span> <span class="o">=</span> <span class="o">-</span><span class="n">EAGAIN</span><span class="p">;</span>
</span><span class='line'> <span class="k">if</span> <span class="p">(</span><span class="n">atomic_read</span><span class="p">(</span><span class="o">&</span><span class="n">p</span><span class="o">-></span><span class="n">real_cred</span><span class="o">-></span><span class="n">user</span><span class="o">-></span><span class="n">processes</span><span class="p">)</span> <span class="o">>=</span>
</span><span class='line'> <span class="n">p</span><span class="o">-></span><span class="n">signal</span><span class="o">-></span><span class="n">rlim</span><span class="p">[</span><span class="n">RLIMIT_NPROC</span><span class="p">].</span><span class="n">rlim_cur</span><span class="p">)</span> <span class="p">{</span>
</span><span class='line'> <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">capable</span><span class="p">(</span><span class="n">CAP_SYS_ADMIN</span><span class="p">)</span> <span class="o">&&</span> <span class="o">!</span><span class="n">capable</span><span class="p">(</span><span class="n">CAP_SYS_RESOURCE</span><span class="p">)</span> <span class="o">&&</span>
</span><span class='line'> <span class="n">p</span><span class="o">-></span><span class="n">real_cred</span><span class="o">-></span><span class="n">user</span> <span class="o">!=</span> <span class="n">INIT_USER</span><span class="p">)</span>
</span><span class='line'> <span class="k">goto</span> <span class="n">bad_fork_free</span><span class="p">;</span>
</span><span class='line'> <span class="p">}</span>
</span></code></pre></td></tr></table></div></figure>
<p>The kernel compares the counter in <code>user_struct</code> with the rlimit in <code>task_struct</code>, then reject the fork request by returning <code>EAGAIN</code> if rlimit is exceeded, otherwise gives green light and bump up the counter. As special cases, the check is bypassed if the process has capabilities of <code>CAP_SYS_ADMIN</code> or <code>CAP_SYS_RESOURCE</code> or the owner of the process is root.</p>
<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
</pre></td><td class='code'><pre><code class='cpp'><span class='line'><span class="n">do_fork</span><span class="p">()</span>
</span><span class='line'> <span class="n">copy_process</span><span class="p">()</span>
</span><span class='line'> <span class="n">copy_creds</span><span class="p">()</span> <span class="c1">// inherit parent's credential</span>
</span><span class='line'> <span class="n">check</span> <span class="n">processes</span> <span class="n">and</span> <span class="n">rlimit</span> <span class="c1">// see above</span>
</span><span class='line'> <span class="n">processes</span> <span class="o">++</span>
</span></code></pre></td></tr></table></div></figure>
<p>We can see from above call stack that child process will inherit the credential (<code>user_struct</code> as well) of parent when doing <code>fork()</code>, which indirectly inherits all rlimit settings as well.</p>
<p>The rlimit is per process, while the resource counters are per user. I guess this is the root of all confusion about ulimit and rlimit. Say we have user U and his two processes P1 and P2, the <code>RLIMIT_NPROC</code> are 100 for P1 and 200 for P2. Then we have:</p>
<ul>
<li>when counter C < 100, both P1 and P2 are allowed to spawn new processes;</li>
<li>when 100 <= C < 200, only P2 is allowed to spawn new processes;</li>
<li>when C > 200, both P1 and P2 are not allowed to spawn new processes.</li>
</ul>
<p>From processes' perspective, rlimit is not the quantity of resource a process can own. To check how much resource one processes can use, we must know how much other processes from the same user have alreay taken. This is one important reason why some application like Apache and Mysql run under their own user.</p>
<h2>2. how rlimit inherited?</h2>
<p>It's also confusing when it comes to the rlimit inheritance problem. Given a process, where are all the rlimit values from? And how are we supposed to change the values?</p>
<ul>
<li>init process: kernel decides most of rlimit values;</li>
<li>System Daemons: inherits from init, however may change rlimit by calling <code>setrlimit()</code></li>
<li>Login Shell: use the values defined in /etc/security/limits.conf if exist, inherits from init for the rest;</li>
<li>Process spawned in Shell: inherits from Shell</li>
</ul>
<h3>2.1 process 1: init</h3>
<p>Process 1 (init) is very special. Kernel finishes most of environment setups, and loads <strong>/sbin/init</strong> from user land. Running as a normal Linux process, /sbin/init is allowed to call setrlimit() to change the rlimit value from kernel. For example, SysVinit code sets RLIMIT_CORE to 0 at early stage.</p>
<p>We can read rlimit values of process init from procfs:</p>
<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
</pre></td><td class='code'><pre><code class='cpp'><span class='line'><span class="cp"># cat /proc/1/limits</span>
</span><span class='line'><span class="p">...</span>
</span><span class='line'><span class="n">Max</span> <span class="n">processes</span> <span class="mi">193115</span> <span class="mi">193115</span> <span class="n">processes</span>
</span><span class='line'><span class="n">Max</span> <span class="n">open</span> <span class="n">files</span> <span class="mi">1024</span> <span class="mi">4096</span> <span class="n">files</span>
</span><span class='line'><span class="p">...</span>
</span></code></pre></td></tr></table></div></figure>
<p>So why the <code>RLIMIT_NOFILE</code> of init is 1024? Actually it's hardcoded by kernel.</p>
<p>For your reference, see <code>INIT_RLIMITS</code> macro in kernel source, which initializes rlim array for process init:</p>
<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
</pre></td><td class='code'><pre><code class='cpp'><span class='line'><span class="cp">#define INR_OPEN_CUR 1024</span>
</span><span class='line'><span class="cp">#define INR_OPEN_MAX 4096</span>
</span><span class='line'><span class="cp">#define INIT_RLIMITS \</span>
</span><span class='line'><span class="cp">{ \</span>
</span><span class='line'><span class="cp">...</span>
</span><span class='line'> <span class="p">[</span><span class="n">RLIMIT_NPROC</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span> <span class="p">},</span> \
</span><span class='line'> <span class="p">[</span><span class="n">RLIMIT_NOFILE</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="n">INR_OPEN_CUR</span><span class="p">,</span> <span class="n">INR_OPEN_MAX</span> <span class="p">},</span> \
</span><span class='line'><span class="p">...</span>
</span><span class='line'><span class="p">}</span>
</span></code></pre></td></tr></table></div></figure>
<p>For <code>RLIMIT_NPROC</code>, kernel fuction <code>fork_init()</code> decides its value by half of global variable <code>max_threads</code>, and <code>max_threads</code> correlates to the capacity of physical memory. So we can say <code>RLIMIT_NPROC</code> of init depends on hardware configuration.</p>
<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
</pre></td><td class='code'><pre><code class='cpp'><span class='line'><span class="n">init_task</span><span class="p">.</span><span class="n">signal</span><span class="o">-></span><span class="n">rlim</span><span class="p">[</span><span class="n">RLIMIT_NPROC</span><span class="p">].</span><span class="n">rlim_cur</span> <span class="o">=</span> <span class="n">max_threads</span><span class="o">/</span><span class="mi">2</span><span class="p">;</span>
</span><span class='line'><span class="n">init_task</span><span class="p">.</span><span class="n">signal</span><span class="o">-></span><span class="n">rlim</span><span class="p">[</span><span class="n">RLIMIT_NPROC</span><span class="p">].</span><span class="n">rlim_max</span> <span class="o">=</span> <span class="n">max_threads</span><span class="o">/</span><span class="mi">2</span><span class="p">;</span>
</span><span class='line'><span class="n">init_task</span><span class="p">.</span><span class="n">signal</span><span class="o">-></span><span class="n">rlim</span><span class="p">[</span><span class="n">RLIMIT_SIGPENDING</span><span class="p">]</span> <span class="o">=</span> <span class="n">init_task</span><span class="p">.</span><span class="n">signal</span><span class="o">-></span><span class="n">rlim</span><span class="p">[</span><span class="n">RLIMIT_NPROC</span><span class="p">];</span>
</span></code></pre></td></tr></table></div></figure>
<p>You may notice that we may have different <code>RLIMIT_NOFILE</code> for RHEL5 and RHEL6. This is due to below kernel commit:</p>
<blockquote>
<p>commit 0ac1ee0bfec2a4ad118f907ce586d0dfd8db7641
Author: Tim Gardner <a href="mailto:tim.gardner@canonical.com">tim.gardner@canonical.com</a>
Date: Tue May 24 17:13:05 2011 -0700</p>
<p>ulimit: raise default hard ulimit on number of files to 4096</p>
</blockquote>
<h3>2.2 system daemons</h3>
<p>System daemons are brought up by init according to predefined configurations, in a non-interactive way. Hence they normally inherits rlimit settings from init directly. However system daemons may call setrlimit() to change rlimit values, depending on the requirement. Take syslog-ng as an example, we can see setrlimit() invocation in C source code:</p>
<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
</pre></td><td class='code'><pre><code class='cpp'><span class='line'><span class="n">limit</span><span class="p">.</span><span class="n">rlimit_cur</span> <span class="o">=</span> <span class="mi">4096</span><span class="p">;</span>
</span><span class='line'><span class="n">setrlimit</span><span class="p">(</span><span class="n">RLIMIT_NOFILE</span><span class="p">,</span> <span class="o">&</span><span class="n">limit</span><span class="p">);</span>
</span></code></pre></td></tr></table></div></figure>
<p>Then the rlimit of syslog-ng looks like:</p>
<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
</pre></td><td class='code'><pre><code class='cpp'><span class='line'><span class="cp"># cat /proc/3268/limits </span>
</span><span class='line'><span class="n">Limit</span> <span class="n">Soft</span> <span class="n">Limit</span> <span class="n">Hard</span> <span class="n">Limit</span> <span class="n">Units</span>
</span><span class='line'><span class="p">...</span>
</span><span class='line'><span class="n">Max</span> <span class="n">processes</span> <span class="mi">193115</span> <span class="mi">193115</span> <span class="n">processes</span>
</span><span class='line'><span class="n">Max</span> <span class="n">open</span> <span class="n">files</span> <span class="mi">4096</span> <span class="mi">4096</span> <span class="n">files</span>
</span><span class='line'><span class="p">...</span>
</span></code></pre></td></tr></table></div></figure>
<p>The soft limit of <code>RLIMIT_NOFILE</code> is now 4096, other than init's 1024.</p>
<h3>2.3 Login Shell</h3>
<p>In normal interactive shell spawn procedure, a process will be forked and <code>/bin/login</code> will be loaded for user authentication, and then PAM plugins will set up rlimit by reading values from <code>/etc/security/limits.conf</code>. All child processes of this shell will inherit these rlimit settings.</p>
<p><code>/etc/security/limits.conf</code> is the main configuration file for system administrator to set up default rlimit, however you need to know this method doesn't apply to all scenarios.</p>
<p>In man page of limits.conf:</p>
<blockquote>
<div class="highlight"><pre><code class="language-text" data-lang="text"> Also, please note that all limit settings are set per login. They
are not global, nor are they permanent; existing only for the
duration of the session.
</code></pre></div></blockquote>
<p>Internally, <code>/etc/security/limits.conf</code> is actually the configuration file of one PAM plugin <code>pam_limits.so</code>, and has nothing to do with Linux Shell. As part of login procedure, PAM invokes system-auth submodule, which in turn involves <code>pam_limits.so</code> plugin, then it's plugin's responsibility to get rlimit value from limits.conf.</p>
<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
</pre></td><td class='code'><pre><code class='cpp'><span class='line'><span class="cp"># cat /etc/pam.d/login </span>
</span><span class='line'><span class="p">...</span>
</span><span class='line'><span class="n">session</span> <span class="n">include</span> <span class="n">system</span><span class="o">-</span><span class="n">auth</span>
</span><span class='line'><span class="p">...</span>
</span><span class='line'><span class="cp"># cat /etc/pam.d/system-auth</span>
</span><span class='line'><span class="p">...</span>
</span><span class='line'><span class="n">session</span> <span class="n">required</span> <span class="n">pam_limits</span><span class="p">.</span><span class="n">so</span>
</span><span class='line'><span class="p">...</span>
</span></code></pre></td></tr></table></div></figure>
<p>Please note PAM is user oriented, so it has no effect on system daemons, i.e., the rlimits of processes that are already running stay intact after limits.conf changed.</p>
<h2>3. how to configure rlimit</h2>
<p>In order to configure rlimit, Linux OS provides system API <code>setrlimit()</code>, Shell provides ulimit command line, and PAM provides <code>limits.conf</code> interface.</p>
<p>For non-root users, it's not allowed to change the hard limit or increase the soft limit, i.e., the only thing non-privileged user can do is lowering down the soft limit.</p>
<p>Please note no matter how big the values provided by user is, kernel will make sure it won't exceed some global upper bound. For example, <code>RLIMIT_NOFILE</code> is not allowed to exceed the value defined in <code>/proc/sys/fs/nr_open</code>, and <code>RLIMIT_NPROC</code> is not allowed to exceed the value defined in <code>/proc/sys/kernel/threads-max</code>.</p>
<p>In next sections we will talk about how to set up rlimit without calling <code>setrlimit()</code> API.</p>
<h3>3.1 Shell</h3>
<p>Let's go back to ulimit. First of all, ulimit is an internal command of Shell and runs within Shell's address space. Internally, ulimit is just a wrapper of <code>setrlimit()/getrlimit()</code> API, so we can say ulimit is nothing but rlimit configurator of Shell.</p>
<p>From the output of stace tool, ulimit -a actually calls <code>getrlimit()</code> to print out all rlimit values of Shell. Note we have to create a new shell instance to support strace because ulimit is internal command of Shell.</p>
<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
</pre></td><td class='code'><pre><code class='cpp'><span class='line'><span class="cp"># strace bash -c "ulimit -a"</span>
</span><span class='line'><span class="p">...</span>
</span><span class='line'><span class="n">getrlimit</span><span class="p">(</span><span class="n">RLIMIT_NOFILE</span><span class="p">,</span> <span class="p">{</span><span class="n">rlim_cur</span><span class="o">=</span><span class="mi">65535</span><span class="p">,</span> <span class="n">rlim_max</span><span class="o">=</span><span class="mi">65535</span><span class="p">})</span> <span class="o">=</span> <span class="mi">0</span>
</span><span class='line'><span class="n">write</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="s">"open files "</span><span class="p">...,</span> <span class="mi">43</span><span class="n">open</span> <span class="n">files</span> <span class="p">(</span><span class="o">-</span><span class="n">n</span><span class="p">)</span> <span class="mi">65535</span>
</span><span class='line'><span class="p">)</span> <span class="o">=</span> <span class="mi">43</span>
</span><span class='line'><span class="p">...</span>
</span><span class='line'><span class="n">getrlimit</span><span class="p">(</span><span class="n">RLIMIT_NPROC</span><span class="p">,</span> <span class="p">{</span><span class="n">rlim_cur</span><span class="o">=</span><span class="mi">20</span><span class="o">*</span><span class="mi">1024</span><span class="p">,</span> <span class="n">rlim_max</span><span class="o">=</span><span class="mi">20</span><span class="o">*</span><span class="mi">1024</span><span class="p">})</span> <span class="o">=</span> <span class="mi">0</span>
</span><span class='line'><span class="n">write</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="s">"max user processes "</span><span class="p">...,</span> <span class="mi">43</span><span class="n">max</span> <span class="n">user</span> <span class="n">processes</span> <span class="p">(</span><span class="o">-</span><span class="n">u</span><span class="p">)</span> <span class="mi">20480</span>
</span><span class='line'><span class="p">)</span> <span class="o">=</span> <span class="mi">43</span>
</span><span class='line'><span class="p">...</span>
</span></code></pre></td></tr></table></div></figure>
<p>How to check other users' ulimit settings? We have a trick here: create a shell instance under other user's name and run ulimit in the new shell.</p>
<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class='cpp'><span class='line'><span class="err">$</span> <span class="n">su</span> <span class="n">admin</span> <span class="o">-</span><span class="n">c</span> <span class="s">"ulimit -a"</span>
</span></code></pre></td></tr></table></div></figure>
<p>To set rlimit values, we run <code>ulimit -S</code>. For example:</p>
<p>To enable core dump with unlimited dump size:</p>
<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class='cpp'><span class='line'><span class="err">$</span> <span class="n">ulimit</span> <span class="o">-</span><span class="n">Sc</span> <span class="n">unlimited</span>
</span></code></pre></td></tr></table></div></figure>
<p>and disable core dump:</p>
<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class='cpp'><span class='line'><span class="err">$</span> <span class="n">ulimit</span> <span class="o">-</span><span class="n">Sc</span> <span class="mi">0</span>
</span></code></pre></td></tr></table></div></figure>
<p>Obviously <code>ulimit -S</code> doesn't affect running processes.</p>
<h3>3.2 Login Shell</h3>
<p>For login shell, besides running ulimit explicitly, we can realize automatic setup by changing the rlimit values in <code>/etc/security/limits.conf</code>.</p>
<h3>3.3 system daemons</h3>
<p>If a system daemon doesn't change rlimit in its implementation (C code probably), we can add ulimit to the control script of the daemon (like <code>/etc/init.d/syslog-ng</code> for <code>syslog-ng</code>). Because the daemon itself is spawned from the control script, so it surely will inherit rlimit setting of the shell instance. Please also note this shell instance is not a login shell, hence PAM will stay out.</p>
<blockquote>
<p>Hint: upstart provides stanzas to configure rlimit for upstart services (not apply to legacy SysVinit daemons), see http://upstart.ubuntu.com/wiki/Stanzas#limit.</p>
</blockquote>
<h3>3.4 processes that are already running</h3>
<p>For the processes in running state, how to change rlimit setting dynamically?</p>
<p>The answer is probably the limits file beneath /proc/<pid>/. This file is read/write and root user (or anyone with sudo privilege) can write formatted string to it to change rlimit value.</p>
<p>This is how root user adjusts <code>RLIMIT_NPROC</code> of process (pid):</p>
<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class='cpp'><span class='line'><span class="cp"># echo -n \"Max processes=131072:131072\" >/proc/<pid>/limits</span>
</span></code></pre></td></tr></table></div></figure>
<p>and if user is non-root with sudo privilege:</p>
<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class='cpp'><span class='line'><span class="err">$</span> <span class="n">sudo</span> <span class="n">su</span> <span class="o">-</span><span class="n">c</span> <span class="s">"echo -n </span><span class="se">\"</span><span class="s">Max processes=131072:131072</span><span class="se">\"</span><span class="s"> >/proc/<pid>/limits"</span>
</span></code></pre></td></tr></table></div></figure>
<p>I hope this article could dispel some of the confusion you may have and please bear with my poor English.</p>
]]></content>
</entry>
</feed>