jenkins Process leaked file descriptors

jenkins Process leaked file descriptors

转自:http://blog.csdn.net/freeiceflame/article/details/43018567

在Jenkins中新建了一个Job,假设你在一些列Build Step之前/之后,启动了一个进程,打个比方说启动一个Jboss进程。等到Build完成,你去Console Output中查看显示启动成功,甚至PID也有了。但是当你去后台查看的时候,发现其实这个进程根本不存在,并没有启动成功。

不过如果你使用的是较早的Hudson版本(Ver 1.136),并且是直接在页面中的Build Session中(如Excute Shell)执行命令的话,你可能会得到如下的提示:

Process leaked file descriptors. See http://wiki.hudson-ci.org/display/HUDSON/Spawning+processes+from+build for more information

如果你看过这篇文章,那么你就会明白造成这种问题的原因是什么:Jenkins默认会在Build结束后Kill掉所有的衍生进程,用官方的话来说就是:

To reliably kill processes spawned by a job during a build, Jenkins contains a bit of native code to list up such processes and kill them.

 问题分析:

The reason this problem happens is because of file descriptor leak and how they are inherited from one process to another. Jenkins and the child process are connected by three pipes (stdin/stdout/stderr.) This allows Jenkins to capture the output from the child process. Since the child process may write a lot of data to the pipe and quit immediately after that, Jenkins needs to make sure that it drained the pipes before it considers the build to be over. Jenkins does this by waiting for EOF.

When a process terminates for whatever reasons, the operating system closes all the file descriptors it owned. So even if the process didn't close stdout/stderr, Jenkins will nevertheless get EOF.

The complication happens when those file descriptors are inherited to other processes. Let's say the child process forks another process to the background. The background process (AKA daemon) inherits all the file descriptors of the parent, including the writing side of the stdout/stderr pipes that connect the child process and Jenkins. If the daemon forgets to close them, Jenkins won't get EOF for pipes even when the child process exits, because daemon still have those descriptors open. That's how this problem happens.

从官方的说明中可以看出,造成这种问题的原因,是由于文件描述符的丢失以及子进程是如何继承的。

Jenkins和子进程之间的通信使用三根管道进行:stdin,stdout,stderr,由于Jenkins可以捕捉到子进程的输出,而子进程可能往stdout中写入大量数据然后立即退出,Jenkins为了完整的读取子进程的输出,会在用于子进程stdout的管道上等待EOF。这样,无论子进程由于什么原因退出,系统将关闭进程使用的文件描述符,因而Jenkins总是可以收到EOF。

但是当子进程在后台fork出另一个进程的时候(比如A fork出了B,启动一个daemon进程),情况就出现一些变化。由于进程B会继承进程A的文件描述 符,如果进程B没有关闭这些描述符,即使进程A退出,这些描述符依然是打开的,Jenkins将不会在相应管道上收到EOF。

通常一个实现良好的daemon会关闭所有文件描述符以避免这个问题,但是总有一些实现没有遵循这个规则。在更旧版本的Hudson上,这个问题甚至将导致Job无法结束,Jenkins一直在那等待EOF。


 解决办法:

现在我们知道,Jboss之所以没有启动成功,是因为它没有关闭继承的文件描述符,Jenkins在Job构建过程结束后认为Jboss进程未终止,因而将其kill掉了。

1. 重设环境变量BUILD_ID

在研究这个问题的时候,找到了另外一篇文章:https://wiki.jenkins-ci.org/display/JENKINS/ProcessTreeKiller,这篇文章进一步描述了Hudson杀掉衍生进程的情况:

The ProcessTreeKiller takes advantage of the fact that by default a new process gets a copy of the environment variables of its spawning/creating process.

It sets a specific environment variable in the process executing the build job. Later, when the user requests to stop the build job's process it gets a list of all processes running on the computer and their environment variables, and looks for the environment variable that it initially set for the build job's process.

Every job with that environment variable in its environment is then terminated.

意思就是说,Jenkins会在执行Job时设置一系列的环境变量,这些环境变量将被Job衍生出的进程所继承。Jenkins在Kill掉衍生进程的时候会查看进程的环境变量,如果找到它之前设置的环境变量,就将其杀掉。

因此,Wiki中也给出了另外一个简单的方法来避免进程被杀掉:

A convenient way to achieve that is to change the environment variable BUILD_ID which Jenkins's ProcessTreeKiller is looking for. This will cause Jenkins to assume that your daemon is not spawned by the Jenkins build.

修改Hudson设置的环境变量BUILD_ID的值,从而让Jenkins认为此进程不是由Job的构建过程衍生的,如:

  1. BUILD_ID=dontKillMe /usr/apache/bin/httpd  

后面的"/usr/apache/bin/httpd"可以省略,即只需要在parameter build trigger中加入一个string parameter,变量名为BUILD_ID,值为dontKillMe即可。

相关文章
相关标签/搜索