
Monitoring Programs On Linux

One of the toughest jobs of being a Linux system administrator is keeping track of what’s running on the system — especially now, when graphical desktops take a handful of programs just to produce a single desktop. You always have lots of programs running on the system.


Fortunately, a few command line tools are available to help make life easier for you. This section covers a few of the basic tools you need to know how to use to manage programs on your Linux system.

Peeking at the processes

When a program runs on the system, it’s referred to as a process. To examine these processes, you need to become familiar with the ps command, the Swiss Army knife of utilities. It can produce lots of information about all the programs running on your system.

Unfortunately, with this robustness comes complexity — in the form of numerous parameters — making the ps command probably one of the most difficult commands to master. Most system administrators find a subset of these parameters that provide the information they want, and they stick with using only those.

That said, however, the basic ps command doesn’t really provide all that much information:

  $ ps
    PID TTY          TIME CMD
   3081 pts/0    00:00:00 bash
   3209 pts/0    00:00:00 ps
  $

Not too exciting. By default, the ps command shows only the processes that belong to the current user and that are running on the current terminal. In this case, we had only our bash shell running (remember, the shell is just another program running on the system) and, of course, the ps command itself.

The basic output shows the process ID (PID) of the programs, the terminal (TTY) that they are running from, and the CPU time the process has used.

Note
The tricky feature of the ps command (and the part that makes it so complicated) is that at one time there were two versions of it. Each version had its own set of command line parameters controlling what information it displayed and how.

Recently, Linux developers have combined the two ps command formats into a single ps program (and of course added their own touches).

The GNU ps command that’s used in Linux systems supports three different types of command line parameters:


  • Unix-style parameters, which are preceded by a dash
  • BSD-style parameters, which are not preceded by a dash
  • GNU long parameters, which are preceded by a double dash

The following sections examine the three different parameter types and show examples of how they work.
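As a quick sketch of how the three styles look side by side, the commands below all select the same process, the current shell, by its PID. The choice of the -p, p, and --pid parameters is ours for illustration, and it assumes the procps version of ps found on most Linux distributions.

```shell
# $$ expands to the PID of the current shell.

# Unix-style: the parameter is preceded by a single dash
ps -p "$$"

# BSD-style: the same selection, with no dash
ps p "$$"

# GNU long style: the parameter is preceded by a double dash
ps --pid "$$"
```

All three report the same process, but the BSD-style form switches the output to the BSD column layout.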

Unix-style parameters

The Unix-style parameters originated with the original ps command that ran on the AT&T Unix systems invented by Bell Labs. The Table below shows these parameters.

The ps Command Unix Parameters

Most system administrators have their own sets of parameters that they use for extracting pertinent information. For example, if you need to see everything running on the system, use the -ef parameter combination (the ps command lets you combine parameters like this):

  $ ps -ef
  UID        PID  PPID  C STIME TTY          TIME CMD
  root         1     0  0 11:29 ?        00:00:01 init [5]
  root         2     0  0 11:29 ?        00:00:00 [kthreadd]
  root         3     2  0 11:29 ?        00:00:00 [migration/0]
  root         4     2  0 11:29 ?        00:00:00 [ksoftirqd/0]
  root         5     2  0 11:29 ?        00:00:00 [watchdog/0]
  root         6     2  0 11:29 ?        00:00:00 [events/0]
  root         7     2  0 11:29 ?        00:00:00 [khelper]
  root        47     2  0 11:29 ?        00:00:00 [kblockd/0]
  root        48     2  0 11:29 ?        00:00:00 [kacpid]
  68        2349     1  0 11:30 ?        00:00:00 hald
  root      3078  1981  0 12:00 ?        00:00:00 sshd: rich [priv]
  rich      3080  3078  0 12:00 ?        00:00:00 sshd: rich@pts/0
  rich      3081  3080  0 12:00 pts/0    00:00:00 -bash
  rich      4445  3081  3 13:48 pts/0    00:00:00 ps -ef
  $

Quite a few lines have been cut from the output to save space, but you can see that lots of processes are running on a Linux system. This example uses two parameters: the -e parameter, which shows all the processes running on the system, and the -f parameter, which expands the output to show a few useful columns of information:


  • UID: The user responsible for launching the process 
  • PID: The process ID of the process
  • PPID: The PID of the parent process (if a process is started by another process)
  • C: Processor utilization over the lifetime of the process
  • STIME: The system time when the process started
  • TTY: The terminal device from which the process was launched
  • TIME: The cumulative CPU time required to run the process
  • CMD: The name of the program that was started


This produces a reasonable amount of information, which is what many system administrators want to see. For even more information, you can use the -l parameter, which produces the long format output:

  $ ps -l
  F S   UID   PID  PPID  C PRI  NI ADDR SZ WCHAN  TTY          TIME CMD
  0 S   500  3081  3080  0  80   0 -  1173 wait   pts/0    00:00:00 bash
  0 R   500  4463  3081  1  80   0 -  1116 -      pts/0    00:00:00 ps
  $

Notice the extra columns that appear when you use the -l parameter:


  • F: System flags assigned to the process by the kernel
  • S: The state of the process (O = running on processor; S = sleeping; R = runnable, waiting to run; Z = zombie, process terminated but parent not available; T = process stopped)
  • PRI: The priority of the process (higher numbers mean lower priority)
  • NI: The nice value, which is used for determining priorities
  • ADDR: The memory address of the process
  • SZ: Approximate amount of swap space required if the process was swapped out
  • WCHAN: Address of the kernel function where the process is sleeping
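If the -f or -l layouts show more than you need, you can also ask for specific columns. The sketch below assumes the procps version of ps and its -o parameter, which takes a comma-separated list of column names. The particular columns chosen here are just an illustration.

```shell
# Show only the chosen columns for the current shell ($$ is its PID).
ps -o pid,ppid,s,pri,ni,sz,wchan,comm -p "$$"

# The same idea for every process, keeping just the first few lines.
ps -e -o pid,ppid,s,ni,comm | head -n 5
```

The column names map onto the headings described above: s is the one-character state, ni the nice value, and comm the program name.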

BSD-style parameters

Now that you’ve seen the Unix parameters, let’s look at the BSD-style parameters. The Berkeley Software Distribution (BSD) was a version of Unix developed at (of course) the University of California, Berkeley. It had many subtle differences from the AT&T Unix system, thus sparking many Unix wars over the years. The Table below shows the BSD version of the ps command parameters.

The ps Command BSD Parameters

As you can see, the Unix and BSD types of parameters have lots of overlap. Most of the information you can get from one you can also get from the other. Most of the time, you choose a parameter type based on which format you’re more comfortable with (for example, if you were used to a BSD environment before using Linux).

When you use the BSD-style parameters, the ps command automatically changes the output to simulate the BSD format. Here’s an example using the l parameter:

  $ ps l
  F   UID   PID  PPID PRI  NI    VSZ   RSS WCHAN  STAT TTY        TIME COMMAND
  0   500  3081  3080  20   0   4692  1432 wait   Ss   pts/0      0:00 -bash
  0   500  5104  3081  20   0   4468   844 -      R+   pts/0      0:00 ps l
  $

Notice that while many of the output columns are the same as when we used the Unix-style parameters, some different ones appear as well:


  • VSZ: The size in kilobytes of the process in memory
  • RSS: The physical memory that a process has used that isn’t swapped out
  • STAT: A two-character state code representing the current process state

Many system administrators like the BSD-style l parameter because it produces a more detailed state code for processes (the STAT column). The two-character code more precisely defines exactly what’s happening with the process than the single-character Unix-style output.

The first character uses the same values as the Unix-style S output column, showing when a process is sleeping, running, or waiting. The second character further defines the process’s status:


  • <: The process is running at high priority.
  • N: The process is running at low priority.
  • L: The process has pages locked in memory.
  • s: The process is a session leader.
  • l: The process is multi-threaded.
  • +: The process is running in the foreground.


From the simple example shown previously, you can see that the bash command is sleeping, but it is a session leader (it’s the main process in my session), whereas the ps command was running in the foreground on the system.
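To make the STAT codes easier to read, here is a small sketch that translates a code into words. The describe_stat function is a hypothetical helper of our own, not part of ps, and it handles only the codes listed in this section.

```shell
# Translate a STAT code such as "Ss" or "R+" into words.
# describe_stat is a hypothetical helper; it knows only the codes listed above.
describe_stat() {
    code=$1
    first=${code%"${code#?}"}   # first character: the process state
    rest=${code#?}              # remaining characters: the modifiers
    case $first in
        R) text="runnable" ;;
        S) text="sleeping" ;;
        T) text="stopped" ;;
        Z) text="zombie" ;;
        *) text="state $first" ;;
    esac
    while [ -n "$rest" ]; do
        flag=${rest%"${rest#?}"}
        rest=${rest#?}
        case $flag in
            '<') text="$text, high priority" ;;
            N)   text="$text, low priority" ;;
            L)   text="$text, pages locked in memory" ;;
            s)   text="$text, session leader" ;;
            l)   text="$text, multi-threaded" ;;
            +)   text="$text, foreground" ;;
        esac
    done
    echo "$text"
}

describe_stat "Ss"   # the bash line in the example above
describe_stat "R+"   # the ps line in the example above
```

Modifier characters the function does not recognize are simply skipped.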

The GNU long parameters

Finally, the GNU developers put their own touches on the new, improved ps command by adding a few more options to the parameter mix. Some of the GNU long parameters copy existing Unix- or BSD-style parameters, while others provide new features. The Table below lists the available GNU long parameters.

The ps Command GNU Parameters

You can combine GNU long parameters with either Unix- or BSD-style parameters to really customize your display. One cool feature of GNU long parameters that we really like is the --forest parameter. It displays the hierarchical process information, using ASCII characters to draw cute charts:

   1981 ?        00:00:00 sshd
   3078 ?        00:00:00  \_ sshd
   3080 ?        00:00:00      \_ sshd
   3081 pts/0    00:00:00          \_ bash
  16676 pts/0    00:00:00              \_ ps
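The same parent-child chain can be followed by hand with the PPID column. The sketch below walks upward from a starting PID until it runs out of parents. The ancestry function is a hypothetical helper of our own, and it assumes the procps -o and -p parameters.

```shell
# Print each process from the given PID up to the top of the tree.
# ancestry is a hypothetical helper built on ps -o and -p.
ancestry() {
    pid=$1
    while [ -n "$pid" ] && [ "$pid" -gt 0 ]; do
        # A trailing "=" after a column name suppresses its header.
        ps -o pid=,ppid=,comm= -p "$pid" || break
        pid=$(ps -o ppid= -p "$pid" | tr -d ' ')
    done
}

ancestry "$$"
```

Each output line shows a PID, its parent's PID, and the program name. The --forest display shows the same relationships at a glance.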

This format makes tracing child and parent processes a snap!

Real-time process monitoring

The ps command is great for gleaning information about processes running on the system, but it has one drawback. The ps command can display information only for a specific point in time. If you’re trying to find trends about processes that are frequently swapped in and out of memory, it’s hard to do that with the ps command.
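To see the limitation for yourself, you can take a few snapshots in a row. This sketch assumes the procps --sort parameter and uses column names of our choosing. It prints the busiest processes three times, one second apart.

```shell
# Take three snapshots, one second apart, of the top CPU consumers.
for sample in 1 2 3; do
    date '+%H:%M:%S'
    # --sort=-pcpu orders by the %CPU column, highest first.
    ps -e -o pid,pcpu,pmem,comm --sort=-pcpu | head -n 4
    sleep 1
done
```

Each pass is still a frozen picture, and the %CPU value that ps reports is an average over the life of the process rather than a reading for the last second, so spotting a trend this way takes patience.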

The top command solves this problem. It displays process information similarly to the ps command, but it does so in real-time mode. The figure below is a snapshot of the top command in action.

Figure: The output of the top command while it is running

The first section of the output shows general system information. The first line shows the current time, how long the system has been up, the number of users logged in, and the load average on the system.

The load average appears as three numbers: the 1-minute, 5-minute, and 15-minute load averages. The higher the values, the more load the system is experiencing. It’s not uncommon for the 1-minute load value to be high for short bursts of activity. If the 15-minute load value is high, your system may be in trouble.

Note
The trick in Linux system administration is defining what exactly a high load average value is. This value depends on what’s normally running on your system and the hardware configuration. What’s high for one system might be normal for another. Usually, if your load averages start getting over 2, things are getting busy on your system.
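The same three load averages are available outside of top. As a rough sketch, assuming a Linux /proc filesystem, the snippet below reads them from /proc/loadavg and flags a 15-minute value above 2. The load_is_high function is a hypothetical helper, and the threshold of 2 is only the rule of thumb from the note above, so adjust it for your own system.

```shell
# Succeed (exit status 0) when the load value is above the limit.
# load_is_high is a hypothetical helper; awk handles the decimal comparison.
load_is_high() {
    awk -v load="$1" -v limit="$2" 'BEGIN { exit !(load + 0 > limit + 0) }'
}

# /proc/loadavg starts with the 1-, 5-, and 15-minute load averages.
read -r one five fifteen rest < /proc/loadavg
echo "load averages: $one (1 min), $five (5 min), $fifteen (15 min)"

if load_is_high "$fifteen" 2; then
    echo "15-minute load is above 2: worth a closer look"
fi
```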

The second line shows general process information (called tasks in top): how many processes are running, sleeping, stopped, and zombie (have finished but their parent process hasn’t responded).

The next line shows general CPU information. The top display breaks down the CPU utilization into several categories depending on the owner of the process (user versus system processes) and the state of the processes (running, idle, or waiting).

Following that are two lines that detail the status of the system memory. The first line shows the status of the physical memory in the system, how much total memory there is, how much is currently being used, and how much is free. The second memory line shows the status of the swap memory area in the system (if any is installed), with the same information.
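If you only want the memory figures, the kernel exposes the raw values directly. This short sketch assumes a Linux /proc filesystem. The field names come from /proc/meminfo, and the values are reported in kilobytes.

```shell
# MemTotal and MemFree describe physical memory;
# SwapTotal and SwapFree describe the swap area.
grep -E '^(MemTotal|MemFree|SwapTotal|SwapFree):' /proc/meminfo
```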

Finally, the next section shows a detailed list of the currently running processes, with some information columns that should look familiar from the ps command output:


  • PID: The process ID of the process
  • USER: The user name of the owner of the process
  • PR: The priority of the process
  • NI: The nice value of the process
  • VIRT: The total amount of virtual memory used by the process
  • RES: The amount of physical memory the process is using
  • SHR: The amount of memory the process is sharing with other processes
  • S: The process status (D = uninterruptible sleep, R = running, S = sleeping, T = traced or stopped, or Z = zombie)
  • %CPU: The share of CPU time that the process is using
  • %MEM: The share of available physical memory the process is using
  • TIME+: The total CPU time the process has used since starting
  • COMMAND: The command line name of the process (program started)

By default, when you start top, it sorts the processes based on the %CPU value. You can change the sort order by using one of several interactive commands while top is running.

Each interactive command is a single character that you press while top is running to change the behavior of the program. Pressing f allows you to select the field to use to sort the output, and pressing d allows you to change the polling interval. Press q to exit the top display.

You have lots of control over the output of the top command. Using this tool, you can often find offending processes that have taken over your system. Of course, after you find one, the next job is to stop it, which brings us to the next topic.
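The interactive display is not the only way to use top. As a sketch, assuming the procps version of top, the -b parameter switches to batch mode and -n limits the number of refreshes, which is handy for saving a snapshot to a file or a log. The top_snapshot function is a hypothetical wrapper of our own.

```shell
# -b: batch mode (plain text, no screen control)
# -n 1: produce one refresh and then exit
# top_snapshot is a hypothetical wrapper; its argument is the number of lines to keep.
top_snapshot() {
    top -b -n 1 | head -n "${1:-12}"
}

top_snapshot 12
```

The first few lines are the summary area described above, followed by the column headings and the busiest processes.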

Stopping processes

A crucial part of being a system administrator is knowing when and how to stop a process. Sometimes, a process gets hung up and needs a gentle nudge to either get going again or stop. Other times, a process runs away with the CPU and refuses to give it up. In both cases, you need a command that allows you to control a process. Linux follows the Unix method of interprocess communication.

In Linux, processes communicate with each other using signals. A process signal is a predefined message that processes recognize and may choose to ignore or act on. The developers program how a process handles signals. Most well-written applications have the ability to receive and act on the standard Unix process signals. The Table below shows these signals.

Linux Process Signals

Two commands available in Linux allow you to send process signals to running processes.
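Before looking at those commands, it helps to see the receiving side. The sketch below is our own illustration: it starts a background worker that installs a handler for the TERM signal with the shell's trap builtin, and then sends that signal to it. The signal_demo name and the messages are made up for the example.

```shell
# signal_demo is a hypothetical demonstration of a process handling TERM itself.
signal_demo() {
    # Background worker that installs its own TERM handler.
    (
        trap 'echo "worker: received TERM, shutting down"; exit 0' TERM
        while :; do sleep 1; done
    ) &
    worker=$!

    sleep 1                      # give the worker time to install the handler
    kill -s TERM "$worker"       # send the signal by PID
    wait "$worker"               # collect the worker's exit status
    echo "worker exit status: $?"
}

signal_demo
```

Because the worker chooses how to react, it can clean up and exit on its own terms. A process that installs no handler is simply terminated by TERM.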

The kill command

The kill command allows you to send signals to processes based on their process ID (PID). By default, the kill command sends a TERM signal to all the PIDs listed on the command line. Unfortunately, you can only use the process PID instead of its command name, making the kill command difficult to use sometimes.

To send a process signal, you must either be the owner of the process or be logged in as the root user.

  $ kill 3940
  -bash: kill: (3940) - Operation not permitted
  $
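You can check ownership before sending anything. This sketch compares the numeric user ID that owns a process with your own. The can_signal function is a hypothetical helper, and it assumes the procps uid column.

```shell
# Succeed when the current user owns the process or is root.
# can_signal is a hypothetical helper built on ps -o uid= and id -u.
can_signal() {
    owner=$(ps -o uid= -p "$1" | tr -d ' ')
    me=$(id -u)
    [ -n "$owner" ] && { [ "$me" -eq 0 ] || [ "$owner" -eq "$me" ]; }
}

if can_signal "$$"; then
    echo "allowed to signal PID $$"
fi
```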

The TERM signal tells the process to kindly stop running. Unfortunately, if you have a runaway process, most likely it ignores the request. When you need to get forceful, the -s parameter allows you to specify other signals (either using their name or signal number). As you can see from the following example, no output is associated with the kill command.

  # kill -s HUP 3940
  #

To see if the command was effective, you must perform another ps or top command to see if the offending process stopped.
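A common pattern is to ask politely first and get forceful only if the process is still around. The sketch below is one way to do that, not the only one. The stop_process function and its default grace period of 5 seconds are our own assumptions. It relies on kill -0, which sends no signal but reports whether the process can still be signaled.

```shell
# Send TERM, wait up to a grace period, then fall back to KILL.
# stop_process is a hypothetical helper; usage: stop_process PID [SECONDS]
stop_process() {
    pid=$1
    grace=${2:-5}                              # assumed default grace period
    kill -s TERM "$pid" 2>/dev/null || return 1
    while [ "$grace" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0 # process is gone
        sleep 1
        grace=$((grace - 1))
    done
    kill -0 "$pid" 2>/dev/null || return 0
    kill -s KILL "$pid"                        # KILL cannot be caught or ignored
}

# Example: start a throwaway process and stop it.
sleep 300 &
stop_process "$!" 3 && echo "process stopped"
```

Keep KILL as the last resort, because the process gets no chance to clean up after itself.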

The killall command

The killall command is a powerful way to stop processes by using their names rather than the PID numbers. On most Linux distributions, which ship the psmisc version of killall, a name must match exactly unless you add the -r parameter, which treats the name as a regular expression. That makes it a very useful tool when you have a system that’s gone awry:

  # killall -r '^http'
  #

This example kills all the processes whose names start with http, such as the httpd services for the Apache web server.

Caution
Be extremely careful using the killall command when logged in as the root user. It’s easy to get carried away with name patterns and accidentally stop important system processes. This could lead to a damaged filesystem.
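One way to stay out of trouble is to preview what a pattern matches before you kill anything. The sketch below uses the pgrep command from the procps package. The preview_matches function is a hypothetical wrapper of our own.

```shell
# List the PID and name of every process whose name matches the pattern.
# pgrep treats the pattern as a regular expression, not a shell wildcard.
# preview_matches is a hypothetical wrapper around pgrep -l.
preview_matches() {
    pgrep -l "$1" || echo "no process matches $1"
}

preview_matches '^http'
```

If the list contains anything you did not expect, tighten the pattern before sending a signal.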
