Qui docet: 2006-09

All the documentation about the Solaris scheduler says that the highest priority runnable thread is chosen for execution (see, for example, section 3.8.4 of Solaris Internals 2/e). At first glance that might seem to mean that the same thread will always get the CPU, if it is runnable.

That is indeed what would happen if thread priorities were static, but in fact for most threads (those in the TS, IA, and FSS classes) the priority changes based on their usage of the CPU.

On the other hand, the FX (fixed priority) scheduling class does not change the priority of a thread, so we can use it to experiment with the scheduler's behaviour.

First of all, let's get ourselves some privileges. We don't need them to run plain priority 0 processes, but we will need them later to set any other priority or quantum.


$ ppriv $$                                    
449:       -zsh
flags = <none>
        E: basic
        I: basic
        P: basic
        L: all
$ su root -c "ppriv -s EIP+proc_priocntl $$"
Password: 
$ ppriv $$
449:       -zsh
flags = <none>
        E: basic,proc_priocntl
        I: basic,proc_priocntl
        P: basic,proc_priocntl
        L: all

OK, next we need something that will use lots of CPU and not make system calls that cause it to sleep. This will make the behaviour easier to observe.


$ cat spin.c
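#include <stdlib.h>

int main(void)
{
 /* volatile and unsigned: the loop cannot be optimised away, and wraparound is well defined */
 volatile unsigned int i = 0;
 for (;;)
     i++;
 exit(0); /* never reached */
}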
$ gcc -o spin spin.c

Now, let's look at the current processes that we're running.


$ ps -o sid -p $$                  
SID  449
$ priocntl -d -i sid 449
TIME SHARING PROCESSES: 
 PID    TSUPRILIM    TSUPRI
 449        0           0
 593        0           0

So, only TS processes with no fancy characteristics.

Let's now start our test program. The FX class provides user priorities that range from 0 to 60 (numerically higher is higher priority). We want our test program to be low priority.


$ priocntl -e -c FX -m 0 -p 0 ./spin &
[1] 652
$ priocntl -d -i sid 449
TIME SHARING PROCESSES:
 PID    TSUPRILIM    TSUPRI
 449        0           0
 653        0           0
FIXED PRIORITY PROCESSES:
 PID    FXUPRILIM    FXUPRI      FXTQNTM
 652        0           0         200

Good, so it's running at low priority, but on this system it has very little competition. In fact it's using close to 100% of this box's single CPU. Let's allow some time for the stats to catch up.


$ prstat -c -p 652 15 5 | sed -n -e 1p -e /spin/p
PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP    
652 boyd      996K  560K run      0    0   0:00:21  63% spin/1
652 boyd      996K  560K run      0    0   0:00:36  81% spin/1
652 boyd      996K  560K run      0    0   0:00:51  91% spin/1
652 boyd      996K  560K run      0    0   0:01:06  95% spin/1
652 boyd      996K  560K run      0    0   0:01:21  97% spin/1
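
The CPU column takes a while to converge, so a quick cross-check is to compare the TIME column between consecutive samples. What follows is a minimal sketch, assuming the h:mm:ss format shown above; the helper names to_seconds and cpu_share are made up for illustration and are not part of prstat.

```shell
#!/bin/sh
# Convert a prstat TIME value (h:mm:ss) to seconds.
to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Percentage of one CPU used between two TIME samples.
# $1 = earlier TIME, $2 = later TIME, $3 = sampling interval in seconds.
cpu_share() {
    start=$(to_seconds "$1")
    end=$(to_seconds "$2")
    awk -v s="$start" -v e="$end" -v i="$3" \
        'BEGIN { printf "%.0f\n", 100 * (e - s) / i }'
}

# First two samples above: 0:00:21 then 0:00:36, taken 15 seconds apart.
cpu_share 0:00:21 0:00:36 15
```

This prints 100: the process accumulated 15 seconds of CPU time in a 15 second interval, so it already had the whole CPU while the CPU column still read 81%.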

Now, we start another job at the same priority.


$ priocntl -e -c FX -m 0 -p 0 ./spin &
[2] 660
$ priocntl -d -i sid 449
TIME SHARING PROCESSES:
 PID    TSUPRILIM    TSUPRI
 449        0           0
 661        0           0
FIXED PRIORITY PROCESSES:
 PID    FXUPRILIM    FXUPRI      FXTQNTM
 652        0           0         200
 660        0           0         200
$ prstat -c -p 652,660 60 2 | sed -n -e 1p -e /spin/p -e 's/^Total.*//p'
PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP    
652 boyd      996K  560K run      0    0   0:01:45  71% spin/1
660 boyd      996K  560K run      0    0   0:00:08  27% spin/1

652 boyd      996K  560K run      0    0   0:02:15  51% spin/1
660 boyd      996K  560K run      0    0   0:00:37  48% spin/1

By the second sample, the two jobs are sharing the CPU nearly equally.

Now, let's tweak things a little. First, notice that the two jobs have the same quantum, which means that they'll have the CPU for the same amount of time each time they are scheduled (assuming that no higher priority job preempts them).

Let's experiment with that quantum by halving the time for one process.


$ priocntl -s -t 100 -i pid 660
$ priocntl -d -i sid 449
TIME SHARING PROCESSES:
 PID    TSUPRILIM    TSUPRI
 449        0           0
 669        0           0
FIXED PRIORITY PROCESSES:
 PID    FXUPRILIM    FXUPRI      FXTQNTM
 652        0           0         200
 660        0           0         100
$ prstat -c -p 652,660 60 2 | sed -n -e 1p -e /spin/p -e 's/^Total.*//p'
PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP    
652 boyd      996K  560K run      0    0   0:02:36  54% spin/1
660 boyd      996K  560K run      0    0   0:00:55  44% spin/1

652 boyd      996K  560K run      0    0   0:03:15  64% spin/1
660 boyd      996K  560K run      0    0   0:01:16  35% spin/1

As we might expect, the adjusted process now gets roughly half as much CPU time as the other one.
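
The arithmetic behind that expectation can be sketched as follows. It assumes a simple model in which the two always-runnable threads take strict turns and each runs for its full quantum, with nothing else competing; the helper name expected_share is hypothetical.

```shell
#!/bin/sh
# Expected CPU percentage for a thread with quantum $1 (ms) that takes
# turns with one other thread whose quantum is $2 (ms).
expected_share() {
    awk -v q="$1" -v other="$2" \
        'BEGIN { printf "%.1f\n", 100 * q / (q + other) }'
}

expected_share 200 100   # process 652
expected_share 100 200   # process 660
```

The model gives 66.7 and 33.3, which is close to the 64% and 35% in the second prstat sample. The TIME column agrees: over the 60 second interval, 652 gained 39 seconds (0:02:36 to 0:03:15) and 660 gained 21 seconds (0:00:55 to 0:01:16).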

Next, let's set the quantum back to its default value and bump the priority up by one.


$ priocntl -s -t 200 -m 1 -p 1 -i pid 660
$ priocntl -d -i sid 449
TIME SHARING PROCESSES:
 PID    TSUPRILIM    TSUPRI
 449        0           0
 677        0           0
FIXED PRIORITY PROCESSES:
 PID    FXUPRILIM    FXUPRI      FXTQNTM
 652        0           0         200
 660        1           1         200
$ prstat -c -p 652,660 120 2 | sed -n -e 1p -e /spin/p -e 's/^Total.*//p'
PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP    
660 boyd      996K  560K run      1    0   0:02:00  67% spin/1
652 boyd      996K  560K run      0    0   0:04:07  30% spin/1

660 boyd      996K  560K run      1    0   0:04:40  99% spin/1
652 boyd      996K  560K run      0    0   0:04:07 0.0% spin/1

Wow! That's really made a difference. Process 660 is getting a lot of CPU. That makes sense, since it has a higher priority and so, based on our initial premise, we'd assume it gets chosen over the lower priority process every time.

Let's see if that's really the case. First we need some extra privileges so that we can use DTrace.


$ su root -c "ppriv -s EIP+dtrace_kernel,dtrace_proc,dtrace_user $$"
Password:
$ ppriv $$
449:       -zsh
flags = <none>
     E: basic,dtrace_kernel,dtrace_proc,dtrace_user,proc_priocntl
     I: basic,dtrace_kernel,dtrace_proc,dtrace_user,proc_priocntl
     P: basic,dtrace_kernel,dtrace_proc,dtrace_user,proc_priocntl
     L: all
$ dtrace -q -n 'sched:::on-cpu /execname == "spin"/ {@[pid] = count()} tick-5sec { exit(0) }'

   660              103

Yep, just as we expected, process 652 has not been scheduled even once in our sampling period of 5 seconds. It's getting absolutely no CPU time at all.

Just to be sure, let's make the two priorities equal again and check again with DTrace to see that they are being scheduled more evenly.


$ priocntl -s -m 0 -p 0 -i pid 660
$ dtrace -q -n 'sched:::on-cpu /execname == "spin"/ {@[pid] = count()} tick-5sec { exit(0) }'

   660               50
   652               57

So, in summary, processes at the lowest priority level (0 in FX) can be starved of CPU time by any CPU-bound thread running at a higher priority. Processes at the same priority level can have time apportioned between them using mechanisms such as the quantum.
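
Both behaviours can be reproduced with a toy model. This is only a sketch under stated assumptions, not the real dispatcher: every thread is always runnable, only threads at the highest priority ever run, they take turns for one (positive) quantum each, and nothing else is on the system. The simulate function and its NAME:PRIO:QUANTUM_MS argument format are invented for this example.

```shell
#!/bin/sh
# simulate TOTAL_MS NAME:PRIO:QUANTUM_MS ...
# Prints the percentage of TOTAL_MS each thread spends on the CPU.
simulate() {
    total=$1
    shift
    printf '%s\n' "$@" | awk -F: -v total="$total" '
        {
            name[NR] = $1; prio[NR] = $2 + 0; quantum[NR] = $3 + 0
            used[NR] = 0
            # Remember the highest priority seen so far.
            if (NR == 1 || prio[NR] > top) top = prio[NR]
        }
        END {
            total += 0
            clock = 0
            while (clock < total) {
                for (i = 1; i <= NR && clock < total; i++) {
                    # Lower priority threads are never picked.
                    if (prio[i] != top) continue
                    slice = quantum[i]
                    if (clock + slice > total) slice = total - clock
                    used[i] += slice
                    clock += slice
                }
            }
            for (i = 1; i <= NR; i++)
                printf "%s %.1f%%\n", name[i], 100 * used[i] / total
        }'
}

simulate 60000 652:0:200 660:0:200   # equal priority, equal quantum
simulate 60000 652:0:200 660:0:100   # equal priority, halved quantum
simulate 60000 652:0:200 660:1:200   # 660 one priority level higher
```

The three runs give a 50/50 split, a 66.7/33.3 split, and 0/100 respectively, which is the same pattern as the prstat output in the three experiments above.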

The interaction between the FX and other scheduling classes becomes more complicated once global priorities enter the equation, but that's a subject for another post. :)

Thursday, September 07, 2006

"Basement" processes in Solaris
