Queueing Theory

Simplifying assumptions:

Definitions

Little's Law

Defines a very general relationship for any system, and is useful for understanding the relationship between throughput, response time, and the number of tasks in the system.

$$N = X \cdot R$$

eg. If the throughput is 10 tasks per second, and the average response time is 5 seconds, then there are 50 tasks in the system.

Examples

Server than processes requests sequentually. The average arrival and departure rate is 100 requests/sec. The average request completes after 5 ms. What is the average utilization $U$ of the system?

$$U = X \cdot R = 100 \cdot 0.005 = 0.5$$

Meaning the server is busy 50% of the time.

Web service that takes an average of 100 ms to process a request, and handles 10000 queries per second. What is the average number of requests in the system?

$$N = X \cdot R = 10000 \cdot 0.1 = 1000 \text{ queries}$$

Response Time vs. Utilization

Operating a system with high utilization increases risk of overload. If $\lambda > \mu$, the queue will grow indefinitely, and so will the response time.

Higher arrival rate $\lambda$ and burstier arrival process will tend to yield longer queue lengths.

Best Case: Uniform Arrival

Worst Case: Bursty Arrival

When requests arrive in groups (which they often do), the queue will grow and shrink in response to the bursts. This can lead to a higher average queue length and response time. Even if the average arrival rate is less than the service rate, the queue can still grow if the arrival process is bursty.

For example, consider the following two systems:

System 1 processes requests as they come in, with a queue length of 0. The average response time over a 10 second period is 1 second.

$$ R = \frac{1}{10} \sum_{i=1}^{10} 1 = 1 \text{ seconds} $$

System 2 processes requests in bursts, with a queue length that grows and shrinks. The average response time is...

$$R = \frac{1}{10} \sum_{i=1}^{10} i = 5.5 \text{ seconds}$$

Generally, for a system with arrival rate $\lambda = \frac{n}{t}$ which is a bursty arrival process of $n$ requests every $t$ seconds, and a service time of $S$ seconds, the average response time for each request in the burst is...

$$ R = \frac{1}{n} \sum_{i=1}^{n} i \cdot S $$

Exponential Arrivals

An exponential distribution of a continuous random variable has a mean $\lambda^{-1}$, and a variance $\lambda^{-2}$. Its probability density function is...

$$ f(x) = \lambda e^{-\lambda x} $$

It is a memoryless distribution, meaning that the probability of an event occurring in the next $t$ seconds is the same as the probability of an event occurring in the next $t + s$ seconds.

A memoryless distribution is useful because it allows us to model a queue as a finite state machine with states corresponding to the number of tasks in the queue, and transition probablilities $\lambda$ and $\mu$ for transitioning up a state (arrival) and down a state (departure).

Assuming $\lambda < \mu$, the system is stable and we can calculate the response time $R$ as a function of utilization $U$ and service time $S$:

$$ R = \frac{S}{1 - U} $$

This shows the exponential relationship between response time and utilization. As utilization increases, response time increases very slowly at first, but then increases rapidly as the system approaches overload.