When calculating confidence intervals for the average of a set of data values, it is assumed that the data values are *independent and identically distributed (IID)*. Intuitively, data values are IID if they are not related to each other and if they have the same probability distribution. Accurate confidence intervals can only be calculated for IID data values.
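For concreteness, the following sketch (not part of the tool; the function name and sample values are illustrative) shows how such a confidence interval is typically computed for the mean of IID values, using a critical value from Student's t distribution:

```python
import math
import statistics

def mean_confidence_interval(values, t_crit):
    """Confidence interval for the mean of IID values.

    t_crit is the critical value from Student's t distribution with
    len(values) - 1 degrees of freedom (e.g. about 2.776 for a 95%
    interval with 5 values).
    """
    n = len(values)
    mean = statistics.fmean(values)
    std_err = statistics.stdev(values) / math.sqrt(n)
    half_width = t_crit * std_err
    return mean - half_width, mean + half_width

# Five hypothetical IID estimates of a performance measure:
low, high = mean_confidence_interval([4.1, 3.8, 4.5, 4.0, 3.9], 2.776)
```

The interval is only as trustworthy as the IID assumption behind it, which is the subject of the rest of this section.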

The data values that are collected by Data collector monitors may not be IID, but Simulation replications can be used to generate IID estimates of performance measures. For more information, see below.

The data collected by a data collector consists of the values returned by the monitor's observation, initialization, and stop functions, i.e. the values x_i discussed on the help page for Calculating statistics.

It is important to note that the data values that are collected for a particular data collector monitor during a simulation are **not** necessarily *independent data values*. Consider, for example, a data collector that measures the amount of time a packet waits in a queue. If packet_i is in the queue when packet_i+1 is added to the queue, then the waiting time for packet_i+1 will depend, at least in part, on the waiting time for packet_i. In this case the waiting time for packets i and i+1 are *not* independent.
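This dependence is easy to see in a toy model. The sketch below (all names and parameters are illustrative, not tool code) generates successive waiting times in a single-server FIFO queue via Lindley's recursion, where each wait is computed directly from the previous one, and then measures the correlation between consecutive waits:

```python
import random

def waiting_times(n, seed=1, arrival_rate=1.0, service_rate=1.25):
    """Successive waiting times in a single-server FIFO queue
    (Lindley's recursion): each wait depends on the previous wait."""
    rng = random.Random(seed)
    waits = [0.0]
    for _ in range(n - 1):
        service = rng.expovariate(service_rate)       # service of packet i
        interarrival = rng.expovariate(arrival_rate)  # gap before packet i+1
        waits.append(max(0.0, waits[-1] + service - interarrival))
    return waits

def lag1_correlation(xs):
    """Sample correlation between consecutive values."""
    n = len(xs) - 1
    mx = sum(xs[:-1]) / n
    my = sum(xs[1:]) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs[:-1], xs[1:]))
    vx = sum((a - mx) ** 2 for a in xs[:-1])
    vy = sum((b - my) ** 2 for b in xs[1:])
    return cov / (vx * vy) ** 0.5

r = lag1_correlation(waiting_times(10_000))
# r comes out strongly positive: consecutive waits are not independent
```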

Similarly, the data values that are collected by a data collector are **not** necessarily *identically distributed*, i.e. they may not be random samples from the same probability distribution function. For example, a queue of packets waiting to be sent may be very short at the beginning of a simulation, which means that the waiting times for the first packets that pass through the queue are likely to be fairly small. However, towards the end of the simulation, the queue of packets may be very long, in which case, the waiting times for the last packets to be removed from the queue are likely to be large. In such a situation, the waiting times for the packets at the start of the simulation will probably *not* come from the same probability distribution function as the waiting times for the packets near the end of the simulation.
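The same toy queue model can illustrate this (again, an illustrative sketch, not tool code). With an overloaded queue (mean service time longer than the mean interarrival time), waits near the start of the run are much smaller than waits near the end:

```python
import random
import statistics

def queue_waits(n, seed=3):
    """Waiting times in an overloaded single-server FIFO queue
    (mean service time 1.25, mean interarrival time 1.0), computed
    with Lindley's recursion.  All parameters are illustrative."""
    rng = random.Random(seed)
    wait, out = 0.0, []
    for _ in range(n):
        out.append(wait)
        wait = max(0.0, wait + rng.expovariate(0.8) - rng.expovariate(1.0))
    return out

run = queue_waits(5_000)
early = statistics.fmean(run[:100])   # queue starts empty: short waits
late = statistics.fmean(run[-100:])   # queue has built up: long waits
# early and late averages differ greatly: the waits are not
# identically distributed across the run
```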

Whether or not data values are *independent and identically distributed (IID)* affects the accuracy of the confidence intervals that are calculated for a data collector at the end of a simulation. When confidence intervals are calculated for the average of a number of data values, it is assumed that the values are IID. If the values are not IID, then the confidence intervals may be inaccurate; in particular, they may be too short. Currently, no attempt is made to investigate whether the data values for a given data collector are IID during a single simulation.
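The "too short" effect can be demonstrated with a sketch (not tool code; the AR(1) process and batch size are illustrative assumptions standing in for positively correlated observations such as successive queue delays). A naive IID-based standard error is compared with one computed by the batch-means technique, in which averages over long batches are closer to IID:

```python
import math
import random
import statistics

def ar1_series(n, phi=0.9, seed=2):
    """Positively correlated values (an AR(1) process), standing in
    for non-independent observations from one simulation run."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out

data = ar1_series(5_000)
n = len(data)

# Naive standard error of the mean, as if the values were IID:
naive_se = statistics.stdev(data) / math.sqrt(n)

# Batch-means standard error: batch averages are nearly IID,
# so an interval built from them is more trustworthy.
batch = 100
means = [statistics.fmean(data[i:i + batch]) for i in range(0, n, batch)]
batch_se = statistics.stdev(means) / math.sqrt(len(means))

# batch_se is several times larger than naive_se: a confidence
# interval built from naive_se would be far too short here.
```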

IID estimates of performance measures can be collected from Simulation replications. At the end of a single simulation, the Data collector functions can be used to access statistics for a given data collector. When running simulation replications, it is possible to collect a number of IID estimates of a particular statistic for a particular data collector.

Consider, for example, a data collector named `Queue_Delay` that measures the amount of time that objects spend waiting in a queue in a net. At the end of a simulation, the Data collector function `Queue_Delay.avrg()` can be used to access the average queue delay for the objects during that simulation. Running one simulation will result in one estimate of the average queue delay for objects. If another simulation is run, it will most likely result in a different estimate of the average queue delay. Running Simulation replications will provide a number of IID estimates of the average queue delay for objects.
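As a sketch of how replication estimates combine into a confidence interval (not actual tool code; the queue model, seeds, and parameters are illustrative assumptions standing in for whatever `Queue_Delay.avrg()` would return):

```python
import math
import random
import statistics

def replication_average_delay(seed, n_packets=2_000):
    """One simulation replication: the average queue delay over a run
    of a single-server FIFO queue (Lindley's recursion).  This is a
    hypothetical stand-in for Queue_Delay.avrg()."""
    rng = random.Random(seed)
    wait, total = 0.0, 0.0
    for _ in range(n_packets):
        total += wait
        wait = max(0.0, wait + rng.expovariate(1.25)
                             - rng.expovariate(1.0))
    return total / n_packets

# Replications differ only in their random-number seeds, so their
# per-run averages are IID estimates of the mean queue delay.
estimates = [replication_average_delay(seed) for seed in range(5)]

mean = statistics.fmean(estimates)
half_width = 2.776 * statistics.stdev(estimates) / math.sqrt(len(estimates))
# (mean - half_width, mean + half_width) is a 95% confidence interval;
# 2.776 is the t critical value for 4 degrees of freedom.
```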

Below are excerpts from three simulation performance reports. The three estimates of average queue delay are IID, and they can be used to calculate accurate confidence intervals.

The confidence intervals that are shown in replication performance reports and confidence interval files are calculated for values that are IID.