Calculating statistics

Data collector monitors extract numerical data from a net during simulations, using the observation, initialization, and stop Data Collector Monitoring Functions. The extracted data is used to calculate statistics. The statistics calculated for a particular data collector are either untimed statistics or timed statistics (see Untimed statistics and Timed statistics below).

The statistics that can be accessed from each data collector monitor are:

  • count (number of data observations),
  • minimum,
  • maximum,
  • sum,
  • average,
  • confidence intervals for the average,
  • variance,
  • standard deviation,
  • sum of squares,
  • sum of squares of deviation,
  • first value observed, and
  • last (i.e., most recent) value observed.

If timed statistics are calculated for a data collector monitor, then the following additional statistics are calculated:

  • time of first update,
  • time of last update, and
  • time interval (amount of model time that has elapsed since the data collector was first updated).

There is support for calculating 90%, 95%, and 99% confidence intervals for averages. One of the Performance options functions can be used to select which confidence interval levels should be calculated. Note that the confidence intervals for the average of the data values collected by a data collector monitor will be accurate only if the data values are independent and identically distributed (IID); see Independent and identically distributed values.
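
As a rough illustration of how a confidence interval for an average can be formed, the following Python sketch uses the normal approximation avrg ± z * s / sqrt(n). This is an assumption made for illustration only: this page does not state which formula or distribution the tool uses, and the function name confidence_interval is hypothetical. The sample values are the ones listed under Untimed statistics below.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def confidence_interval(values, level=0.95):
    """Return (low, high) for the average, using a normal approximation.

    Assumption: the values are IID. For small n, a Student t quantile
    would give a somewhat wider interval than the normal quantile used here.
    """
    n = len(values)
    avrg = mean(values)
    # Two-sided quantile of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * stdev(values) / sqrt(n)
    return avrg - half_width, avrg + half_width


values = [0, 1, 0, 1, 1, 2, 1, 0, 0, 1, 0, 1, 1, 0, 0]
for level in (0.90, 0.95, 0.99):
    low, high = confidence_interval(values, level)
    print(f"{level:.0%}: [{low:.3f}, {high:.3f}]")
```

The interval widens as the confidence level increases, and it is only meaningful when the IID condition described above holds.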

All of the statistics mentioned above can be accessed using the Data collector functions.

In the following, let x_i, i=1..n, be the values that are returned by the observation, initialization, and stop functions for a data collector monitor.

Untimed statistics

If untimed statistics are to be calculated for the data collector, then the sum and average of n values are calculated in the following way:

Sum_n = x_1 + x_2 + ... + x_n

Avrg_n = Sum_n / n

The remaining statistics are calculated in a similar way.

If a data collector observes the same value twice, then the value influences the statistics twice, as expected.

The following figure shows an example of data values that are used to calculate untimed statistics.

Data for untimed statistics

The data values in the figure above are:

#x_i i
 0   1
 1   2
 0   3
 1   4
 1   5
 2   6
 1   7
 0   8
 0   9
 1  10
 0  11
 1  12
 1  13
 0  14
 0  15

For these n=15 values, sum=9 and avrg=9/15=0.6.
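
The untimed calculation can be sketched in Python as follows, using the data values from the table above. The function name untimed_stats and the dictionary keys are hypothetical, and the sum of squares of deviation is assumed to be the usual sum of (x_i - Avrg_n)^2. Variance and standard deviation are left out because this page does not say which denominator (n or n-1) is used.

```python
values = [0, 1, 0, 1, 1, 2, 1, 0, 0, 1, 0, 1, 1, 0, 0]


def untimed_stats(xs):
    """Compute untimed statistics; every value has the same weight."""
    n = len(xs)
    total = sum(xs)
    avrg = total / n
    return {
        "count": n,
        "min": min(xs),
        "max": max(xs),
        "sum": total,
        "avrg": avrg,
        "sum_of_squares": sum(x * x for x in xs),
        # Assumed definition: squared deviations from the average.
        "sum_of_squares_of_deviation": sum((x - avrg) ** 2 for x in xs),
        "first": xs[0],
        "last": xs[-1],
    }


stats = untimed_stats(values)
print(stats["sum"], stats["avrg"])  # prints: 9 0.6
```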

Timed statistics

Timed statistics differ from untimed statistics in that an interval of time is used to weight each observed value. The figure below shows an example of the intervals of time that are associated with observed data values. The line segment after an observed value corresponds to the interval of time that is used to weight the observed value.

Data for timed statistics

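Assume that data value x_i is extracted at time t_i, for i=1..n. The interval [t_i, t_(i+1)] is used to weight the value x_i; that is, the weight of the value x_i is (t_(i+1) - t_i). The most recent value x_n is weighted by the time elapsed since it was observed, (t - t_n), where t is the current model time. At precisely time t_i, the value x_i has no influence on the following statistics: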

  • sum,
  • average,
  • sum of squares,
  • sum of squares of deviation,
  • standard deviation, and
  • variance.

This is because the weight of the value is zero at time t_i. For all times t>t_i, x_i does influence these statistics.

The (timed) sum and (timed) average of the n values at time t>=t_n are calculated as follows:

Sum_t = x_1*(t_2-t_1) + x_2*(t_3-t_2) + ... + x_n*(t-t_n)

Avrg_t = Sum_t/(t-t_1)

With timed statistics it is possible for a value to exist for zero time. In the figure above, the second observation of value 2 exists for zero time, as indicated by the missing line segment after the data value.

In contrast to the statistics mentioned previously, the following statistics take into account all data values observed by the data collector, including those that are weighted with zero time:

  • minimum,
  • maximum, and
  • count.
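
The difference between weighted and unweighted statistics can be sketched with a small Python example. The observations below are hypothetical values chosen for illustration (they are not the data from the figure): the value 5 is observed at time 4 and replaced at the same time, so it has zero weight.

```python
# Hypothetical observations as (value, time) pairs. The value 5 is observed
# at time 4 and immediately replaced by 2 at the same time.
observations = [(1, 0), (5, 4), (2, 4)]
now = 10  # current model time t, with t >= time of the last observation

values = [x for x, _ in observations]
times = [t for _, t in observations] + [now]

# Weight of each value: time until the next observation (or until now).
weights = [times[i + 1] - times[i] for i in range(len(values))]

timed_sum = sum(x * w for x, w in zip(values, weights))
timed_avrg = timed_sum / (now - times[0])

print(weights)                                 # [4, 0, 6]
print(timed_sum, timed_avrg)                   # 16 1.6
print(len(values), min(values), max(values))   # 3 1 5
```

The value 5 contributes nothing to the timed sum or timed average, yet it is still counted and still determines the maximum.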

The data values, time of observation, and time intervals for the previous figure are:

#x_i t_i interval
   0   3       17
   1  20        2
   0  22        5
   1  27        9
   2  36        3
   1  39        4
   0  43        6
   1  49        2
   2  51        0
   1  51        2
   0  53       14

For these values at time t=67, timed sum=25 and timed avrg=0.390625.
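
These figures can be reproduced with a short Python sketch of the Sum_t and Avrg_t formulas. The function name timed_sum_and_avrg is hypothetical, and the error handling for t < t_n and t = t_1 is an assumption added to keep the sketch well defined.

```python
def timed_sum_and_avrg(observations, t):
    """Return (Sum_t, Avrg_t) for (x_i, t_i) pairs given in observation order.

    Each x_i is weighted by the time until the next observation;
    the last value x_n is weighted by (t - t_n).
    """
    xs = [x for x, _ in observations]
    ts = [ti for _, ti in observations]
    if t < ts[-1]:
        raise ValueError("t must be at or after the last observation time")
    if t == ts[0]:
        raise ValueError("no model time has elapsed since the first update")
    ends = ts[1:] + [t]  # end of each weighting interval
    total = sum(x * (end - start) for x, start, end in zip(xs, ts, ends))
    return total, total / (t - ts[0])


# Data values and observation times from the table above.
data = [(0, 3), (1, 20), (0, 22), (1, 27), (2, 36), (1, 39),
        (0, 43), (1, 49), (2, 51), (1, 51), (0, 53)]

timed_sum, timed_avrg = timed_sum_and_avrg(data, 67)
print(timed_sum, timed_avrg)  # prints: 25 0.390625
```

The last interval in the table (14) corresponds to t - t_n = 67 - 53, and the divisor of the average is t - t_1 = 67 - 3 = 64.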
