# In SPSS, why do the SUM() and MEAN() functions keep cases with missing values instead of dropping those observations?

Statistical functions in SPSS (e.g., `SUM()`, `MEAN()`, `SD()`) perform calculations using all available cases. SPSS does not drop an observation just because one of the function's arguments is missing; instead, it leaves the missing values out of the calculation and computes the result from the values that remain. For example, the `MEAN()` function returns the mean of all non-missing values in each case.
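A minimal sketch of this behavior, assuming a small made-up dataset in which `-9` is declared as a user-missing code for `v2` (the data values and the name `avg` are illustrative only):

```spss
* Hypothetical data in which -9 marks a missing response on v2.
DATA LIST FREE / v1 v2.
BEGIN DATA
4 6
3 -9
END DATA.
MISSING VALUES v2 (-9).

* MEAN() averages only the valid arguments in each case.
COMPUTE avg = MEAN(v1, v2).
EXECUTE.
LIST.
```

Under these assumptions, `avg` is 5 for the first case (the mean of 4 and 6) and 3 for the second case, where only `v1` is valid.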

However, problems can arise when trying to exclude missing cases and estimate results based only on observations with complete information. For example, suppose two variables (`v1` and `v2`) sum to create an index variable (`v3`). While `v1` has ten valid cases with no missing values, `v2` has eight valid cases and two missing values. Use the following syntax to add the two variables and create an index, `v3`:

    COMPUTE V3 = SUM(V1, V2).
    EXECUTE.

The resulting index variable `v3` has ten cases and no missing values. When SPSS encounters a missing value of `v2` in a case, it ignores that value and sets `v3` equal to `v1`. In effect, SPSS treats the missing values of `v2` as zeroes, so the results can be misleading.
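One way to see which cases were summed from incomplete data is to count the valid arguments in each case. The following is a sketch only, assuming `v1` and `v2` already exist in the active dataset; the variable name `n_valid` is hypothetical:

```spss
* Count how many of the two components are valid in each case.
COMPUTE n_valid = NVALID(v1, v2).
EXECUTE.

* Cases with n_valid below 2 have a v3 that is based on v1 alone.
FREQUENCIES VARIABLES = n_valid.
```

In the scenario described above, the frequency table would be expected to show eight cases with two valid components and two cases with only one.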

To ensure that `v3` is equal to the sum of `v1` and `v2`, and that cases with a missing value are given a missing value on `v3` rather than a partial sum, specify the minimum number of valid values that SPSS must find before it calculates a given function. For example, to create an index variable `v3` using only observations without missing values, execute the following syntax:

    COMPUTE V3 = SUM.2(V1, V2).
    EXECUTE.

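The `.2` appended to the `SUM` function name in the above example sets the minimum number of valid values necessary to perform the calculation. It can be any positive integer up to the number of arguments supplied to the function. If a case has fewer valid values than this minimum, the result for that case is system-missing.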
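The same suffix works with other statistical functions and with more than two arguments. As an illustrative sketch, assuming three hypothetical item variables `q1`, `q2`, and `q3` (not part of the example above):

```spss
* Average the items only when at least 2 of the 3 are valid.
COMPUTE avg_min2 = MEAN.2(q1, q2, q3).

* Require all 3 items and otherwise leave the result system-missing.
COMPUTE avg_all = MEAN.3(q1, q2, q3).
EXECUTE.
```

Choosing a threshold below the number of arguments is a compromise: it keeps more cases in the analysis, but the index is then based on a different number of items from case to case.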

