Solve this multiple-choice question and choose the correct option.

In an EMP table, the SQL statement SELECT AVG(SALARY) FROM EMP seems to yield inaccurate results. What is a common reason for this behaviour in relational databases?

Difficulty: Medium

SALARY is defined as an integer or small numeric type, so the average is computed with integer arithmetic and fractional parts are truncated or rounded
AVG is designed to ignore all nonzero values and only average zeros, which rarely matches business expectations
The presence of a primary key on EMP prevents AVG from seeing all rows, so it samples only a subset of data
SQL engines always calculate AVG with random rounding errors, making the function unsuitable for numeric data

Correct Answer: SALARY is defined as an integer or small numeric type, so the average is computed with integer arithmetic and fractional parts are truncated or rounded

Explanation:

Introduction / Context:
A frequent interview scenario involves a query such as SELECT AVG(SALARY) FROM EMP returning what appears to be an inaccurate result. This question probes whether you understand data types, aggregate functions, and how databases perform arithmetic. The issue often arises when numeric columns are defined with insufficient precision or as integers, leading to loss of fractional information during calculations.

Given Data / Assumptions:

There is an EMP table with a SALARY column.
The query SELECT AVG(SALARY) FROM EMP is executed.
The average shown does not match the expected value based on manual calculations.
We assume no rows are being filtered out by WHERE or HAVING.

Concept / Approach:
The AVG function returns the average value of a numeric expression. However, the way it calculates depends on the data type of the input column. If SALARY is defined as an integer or as a numeric with zero decimal places, the database may perform integer arithmetic and then return a numeric with the same scale. This can cause truncation or rounding of fractional parts in intermediate results, making the average look inaccurate compared to a calculation done with higher precision. Null values are ignored by AVG, which is usually correct, but the main source of surprise is the data type and scale, not the function itself.

Step-by-Step Solution:
Step 1: Examine the definition of the SALARY column, for example SALARY INTEGER or SALARY DECIMAL(8,0) versus SALARY DECIMAL(8,2). Step 2: Recognise that if SALARY is stored without decimals, each value has already lost any fractional cents or paise; this truncation affects the underlying data before averaging. Step 3: Understand that when AVG operates on an integer or zero scale numeric, the intermediate computations may also follow integer rules and the final result may retain zero decimal places. Step 4: Compare the result of SELECT AVG(SALARY) FROM EMP with SELECT AVG(CAST(SALARY AS DECIMAL(10,2))) FROM EMP or an equivalent cast to higher precision. Step 5: Observe that the casted version often produces a more precise average that aligns with expectations, revealing that the perceived inaccuracy was due to integer arithmetic and lack of scale.

Verification / Alternative check:
To verify, you can compute the average salary manually using a calculator with full precision and then compare it to the database results. If the SALARY column is integer based, the AVG result will match the average of the already truncated values, not necessarily the exact business average you expect. Casting SALARY to a higher precision decimal before averaging usually eliminates the discrepancy. Documentation for your database will describe how AVG determines the return type based on input types, confirming that data type and scale are the key factors.

Why Other Options Are Wrong:
Option B is incorrect because AVG does not ignore nonzero values; it averages all non null values. Option C is wrong because primary keys do not restrict aggregate functions from seeing rows; they simply define uniqueness for keys. Option D is incorrect because although floating point arithmetic can have rounding behaviour, standard numeric types and aggregates are designed to produce consistent results; random errors are not the cause of the perceived inaccuracy in this simple case.

Common Pitfalls:
Developers often overlook column definitions and assume that all numeric arithmetic is done with high precision. They may store monetary amounts as integers and then expect averages with decimals, or define DECIMAL types with insufficient scale. Another pitfall is ignoring the effect of nulls, which AVG correctly skips, leading to differences compared to dividing by the total number of rows including those with unknown salaries. Best practice is to choose appropriate data types for monetary and precise numeric data and to be explicit with casts when necessary.

Final Answer:
A common reason for apparently inaccurate results is that SALARY is defined as an integer or a numeric with zero decimal places, so AVG performs integer arithmetic and returns a value with truncated or rounded fractional parts, which differs from a high precision manual calculation.

Discussion & Comments

No comments yet. Be the first to comment!

In an EMP table, the SQL statement SELECT AVG(SALARY) FROM EMP seems to yield inaccurate results. What is a common reason for this behaviour in relational databases?

More Questions from Technology

Discussion & Comments