Difficulty: Medium
Correct Answer: Because SELECT * makes the program fragile by depending on all columns in table order, increasing coupling to schema changes and often fetching more data than needed
Explanation:
Introduction / Context:
In embedded SQL programming, such as COBOL-DB2, the use of SELECT * is strongly discouraged. Interviewers often ask why, because the answer highlights best practices in coupling, performance, and maintainability. While SELECT * may be convenient for quick ad hoc queries, production programs need more explicit and stable definitions of the data they retrieve.
Given Data / Assumptions:
Concept / Approach:
Using SELECT * returns all columns of a table in the order defined in the catalog. In embedded SQL, host variables are mapped to the columns in positional order. If the table definition changes, for example by adding or reordering columns, the mapping between columns and host variables can break silently or cause runtime errors. SELECT * also tends to fetch unnecessary data, which increases I/O and network overhead. For these reasons, best practice is to explicitly list only the columns the program actually needs, which keeps the column order predictable and reduces coupling to schema changes.
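The positional mapping described above can be illustrated with a small sketch. This is only an analogy: Python's built-in sqlite3 module stands in for DB2, tuple unpacking stands in for host variables, and the employee table and its columns are hypothetical names invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table used only for illustration.
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER, name TEXT, salary REAL, notes TEXT)"
)
conn.execute("INSERT INTO employee VALUES (1, 'Ada', 5200.0, 'free-text notes')")

# SELECT * expands to every column, in table-definition order.
star = conn.execute("SELECT * FROM employee")
star_columns = [d[0] for d in star.description]

# An explicit list returns only the named columns, in the order written.
explicit = conn.execute("SELECT name, emp_id FROM employee")
explicit_columns = [d[0] for d in explicit.description]

# Positional binding: the unpacking targets play the role of host variables.
name, emp_id = explicit.fetchall()[0]

print(star_columns)
print(explicit_columns)
print(name, emp_id)
```

With the explicit list, the program itself decides which columns arrive and in what order, rather than inheriting both from the table definition.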
Step-by-Step Solution:
Step 1: Recognise that in embedded SQL, each SELECT statement normally has a corresponding set of host variables defined in the host language program.
Step 2: When SELECT * is used, the program must supply a host variable for every column in the table, in catalog order, so the list of host variables is long and breaks as soon as the column list changes.
Step 3: If at a later time a DBA adds a new column to the table or reorders columns, SELECT * will start returning a different layout, misaligning data and host variables.
Step 4: This misalignment can cause data to be placed into the wrong fields, raise conversion errors, or require expensive recompilations, making maintenance difficult.
Step 5: By explicitly listing only the required columns, you stabilise the mapping and reduce the volume of data fetched, improving both reliability and performance.
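The silent misalignment from Steps 3 and 4 can be sketched as follows. This is an illustration under assumptions, not DB2 behaviour: SQLite stands in for the database, the table is rebuilt to simulate a column reorder, and read_star, read_explicit, and the employee table are hypothetical names.

```python
import sqlite3

def read_star(conn):
    # Written against the original layout (emp_id, name); binds by position.
    emp_id, name = conn.execute("SELECT * FROM employee").fetchall()[0]
    return {"emp_id": emp_id, "name": name}

def read_explicit(conn):
    # Names its columns, so the result layout is fixed by the statement.
    emp_id, name = conn.execute("SELECT emp_id, name FROM employee").fetchall()[0]
    return {"emp_id": emp_id, "name": name}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id TEXT, name TEXT)")
conn.execute("INSERT INTO employee VALUES ('E001', 'Ada')")
before = read_star(conn)

# Simulate a table rebuild that swaps the column order.
conn.execute("CREATE TABLE employee_new (name TEXT, emp_id TEXT)")
conn.execute(
    "INSERT INTO employee_new (name, emp_id) SELECT name, emp_id FROM employee"
)
conn.execute("DROP TABLE employee")
conn.execute("ALTER TABLE employee_new RENAME TO employee")

after_star = read_star(conn)          # values land in the wrong fields
after_explicit = read_explicit(conn)  # unaffected by the reorder

print(before, after_star, after_explicit)
```

Because both columns are text, nothing fails in the star version: the name simply ends up in the ID field, which is the kind of bug that is hard to notice.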
Verification / Alternative check:
You can verify the risk by creating a test program that uses SELECT * on a small table, compiling it, then adding a new column to the table and running the program again. In many environments, the program will either fail or produce incorrect field mappings (in some cases only after it is next recompiled or rebound), demonstrating the danger of relying on implicit column order. If the SELECT statement instead names its columns explicitly, adding a column leaves the program unaffected, while dropping or renaming a column it references is flagged when the program is next precompiled or bound, prompting you to adjust host variables deliberately and safely.
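A rough version of this experiment can be run without a DB2 environment. The sketch below assumes SQLite as a stand-in, and the account table, fetch_star, and fetch_explicit are hypothetical names; in DB2 the symptom would be a warning or SQL error rather than a Python exception, but the cause is the same.

```python
import sqlite3

def fetch_star(conn):
    # Two targets, matching the two columns that existed when this was written.
    acct_id, balance = conn.execute("SELECT * FROM account").fetchall()[0]
    return acct_id, balance

def fetch_explicit(conn):
    acct_id, balance = conn.execute(
        "SELECT acct_id, balance FROM account"
    ).fetchall()[0]
    return acct_id, balance

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (acct_id INTEGER, balance REAL)")
conn.execute("INSERT INTO account VALUES (42, 100.0)")

star_before = fetch_star(conn)  # works while the table has two columns

# The schema change: a new column is added.
conn.execute("ALTER TABLE account ADD COLUMN branch TEXT")

try:
    fetch_star(conn)
    star_after = "ok"
except ValueError as exc:
    # Three values now arrive for two targets.
    star_after = f"failed: {exc}"

explicit_after = fetch_explicit(conn)  # still returns exactly two values

print(star_before, star_after, explicit_after)
```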
Why Other Options Are Wrong:
Option B is incorrect because SELECT * is syntactically valid in embedded SQL, although it is poor practice. Option C is wrong because locking behaviour depends on transaction isolation, access paths, and predicates, not on whether star notation is used. Option D is incorrect because SELECT * does not automatically convert numeric columns to character; type conversions are governed by column types and host variable definitions, not by star expansion.
Common Pitfalls:
A common pitfall is prioritising convenience over maintainability by using SELECT * during development and never refactoring it. This can lead to subtle bugs when the table evolves over time. Another issue is performance: fetching unused columns wastes CPU, memory, and I/O, especially when large objects or long text columns are involved. Good embedded SQL practice involves selecting only what you need and documenting that selection in the code, which makes programs more robust in the face of schema changes.
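The cost of fetching unused columns can be estimated with a small sketch. It assumes SQLite as a stand-in, a hypothetical document table with one large column, and a deliberately crude payload_bytes helper; real overhead in DB2 depends on the access path, buffering, and network layers.

```python
import sqlite3

def payload_bytes(row):
    # Crude size estimate: byte length for blobs and text, 8 bytes per number.
    total = 0
    for value in row:
        if isinstance(value, bytes):
            total += len(value)
        elif isinstance(value, str):
            total += len(value.encode("utf-8"))
        else:
            total += 8
    return total

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE document (doc_id INTEGER, title TEXT, body BLOB)")
conn.execute(
    "INSERT INTO document VALUES (?, ?, ?)",
    (1, "Q3 report", b"x" * 1_000_000),
)

# The program only needs doc_id and title, but SELECT * drags body along.
star_row = conn.execute("SELECT * FROM document").fetchall()[0]
needed_row = conn.execute("SELECT doc_id, title FROM document").fetchall()[0]

star_size = payload_bytes(star_row)
needed_size = payload_bytes(needed_row)

print(star_size, needed_size)
```

The gap widens with every row fetched, which is why large object and long text columns make the habit especially costly.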
Final Answer:
Using SELECT * is not preferred because it tightly couples the program to the full table structure and column order, making it fragile under schema changes and often causing unnecessary data to be fetched, whereas explicitly listing needed columns is safer and more efficient.