22 July 2011

Valid partial step in optimizing code? (4482)

In the Q2 2011 playoff, one of the questions examined the nuances of using BULK COLLECT and FORALL to optimize one's code. We asked players "Which of the statements below describe a complete or partial step towards improving the performance of this program, while keeping intact all current functionality?"

And we scored as incorrect this choice:

"Keep the cursor FOR loop in place (they are automatically optimized by Oracle into BULK COLLECT levels of performance). Simply replace the two DML statements with FORALL statements within that cursor FOR loop."

A player objected, feeling that this should be scored as correct, arguing:

"I checked this answer as correct because, as another player mentioned, the question did ask “complete or partial step towards improving the performance of this program”. It is not a solution, of course, because there needs to be a bulk collect off the cursor to create the collection(s) necessary for a forall statement to work. Of course, that would also be a “step”. But this answer also is a “partial step towards improving the performance”. I think that this answer should be scored as correct. I interpreted the word “step” to mean “one of things that I would need to do to improve the performance”. To make this answer incorrect, I would have added the following clarifying language to the question: “and resulting in a working program that compiles without error”. But the English language is much more ambiguous than the PL/SQL language."

Another player wrote to say about two other of the choices:

"I have to disagree with the answers to this question. The question clearly states "Which of the statements below describe a complete OR PARTIAL step towards improving the performance of this program, while keeping intact all current functionality?" (emphasis mine). Answers that describe a reasonable step that the programmer may take, even if they are not complete, should still be marked correct in my opinion. For example: "Replace the two DML statements (insert and update) with one FORALL statement that executes those same two statements, using data in collections populated by a BULK COLLECT." - you object to this answer because it doesn't completely fulfill the requirements. But a FORALL statement could most definitely be used to improve the performance - sure, it doesn't completely fulfill the requirements, the programmer has to do more than that, but it is still a partial step towards improving the program. Similarlarly, "Add a LOG ERRORS to both DML statements and put them inside FORALL statements." - this is a good partial step. All I have to do is add some code to raise an exception if I find an error in SQL%BULK_EXCEPTIONS. The other answers were steps in the *wrong* direction, and were correctly marked as wrong for that reason."

My responses:

First, I recognize the problematic aspect of including "or partial" in our question text. We actually included it so that a player couldn't argue that a single choice was or wasn't a complete solution. Ah, well...

Still, I am ready to defend with full vigor my scoring. Here goes:

"Keep the cursor FOR loop in place (they are automatically optimized by Oracle into BULK COLLECT levels of performance). Simply replace the two DML statements with FORALL statements within that cursor FOR loop." [7281]

It sounds like this rationale for marking this as correct is:

Sure, Steven, you say you will keep the cursor for loop "in place" but that's just for now. Later you will come back and change it to a bulk collect. So it is a partial step.

I certainly do understand that rationale. But I also find myself objecting to it. Why? Because the language of the choice implies (strongly, to my mind) that this is not simply a transitional step. It is a misunderstanding of the technology. Compare the text of answer 7281 to this:

Change the two DML statements into FORALL statements within the cursor FOR loop.

This actually still makes little sense (what collection would "feed" the FORALL statement?), but one could argue that the bulk collect step is "next." OK.

But in 7281, I explicitly say I will keep the cursor for loop "in place" and even include a parenthetical clause explaining why - this sentence is saying "This developer thinks that because the cursor FOR loop is auto-optimized, there is no need to change it. Then I include "Simply" in the second sentence to imply that I will do nothing more.

I realize that I am now having a discussion about English syntax and semantics, not PL/SQL code per se. But that is certainly the rationale for why I wrote it that way. I find it hard to read that choice and think to myself, "Oh, that is just the first step in a series planned by this person."

On to the others:

Replace the two DML statements (insert and update) with one FORALL statement that executes those same two statements, using data in collections populated by a BULK COLLECT. [7282]

I don't see how this could be interpreted as a partial/transitional step towards an improved program. You absolutely cannot replace these two DML statements with a single FORALL statement. Again, that reveals a misunderstanding of FORALL. You can only have one DML statement in each FORALL. So how can this be a partial step? You'd have to undo everything you do in this step to get it to work!

Add a LOG ERRORS to both DML statements and put them inside FORALL statements. [7286]

Again, I don't see how to recognize this as a valid step towards a properly refactored program. You cannot use LOG ERRORS without changing the behavior of the program. LOG ERRORS suppresses errors at the row level, and we need to do statement-level suppression of exceptions. You certainly can't add code using SQL%BULK_EXCEPTIONS, because when you use LOG ERRORS, no exceptions are raised at all. So that pseudo collection will remain empty.

Your thoughts?

21 July 2011

Special $49.95 One-month Subscripton PL/SQL Channel

I announced availability of the PL/SQL Channel several months ago: 27+ hours of recorded trainings on the Oracle PL/SQL language, available at the PL/SQL Channel and covering many (but admittedly not yet all) features of the PL/SQL language. At $395 per year per developer for unlimited access to all this content, the PL/SQL Channel is the most cost-effective use of your training dollars for PL/SQL developers, with much greater short- and long-term impact than one-time, instructor-led trainings.

I have also found, however, that many developers (or their managers) are reluctant to pay a subscription fee for a whole year, when their training budgets are very tight and they are not sure they will be using the PL/SQL Channel videos for such an extended length of time.

We have decided, therefore, to offer access to the PL/SQL Channel for 30 days for just $49.95. Yes, that's right - for less than $2 per day you can access dozens of PL/SQL videos with detailed trainings, plus their supporting presentation documents and code. This special price will be available through the end of September 2011.
 
We are also continuing our free, 30 day trial of the PL/SQL Channel (limited, however, to viewing of just five videos).

To start your free trial of the PL/SQL Channel, click here.

To order your $49.95 subscription to the PL/SQL Channel, click here.

Note: the thirty-day period starts on the day you purchase your subscription.

18 July 2011

Tricky choice tricks the quiz author! (4720)

In the logic puzzle for the week of 9 July, we scored the following choice as correct:

If 1 is in the solution then the other numbers must be (2, 4, 7).

The explanation given was this: "If 1 is in the solution, then the third turn tells us that (3, 6, 5) cannot be in the solution. The only 3 numbers left are (2, 4, 7) and looking over the first two turns, they would be compatible. So the four numbers (1, 2, 4, 7) could be in the solution together."

The third turn showed that (3 1 6 5) had just one correct digit, not in the right location.

As two players pointed out to me, the third turn shows that (2 4 7) have to be in the solution, regardless of the status of 1. We need four digits, the third turn tells us that at most one of (1 3 6 5) are in the solution, so we need all of the other three digits (2 4 7).

Thus, 2, 4 and 7 must be in the solution - but that fact has nothing to do with whether or not 1 is in the solution.

In other words, even if 1 is not in the solution, 2, 4 and 7 must still be in the solution.

So what's a poor quiz administrator to do about this? I was at first tempted to do what I usually do: give everyone credit for a correct answer to this choice, and change the text to get rid of this "problem."

But then I thought: wait a minute, the whole point of this puzzler is to test our facility with logical thinking. Clearly, I have demonstrated that I am not as good at this sort of thing as I thought (and by the way, 62% of all players selected this choice as correct).

So it seems to me that what I should do is (a) change this choice to be incorrect; (b) recalculate the scores of all players; (c) change the difficulty level of this quiz to Advanced.

What do you think, dear players?

17 July 2011

Column headers "fair game" for choices? (4180)

The SQL quiz on PIVOT for the week of 9 July asked which of the choices produce this desired output:
DEPARTMENT_ID       LESS       MORE
------------- ---------- ----------
          100          2          0
          200          2          1
And we scored a choice as incorrect simply because the column headers would not match what is shown above. Specifically, the headers would be:
DEPARTMENT_ID     'LESS'     'MORE'
A player raised a concern about this as follows:

I just answered today's SQL quiz a few seconds ago, and it looks to me, as far as I can remember, that something is missing from the general assumptions for the SQL quizzes, namely, a clear statement regarding the column headers. It would be very sad if those that perfectly master the PIVOT clause (including the alias usage for the column headers in this context), and have answered correctly today's quiz would be finally penalized because of eventually not accepting the quoted string as a correct column header, while the query result is the correct one, including the ordering. While from the previous quizzes we already learnt "silently" that the ORDER BY does matter, that is, a correct solution is supposed to produce the same ordering as presented in the sample shown in the quiz (ordering being always an intrinsic part of the SQL), the column headers issue is a little bit more "volatile", as, what we see it is not always strictly a part of the SQL statement itself ( for example, columns headings can be set by the client tool used, like the COLUMN statement in SQL*PLUS, a.s.o. ). In my opinion, the correctness of a query result set should not be made dependent on the column headers, though others might think otherwise. Regardless of the very specific case of the PIVOT clause, and maybe exactly because of it, and since our competition is still at the beginning, maybe a clarification on this issue in the assumptions would be most welcome and would remove ambiguities.

The author of the quiz offers this response:

We could perhaps make the assumptions a little bit clearer – but the assumptions should never become a multipage description that no one will read J.

I can understand a wish for the assumptions to make everything 110% unambiguous – but the danger is if we keep making the assumptions more and more detailed and larger and larger, then practically every tiny objection could argue that it should be part of the assumptions, and then the assumptions would become a document as large as the SQL documentation J. The more we add to assumptions, the less a player will feel that he/she has to use “common sense”…

The reason I included the “un-aliased” answer in this PIVOT quiz (and scored it false) was to enhance awareness of proper column naming. If for example this SQL statement were to be used in some client environment where the client code references columns of the resultset by name, then you would get errors if you didn’t alias. Example: IF (Resultset("LESS") > 1) {do something} would fail if the column was not aliased. So actually the correctness IMHO does depend on the headers as well as the correct data – if you are a “back-end” developer, how you name columns in your output will matter to the front-end developer who has to use your result sets in his or her code.

I would argue against rescoring this particular quiz – it was very much on purpose I provided a choice that failed only in the headers.

My (Steven) feeling about this: generally, we should not make choice correctness dependent on issues that are irrelevant to an understanding of the technology or how it plays out in our day to day experience. In this case, however, Kim argues persuasively that it was quite relevant and something for developers to be aware of.

What do you think?