Re: help for newbie please! (trouble with concatenation)

deleted_user · Posted 08-11-2008 04:46 PM

Hi,

I'm just getting started at this, taking the basic certification e-course. I can't figure out one of the questions on the post-test for the segment on combining data. (see below) To me the answer has to be (b)$30,0000, but the quiz score claims it should be (a) missing. I already sent an email to saslearning@sas.com, and they insist the response is correct.Can someone please explain this to me?????

Here's the problem:
If you concatenate the data sets below in the order shown, what is the value of Sale in observation 2 of the new data set?
Sales.Reps
ID Name
1 Nay Rong
2 Kelly Windsor
3 Julio Meraz
4 Richard Krabill
Sales.Close
ID Sale
1 $28,000
2 $30,000
2 $40,000
3 $15,000
3 $20,000
3 $25,000
4 $35,000
Sales.Bonus
ID Bonus
1 $2,000
2 $4,000
3 $3,000
4 $2,500

a. missing
b. $30,000
c. $40,000
d. you cannot concatenate these data sets
Correct answer: a
Your answer: b

The concatenated data sets are read sequentially, in the order in which they are listed in the SET statement. The second observation in Sales.Reps does not contain a value for Sale, so a missing value appears for this variable. (Note that if you merge the data sets, the value of Sale for the second observation is $30,000.)

sbb · Posted 08-11-2008 06:51 PM

If you expect to get an accurate response, it's most important to share your SAS code up-front. And without that, it's conjecture at best, frankly.

Scott Barry
SBBWorks, Inc.

deleted_user · Posted 08-11-2008 07:18 PM

> If you expect to get an accurate response, it's most
> important to share your SAS code up-front. And
> without that, it's conjecture at best, frankly.
>
> Scott Barry
> SBBWorks, Inc.
Dear Scott,

I appreciate your advice, but did you actually read the question?

There is no code provided. I copied the question as it reads on the quiz. It refers to the use of a SET statement to stack or concatenate data sets. (Which I assumed, evidently wrongly, any experienced programmer would know.) However, I assume the code would be:

data sales.something;
set sales.reps sales.close sales.bonus;
run;

Does that help?

Deborah

Cynthia_sas · Posted 08-12-2008 12:12 AM

Hi:
Let's approach this another way. Suppose you have 3 datasets, each with two observations or rows: Datasets A, B, and C. Dataset A has the variable VAR1; dataset B has the variable VAR2 and dataset C has the variable VAR3. If you concatenate those 3 datasets together, the resulting, new, dataset will have 3 columns: VAR1, VAR2 and VAR3.

The final dataset would have 6 total rows (2 rows from each of the input datasets) and 3 columns (one column from each of the input datasets). However, dataset A only had VAR1 -- so for the rows that came from A, only VAR1 would be populated... VAR2 and VAR3 would be missing. In a similar fashion, the rows that came from dataset B would have VAR2 populated, but VAR1 and VAR3 missing; and dataset C would have VAR3 populated, but VAR1 and VAR2 would be missing in the new dataset.

The simple SET statement for concatenation does not do a "look-ahead" to see if you have variables in common in your datasets to be concatenated. It just builds a descriptor portion for the new dataset based on info from the datasets found in the SET statement. If it finds duplicate variable names, that doesn't cause SAS to do anything different that it did in the hypothetical situation -- it's only building a LIST of the variable names. Then, it starts to put rows into the new dataset.

Just as it doesn't matter in the example above that the 3 datasets have no variable in common; in your example, it also does NOT matter that your 3 datasets have the ID variable in common. In a simple concatenation, the rows from the first dataset are plopped into the new file; then the rows from the second dataset; then the rows from the third dataset ... etc, etc.

If you created a single dataset using the data in your example, you would get these results (the dashes are my annotation to show where the concatenation of one file stopped and the next file started):
[pre]
Obs ID Name Sale Bonus

1 1 Nay Rong . .
2 2 Kelly Windsor . .
3 3 Julio Meraz . .
4 4 Richard Krabill . .
----------------------------------------------------------
5 1 28000 .
6 2 30000 .
7 2 40000 .
8 3 15000 .
9 3 20000 .
10 3 25000 .
11 4 35000 .
---------------------------------------------------------
12 1 . 2000
13 2 . 4000
14 3 . 3000
15 4 . 2500
[/pre]

As you can see, it was really coincidence that all 3 of your files had an ID variable in common, but in the case of a concatenation, it didn't matter. Every value for Sale and Bonus is missing for the first 4 obs that came from Sales.Reps.

If you had done a SET with a BY statement or a MERGE with a BY statement, the results would have been different. However, with a simple concatenation, as you can see, ALL the rows from the first table precede ALL the rows from the second table -- as they were listed in the SET statement.

Does that make more sense about why the correct answer was 'missing' for that question???

cynthia

deleted_user · Posted 08-12-2008 12:49 AM

Yes, it does - THANK YOU!!!!!!!!!!

I don't know why I couldn't figure that out on my own, it was right there in front of my nose, but somehow it wasn't getting through. You are truly an angel for taking the trouble to explain it so clearly!

Sincerely,
Deborah

help for newbie please! (trouble with concatenation)