SAS Programming

DATA Step, Macro, Functions and more
BookmarkSubscribeRSS Feed
Baby_Cow96
Calcite | Level 5

Hello Everyone, 

 

I'm pretty stuck on a question in my practice exam any help explaining it would be very much appreciated.

 

I've been given 2 data sets, cert.input08a and cert.input08b. Both data sets contain a common numeric variable named ID.

I’m not sure how else to provide the data but the variables in cert.input08a are:

 

exa1 through exa 22 (i.e. exa1, exa2, exa3, etc), ValA, J, ID

 

While cert.input08b has the variables:

 

exb1 through exb45 (i.e. exb1, exb2, exb3, etc), excess1 through excess45 (i.e. excess1, excess2, excess3, etc), ValB, X, ID

 

With this I've been tasked to write a program that uses a SAS Data Step to:

-Combine the 2 tables by matching values of the ID variable.

-Write only the observations that are in both data sets to a new set named results.match08.

-Write all other non-matching observations from either data set to a new set named results.nomatch08.

-Exclude all variables that begin with "ex" from results.nomatch08.

 

I started by sorting both the data sets by the common variable:

 

proc sort data = cert.input08a out=sorted_input08a ;
    by ID ;
run ;

 

proc sort data = cert.input08b out=sorted_input08b ;
   by ID ;
run ;

 

Then I wrote a data step to merge them using the now sorted tables:

 

data results.match08 results.nomatch08 ;
      merge sorted_input08a (in=inPut8a)
                 sorted_input08b (in=inPut8b) ;
      by ID;
if inPut8a = 1 and inPut8b = 1 then output results.match08 ;
else output results.nomatch08 ;
run ;

 

When I run this code (the data step, after running the sort steps) I receive no error in the log, but I do get note that isn't familiar, "MERGE statement has more than one data set with repeats of BY values. The output data for results.nomatch08 is 2 rows 117 columns, while results.match08 is 1200 rows and 117 columnns. input08a.sas7bdat originally has 1200 rows and 25 columns. input08b.sas7bdat originally has 1202 rows and 93 columns. 

 

As well the only matching variable from the two data sets are ID, so shouldn't the new data set results.match08 only have the variable ID as a column?

 

One last question, for exlcuding variables that begin with "ex" from the set results.nomatch08 do you use a drop = option?

 

data results.match08 results.nomatch08 (drop = "ex%") ; (like this?)

 

Thanks in advance for the help, 

Alexander

 

 

 

 

5 REPLIES 5
Astounding
PROC Star

In terms of the programming, you are most of the way there.  The DROP condition would be somewhat different:

 

(drop = ex: )

 

In terms of the NOTE on the log, you will need to investigate.  This is telling you that ID is not unique.  Perhaps both data sets contain 3 observations for the same ID.  SAS is alerting you to the fact, because MERGE might not handle the situation that way you would like.  To begin, however, it is up to you to investigate both data sets and find out where there are duplicate values for the same ID.

 

ballardw
Super User

You may want to look closely at the resulting set of running this code. Specifically the 3 observation in work.datamerge to see what may happen if you have multiple values of the BY variable in two different sets and the number of observations in the by group does not match.

data work.data1;
   input x y;
datalines;
1  2
1  3
1  4
2  1
2  2
;

data work.data2;
   input x z;
datalines;
1  22
1  32
2  12
2  22
;

data work.datamerge;
   merge work.data1
         work.data2
   ;
   by x;
run;

And then compare the results below where only one set has multiple values of the By variable.

data work.data3;
   input x q;
datalines;
1 444
2 555
;

data work.datamerge2;
   merge work.data1
         work.data3
   ;
   by x;
run;
sm12
Calcite | Level 5
data results.match08 results.nomatch08(DROP=EX:);
merge results.input08a(in=input08a)
results.input08b(in=input08b);
by id ;
if input08a=1 and input08b=1 then output results.match08;
else output results.nomatch08 ;
run;
Cynthia_sas
SAS Super FREQ

Hi:

  Please don't overthink the question. The NOTE in the log about duplicate BY values is OK. You don't have to fix the data or adjust your code. Here's what I mean. In my code below, I did NOT use the RESULTS library, I just wrote to the WORK library for the program to make sure I get the right answers before I save the program:

Cynthia_sas_0-1691093958141.png

All you have to do is write your program and review the results. When I write the program the correct answers are contained in the SAS log. Remember that if you drop ALL the variables that start with EX:, then you'll drop ALL the variables from input08a that start with EX -- PLUS -- all the variables from input08b that start with EX in order to create the NOMATCH file. But you'll keep them for the MATCH file from the MERGE.

 

  Here's what I mean:

Cynthia_sas_1-1691095536869.png

The number of observations or rows is also easy to figure out. There are 1200 rows or obs in input08a and 1202 rows in input08b. There are 2 rows where ID = 401 that are in input08b but not in input08a. Are there duplicates of the BY values, yes there are. But that scenario, only resulted in a NOTE. And while it is a note that can adversely impact your output, in this case although there are duplicates, they do match in both files until you get to ID=401. So the rows for 401 are the only non-matches in this scenario, which is why you have 1200 obs in the match file and only 2 obs in the nomatch file.

  I hope this helps to explain this question a bit more.

Cynthia

sas-innovate-white.png

Special offer for SAS Communities members

Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.

 

View the full agenda.

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 5672 views
  • 6 likes
  • 6 in conversation