BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Cuneyt
Obsidian | Level 7

Hi All,

 

I have two datasets, one for voters, one for candidates. Each voter and candidate is unique. Each voter and candidate is associated with a unique zip code. There are zip codes with a candidate but no voters. Similarly, there are zip codes with voters but no candidate. I would like to combine the datasets as follows:

 

Voter Dataset     Candiate Dataset     Combined Dataset    
Voter Zip   Candidate Zip   Voter Zip Candidate
1 111   21 111   1 111 21
2 111   22 111   2 111 22
3 111   23 222   3 111 .
4 111   24 222   4 111 .
5 222   25 444   5 222 23
6 222         6 222 24
7 222         7 222 .
8 333         8 333 .
9 333         9 333 .
10 333         10 333 .
            . 444 25

 

 

How can I do this in SAS? Any help will be greatly appreciated.

 

Sincerely,

Cuneyt

1 ACCEPTED SOLUTION
9 REPLIES 9
Reeza
Super User
Interleave

data want;
set voter candidate;
by zip;
run;
Cuneyt
Obsidian | Level 7

Hi Reeza,

 

Thanks for your solution. But your code generate the following desired dataset, which is not what I want. Ideally, I would like C=21 and C=22 moved up to row 1 and 2, C=23 and C=24 moved to row 5 and 6. Is there a way to modify your code to do this?

v zip c
1 111 .
2 111 .
3 111 .
4 111 .
. 111 21
. 111 22
5 222 .
6 222 .
7 222 .
. 222 23
. 222 24
8 333 .
9 333 .
10 333 .
. 444 25
Cuneyt
Obsidian | Level 7

Hi Kurt,

 

Your solution works perfectly! Thank you so much. Just out of curiosity, how does your program work? What does the if statement do? What does the missing routine do?

 

Sincerely,

Cuneyt

Kurt_Bremser
Super User

CALL MISSING sets all variables supplied as arguments to a missing value, regardless of type (you can mix types).

 

of _all_ means that the data step compiler inserts all variables already populated into the Program Data Vector at the time that a particular statement is compiled. So, in order to make of _all_ work, w need to have all variables of the incoming datasets in the PDV, and this is achieved by using a SET statement which is never executed (because If 0 is never true), but forces the data step compiler to insert all variables from these datasets into the PDV.

 

So the program, during compile time, sets all variables from the datasets to missing before reading. This counters the automatic RETAIN of variables read from datasets, and therefore you get a "clean slate" before every data step iteration, and the desired effect of missing values after one of the datasets runs out of observations for a group.

 

But @Tom's method does this in a simpler, clearer way.

But you should still keep the IF 0 THEN SET ... method in the back of your mind, it can also be used to define all variables from datasets which are loaded later into hash objects.

Cuneyt
Obsidian | Level 7

Hi Kurt,

Thanks for the explanation. "If 0" is definitely a Jedi-level SAS programming!

Sincerely,

Cuneyt

Tom
Super User Tom
Super User

I find it is clearer to just add an explicit OUTPUT statement.

data want;
  merge voter candidate;
  by zip;
  output;
  call missing(of _all_);
run;

All variables sourced from datasets are "retained", that is they are not set to missing at the start of a new iteration of the data step.  When a dataset stops contributing to a by group SAS does not read any anything from that dataset, but the values it last read stay. That is why a one to many merge works to copy the values from the one dataset onto all of the matches from the many dataset.  By explicitly setting the values to missing only the the value read during the next iteration will be filled in.

Cuneyt
Obsidian | Level 7

Much clearer! Thanks, Tom!

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 966 views
  • 4 likes
  • 4 in conversation