BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Cuneyt
Fluorite | Level 6

Hi All,

 

I have two datasets, one for voters, one for candidates. Each voter and candidate is unique. Each voter and candidate is associated with a unique zip code. There are zip codes with a candidate but no voters. Similarly, there are zip codes with voters but no candidate. I would like to combine the datasets as follows:

 

Voter Dataset     Candiate Dataset     Combined Dataset    
Voter Zip   Candidate Zip   Voter Zip Candidate
1 111   21 111   1 111 21
2 111   22 111   2 111 22
3 111   23 222   3 111 .
4 111   24 222   4 111 .
5 222   25 444   5 222 23
6 222         6 222 24
7 222         7 222 .
8 333         8 333 .
9 333         9 333 .
10 333         10 333 .
            . 444 25

 

 

How can I do this in SAS? Any help will be greatly appreciated.

 

Sincerely,

Cuneyt

1 ACCEPTED SOLUTION
9 REPLIES 9
Reeza
Super User
Interleave

data want;
set voter candidate;
by zip;
run;
Cuneyt
Fluorite | Level 6

Hi Reeza,

 

Thanks for your solution. But your code generate the following desired dataset, which is not what I want. Ideally, I would like C=21 and C=22 moved up to row 1 and 2, C=23 and C=24 moved to row 5 and 6. Is there a way to modify your code to do this?

v zip c
1 111 .
2 111 .
3 111 .
4 111 .
. 111 21
. 111 22
5 222 .
6 222 .
7 222 .
. 222 23
. 222 24
8 333 .
9 333 .
10 333 .
. 444 25
Cuneyt
Fluorite | Level 6

Hi Kurt,

 

Your solution works perfectly! Thank you so much. Just out of curiosity, how does your program work? What does the if statement do? What does the missing routine do?

 

Sincerely,

Cuneyt

Kurt_Bremser
Super User

CALL MISSING sets all variables supplied as arguments to a missing value, regardless of type (you can mix types).

 

of _all_ means that the data step compiler inserts all variables already populated into the Program Data Vector at the time that a particular statement is compiled. So, in order to make of _all_ work, w need to have all variables of the incoming datasets in the PDV, and this is achieved by using a SET statement which is never executed (because If 0 is never true), but forces the data step compiler to insert all variables from these datasets into the PDV.

 

So the program, during compile time, sets all variables from the datasets to missing before reading. This counters the automatic RETAIN of variables read from datasets, and therefore you get a "clean slate" before every data step iteration, and the desired effect of missing values after one of the datasets runs out of observations for a group.

 

But @Tom's method does this in a simpler, clearer way.

But you should still keep the IF 0 THEN SET ... method in the back of your mind, it can also be used to define all variables from datasets which are loaded later into hash objects.

Cuneyt
Fluorite | Level 6

Hi Kurt,

Thanks for the explanation. "If 0" is definitely a Jedi-level SAS programming!

Sincerely,

Cuneyt

Tom
Super User Tom
Super User

I find it is clearer to just add an explicit OUTPUT statement.

data want;
  merge voter candidate;
  by zip;
  output;
  call missing(of _all_);
run;

All variables sourced from datasets are "retained", that is they are not set to missing at the start of a new iteration of the data step.  When a dataset stops contributing to a by group SAS does not read any anything from that dataset, but the values it last read stay. That is why a one to many merge works to copy the values from the one dataset onto all of the matches from the many dataset.  By explicitly setting the values to missing only the the value read during the next iteration will be filled in.

Cuneyt
Fluorite | Level 6

Much clearer! Thanks, Tom!

SAS INNOVATE 2024

Innovate_SAS_Blue.png

Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.

If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website. 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Get the $99 certification deal.jpg

 

 

Back in the Classroom!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 296 views
  • 4 likes
  • 4 in conversation