Hi All,
I have two datasets, one for voters, one for candidates. Each voter and candidate is unique. Each voter and candidate is associated with a unique zip code. There are zip codes with a candidate but no voters. Similarly, there are zip codes with voters but no candidate. I would like to combine the datasets as follows:
Voter Dataset | Candiate Dataset | Combined Dataset | ||||||
Voter | Zip | Candidate | Zip | Voter | Zip | Candidate | ||
1 | 111 | 21 | 111 | 1 | 111 | 21 | ||
2 | 111 | 22 | 111 | 2 | 111 | 22 | ||
3 | 111 | 23 | 222 | 3 | 111 | . | ||
4 | 111 | 24 | 222 | 4 | 111 | . | ||
5 | 222 | 25 | 444 | 5 | 222 | 23 | ||
6 | 222 | 6 | 222 | 24 | ||||
7 | 222 | 7 | 222 | . | ||||
8 | 333 | 8 | 333 | . | ||||
9 | 333 | 9 | 333 | . | ||||
10 | 333 | 10 | 333 | . | ||||
. | 444 | 25 |
How can I do this in SAS? Any help will be greatly appreciated.
Sincerely,
Cuneyt
Try this:
data want;
if 0 then set voter candidate; /* defines the PDV */
call missing(of _all_);
merge voter candidate;
by zip;
run;
Hi Reeza,
Thanks for your solution. But your code generate the following desired dataset, which is not what I want. Ideally, I would like C=21 and C=22 moved up to row 1 and 2, C=23 and C=24 moved to row 5 and 6. Is there a way to modify your code to do this?
v | zip | c |
1 | 111 | . |
2 | 111 | . |
3 | 111 | . |
4 | 111 | . |
. | 111 | 21 |
. | 111 | 22 |
5 | 222 | . |
6 | 222 | . |
7 | 222 | . |
. | 222 | 23 |
. | 222 | 24 |
8 | 333 | . |
9 | 333 | . |
10 | 333 | . |
. | 444 | 25 |
Try this:
data want;
if 0 then set voter candidate; /* defines the PDV */
call missing(of _all_);
merge voter candidate;
by zip;
run;
Hi Kurt,
Your solution works perfectly! Thank you so much. Just out of curiosity, how does your program work? What does the if statement do? What does the missing routine do?
Sincerely,
Cuneyt
CALL MISSING sets all variables supplied as arguments to a missing value, regardless of type (you can mix types).
of _all_ means that the data step compiler inserts all variables already populated into the Program Data Vector at the time that a particular statement is compiled. So, in order to make of _all_ work, w need to have all variables of the incoming datasets in the PDV, and this is achieved by using a SET statement which is never executed (because If 0 is never true), but forces the data step compiler to insert all variables from these datasets into the PDV.
So the program, during compile time, sets all variables from the datasets to missing before reading. This counters the automatic RETAIN of variables read from datasets, and therefore you get a "clean slate" before every data step iteration, and the desired effect of missing values after one of the datasets runs out of observations for a group.
But @Tom's method does this in a simpler, clearer way.
But you should still keep the IF 0 THEN SET ... method in the back of your mind, it can also be used to define all variables from datasets which are loaded later into hash objects.
Hi Kurt,
Thanks for the explanation. "If 0" is definitely a Jedi-level SAS programming!
Sincerely,
Cuneyt
I learned that trick here. The longer you stay around, the deeper your toolbox will be.
I find it is clearer to just add an explicit OUTPUT statement.
data want;
merge voter candidate;
by zip;
output;
call missing(of _all_);
run;
All variables sourced from datasets are "retained", that is they are not set to missing at the start of a new iteration of the data step. When a dataset stops contributing to a by group SAS does not read any anything from that dataset, but the values it last read stay. That is why a one to many merge works to copy the values from the one dataset onto all of the matches from the many dataset. By explicitly setting the values to missing only the the value read during the next iteration will be filled in.
Much clearer! Thanks, Tom!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.