Hi,
I have a dataset which contains unique records based on a client id. For example:
Client ID | Location | Employment Status | Year |
1 | Metro | Unemployed | 2001 |
1 | Remote | Employed | 2002 |
1 | Metro | Employed | 2003 |
2 | Metro | Employed | 2003 |
2 | Metro | Employed | 2003 |
3 | Remote | Employed | 2003 |
3 | Remote | Employed | 2003 |
3 | Remote | Employed | 2003 |
Now,using client id = 1 as the example, it shows a discrepancy in the outputs for Location and Employment Status. What I am trying to achieve is a consistent output for these variables based on some conditions. The first condition would be that the client id has to be consistent (i.e. so that it is only looking at records that contain that particular client id. The second condition would be simply be "if any of the outcomes for Location = Remote, then all outcomes for this particular client id would = remote (i.e. the 2 outcomes that = "metro" would be overwritten with "remote".
The second condition is not the problem I am having, it is the first condition whereby the "if else statement" that I will use for the second condition needs to be based on the the client id being the same in all records that are to be used to execute this calculation.
I have searched the forums but am unable to find what I am looking for.
I am a new user so hopefully I have explained myself well enough?
Many thanks
Cam
You will have to process the dataset twice: in the first round checking if any Location = "Metro" and in the second round adapting Location if necessary. The first condition is achieved by using by-group-processing with first/last - you will find numerous documents explaining anything relevant.
This is one solution, all in one datastep using two DOW-Loops:
data want;
length FixLocation $ 15;
FixLocation = " ";
do _n_ = 1 by 1 until(last.ClientId);
set work.have;
by ClientId;
if Location = "Metro" then do;
FixLocation = "Metro";
end;
end;
do _n_ = 1 by 1 until(last.ClientId);
set work.have;
by ClientId;
if not missing(FixLocation) then do;
Location = FixLocation;
end;
output;
end;
run;
You will have to process the dataset twice: in the first round checking if any Location = "Metro" and in the second round adapting Location if necessary. The first condition is achieved by using by-group-processing with first/last - you will find numerous documents explaining anything relevant.
This is one solution, all in one datastep using two DOW-Loops:
data want;
length FixLocation $ 15;
FixLocation = " ";
do _n_ = 1 by 1 until(last.ClientId);
set work.have;
by ClientId;
if Location = "Metro" then do;
FixLocation = "Metro";
end;
end;
do _n_ = 1 by 1 until(last.ClientId);
set work.have;
by ClientId;
if not missing(FixLocation) then do;
Location = FixLocation;
end;
output;
end;
run;
Thanks very much for this help Andreas, and thanks Mkeintz for your suggestion. I have used Andreas's first code and this seems to work. I will also try the merge code both of you have supplied and if that also works for me I'll also click that these are also an accepted solution to my problem. Many thanks for your help @andreas_lds @mkeintz , it's very much appreciated!!!
Cammbo70
Hi again @andreas_lds
One issue I am having with this code is the fix location variable is only generating for the record in which the location was "Metro". All other records for the client id in question are null values.For example:
Client ID | Location | Employment Status | Year | FixLocation |
1 | Metro | Unemployed | 2001 | Metro |
1 | Remote | Employed | 2002 | no value |
1 | Metro | Employed | 2003 | Metro |
Is there a way in which you can aggregate the "Metro" outcome so that the FixLocation variable generates "Metro" for all records with that Client ID? e.g:
Client ID | Location | Employment Status | Year | FixLocation |
1 | Metro | Unemployed | 2001 | Metro |
1 | Remote | Employed | 2002 | Metro |
1 | Metro | Employed | 2003 | Metro |
Regards
Cam
p.s. @andreas_lds I have attempted a number of different options to do this (e.g. concatenating variables, else if statements) but due to my lack of experience with SAS it was unsuccessful and not worth outlining here what I have attempted.
Please post the code you have used. The code in the solution changes Location to "Metro" in the second observation.
Or you could merge the dataset with itself:
data want_merge;
merge have have(keep= ClientId Location where=(Location="Metro"));
by ClientId;
run;
You don't need an IF .. THEN to do this. A merge of the dataset with a subset of itself (subset where location='Remote') would work, as in:
data have;
input id location :$6. emp_status :$12. year;
datalines;
1 Metro Unemployed 2001
1 Remote Employed 2002
1 Metro Employed 2003
2 Metro Employed 2003
2 Metro Employed 2003
3 Remote Employed 2003
3 Remote Employed 2003
3 Remote Employed 2003
run;
data want (drop=_:);
merge have
have (rename=(location=_loc) where=(_loc='Remote') keep=id location);
by id;
location=coalescec(_loc,location);
run;
This is a merge of have with a subset of itself. If I did not apply the rename (to the 'Remote' subset), then the subset location value would not apply to the latter records within those id's that have more total records than "Remote" records. So a rename of location to _loc is used. Since _LOC is not in the complete set, it is NOT replaced by new incoming values. So any instance of _LOC in an id group will be avaliable for all records in the group.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.