BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
cammmbo70
Fluorite | Level 6

Hi,

I have a dataset which contains unique records based on a client id. For example:

Client IDLocationEmployment StatusYear
1MetroUnemployed2001
1RemoteEmployed2002
1MetroEmployed2003
2MetroEmployed2003
2MetroEmployed2003
3RemoteEmployed2003
3RemoteEmployed2003
3RemoteEmployed2003

 Now,using client id = 1 as the example, it shows a discrepancy in the outputs for Location and Employment Status. What I am trying to achieve is a consistent output for these variables based on some conditions. The first condition would be that the client id has to be consistent (i.e. so that it is only looking at records that contain that particular client id. The second condition would be simply be "if any of the outcomes for Location = Remote, then all outcomes for this particular client id would = remote (i.e. the 2 outcomes that = "metro" would be overwritten with "remote".

 

The second condition is not the problem I am having, it is the first condition whereby the "if else statement" that I will use for the second condition needs to be based on the the client id being the same in all records that are to be used to execute this calculation. 

 

I have searched the forums but am unable to find what I am looking for.

I am a new user so hopefully I have explained myself well enough?

 

Many thanks

Cam

1 ACCEPTED SOLUTION

Accepted Solutions
andreas_lds
Jade | Level 19

You will have to process the dataset twice: in the first round checking if any Location = "Metro" and in the second round adapting Location if necessary. The first condition is achieved by using by-group-processing with first/last - you will find numerous documents explaining anything relevant.

 

This is one solution, all in one datastep using two DOW-Loops:

data want;
   length FixLocation $ 15;
   FixLocation = " ";
   
   do _n_ = 1 by 1 until(last.ClientId);
      set work.have;
      by ClientId;
      
      if Location = "Metro" then do;
         FixLocation = "Metro";
      end;
   end;
   
   do _n_ = 1 by 1 until(last.ClientId);
      set work.have;
      by ClientId;
      
      if not missing(FixLocation) then do;
         Location = FixLocation;
      end;
      
      output;
   end;
run;

View solution in original post

6 REPLIES 6
andreas_lds
Jade | Level 19

You will have to process the dataset twice: in the first round checking if any Location = "Metro" and in the second round adapting Location if necessary. The first condition is achieved by using by-group-processing with first/last - you will find numerous documents explaining anything relevant.

 

This is one solution, all in one datastep using two DOW-Loops:

data want;
   length FixLocation $ 15;
   FixLocation = " ";
   
   do _n_ = 1 by 1 until(last.ClientId);
      set work.have;
      by ClientId;
      
      if Location = "Metro" then do;
         FixLocation = "Metro";
      end;
   end;
   
   do _n_ = 1 by 1 until(last.ClientId);
      set work.have;
      by ClientId;
      
      if not missing(FixLocation) then do;
         Location = FixLocation;
      end;
      
      output;
   end;
run;
cammmbo70
Fluorite | Level 6

@andreas_lds @mkeintz 

 

Thanks very much for this help Andreas, and thanks Mkeintz for your suggestion. I have used Andreas's first code and this seems to work. I will also try the merge code both of you have supplied and if that also works for me I'll also click that these are also an accepted solution to my problem. Many thanks for your help @andreas_lds @mkeintz , it's very much appreciated!!!

Cammbo70

cammmbo70
Fluorite | Level 6

Hi again @andreas_lds

One issue I am having with this code is the fix location variable is only generating for the record in which the location was "Metro".  All other records for the client id in question are null values.For example:

 

Client IDLocationEmployment StatusYearFixLocation
1MetroUnemployed2001Metro
1RemoteEmployed2002no value
1MetroEmployed2003Metro

 

Is there a way in which you can aggregate the "Metro" outcome so that the FixLocation variable generates "Metro" for all records with that Client ID? e.g:

 

Client IDLocationEmployment StatusYearFixLocation
1MetroUnemployed2001Metro
1RemoteEmployed2002Metro
1MetroEmployed2003Metro

 

Regards

Cam

p.s. @andreas_lds I have attempted a number of different options to do this (e.g. concatenating variables, else if statements) but due to my lack of experience with SAS it was unsuccessful and not worth outlining here what I have attempted.

andreas_lds
Jade | Level 19

Please post the code you have used. The code in the solution changes Location to "Metro" in the second observation.

andreas_lds
Jade | Level 19

Or you could merge the dataset with itself:

data want_merge;
   merge have have(keep= ClientId Location where=(Location="Metro"));
   by ClientId;
run;
mkeintz
PROC Star

You don't need an IF .. THEN to do this.  A merge of the dataset with a subset of itself (subset where location='Remote') would work, as in:

 

data have;
  input id location :$6.  emp_status :$12.  year;
datalines;
1	Metro	Unemployed	2001
1	Remote	Employed	2002
1	Metro	Employed	2003
2	Metro	Employed	2003
2	Metro	Employed	2003
3	Remote	Employed	2003
3	Remote	Employed	2003
3	Remote	Employed	2003
run;

data want (drop=_:);
  merge have 
        have (rename=(location=_loc) where=(_loc='Remote') keep=id location);
  by id;
  location=coalescec(_loc,location);
run;

This is a merge of have with a subset of itself.  If I did not apply the rename (to the 'Remote' subset), then the subset location value would not apply to the latter records within those id's that have more total records than "Remote" records.  So a rename of location to _loc is used.  Since _LOC is not in the complete set, it is NOT replaced by new incoming values.  So any instance of _LOC in an id group will be avaliable for all records in the group.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 713 views
  • 3 likes
  • 3 in conversation