Solved: Re: Obtain consistent outputs for unique records in a dataset

cammmbo70 · Posted 08-14-2020 12:24 AM

Hi,

I have a dataset which contains unique records based on a client id. For example:

Client ID	Location	Employment Status	Year
1	Metro	Unemployed	2001
1	Remote	Employed	2002
1	Metro	Employed	2003
2	Metro	Employed	2003
2	Metro	Employed	2003
3	Remote	Employed	2003
3	Remote	Employed	2003
3	Remote	Employed	2003

Now,using client id = 1 as the example, it shows a discrepancy in the outputs for Location and Employment Status. What I am trying to achieve is a consistent output for these variables based on some conditions. The first condition would be that the client id has to be consistent (i.e. so that it is only looking at records that contain that particular client id. The second condition would be simply be "if any of the outcomes for Location = Remote, then all outcomes for this particular client id would = remote (i.e. the 2 outcomes that = "metro" would be overwritten with "remote".

The second condition is not the problem I am having, it is the first condition whereby the "if else statement" that I will use for the second condition needs to be based on the the client id being the same in all records that are to be used to execute this calculation.

I have searched the forums but am unable to find what I am looking for.

I am a new user so hopefully I have explained myself well enough?

Many thanks

Cam

andreas_lds · Posted 08-14-2020 01:04 AM

You will have to process the dataset twice: in the first round checking if any Location = "Metro" and in the second round adapting Location if necessary. The first condition is achieved by using by-group-processing with first/last - you will find numerous documents explaining anything relevant.

This is one solution, all in one datastep using two DOW-Loops:

data want;
   length FixLocation $ 15;
   FixLocation = " ";
   
   do _n_ = 1 by 1 until(last.ClientId);
      set work.have;
      by ClientId;
      
      if Location = "Metro" then do;
         FixLocation = "Metro";
      end;
   end;
   
   do _n_ = 1 by 1 until(last.ClientId);
      set work.have;
      by ClientId;
      
      if not missing(FixLocation) then do;
         Location = FixLocation;
      end;
      
      output;
   end;
run;

View solution in original post

andreas_lds · Posted 08-14-2020 01:04 AM

You will have to process the dataset twice: in the first round checking if any Location = "Metro" and in the second round adapting Location if necessary. The first condition is achieved by using by-group-processing with first/last - you will find numerous documents explaining anything relevant.

This is one solution, all in one datastep using two DOW-Loops:

data want;
   length FixLocation $ 15;
   FixLocation = " ";
   
   do _n_ = 1 by 1 until(last.ClientId);
      set work.have;
      by ClientId;
      
      if Location = "Metro" then do;
         FixLocation = "Metro";
      end;
   end;
   
   do _n_ = 1 by 1 until(last.ClientId);
      set work.have;
      by ClientId;
      
      if not missing(FixLocation) then do;
         Location = FixLocation;
      end;
      
      output;
   end;
run;

cammmbo70 · Posted 08-14-2020 03:11 AM

@andreas_lds @mkeintz

Thanks very much for this help Andreas, and thanks Mkeintz for your suggestion. I have used Andreas's first code and this seems to work. I will also try the merge code both of you have supplied and if that also works for me I'll also click that these are also an accepted solution to my problem. Many thanks for your help @andreas_lds @mkeintz , it's very much appreciated!!!

Cammbo70

cammmbo70 · Posted 08-18-2020 07:03 PM

Hi again @andreas_lds

One issue I am having with this code is the fix location variable is only generating for the record in which the location was "Metro". All other records for the client id in question are null values.For example:

Client ID	Location	Employment Status	Year	FixLocation
1	Metro	Unemployed	2001	Metro
1	Remote	Employed	2002	no value
1	Metro	Employed	2003	Metro

Is there a way in which you can aggregate the "Metro" outcome so that the FixLocation variable generates "Metro" for all records with that Client ID? e.g:

Client ID	Location	Employment Status	Year	FixLocation
1	Metro	Unemployed	2001	Metro
1	Remote	Employed	2002	Metro
1	Metro	Employed	2003	Metro

Regards

Cam

p.s. @andreas_lds I have attempted a number of different options to do this (e.g. concatenating variables, else if statements) but due to my lack of experience with SAS it was unsuccessful and not worth outlining here what I have attempted.

andreas_lds · Posted 08-19-2020 08:40 AM

Please post the code you have used. The code in the solution changes Location to "Metro" in the second observation.

andreas_lds · Posted 08-14-2020 01:07 AM

Or you could merge the dataset with itself:

data want_merge;
   merge have have(keep= ClientId Location where=(Location="Metro"));
   by ClientId;
run;

mkeintz · Posted 08-14-2020 01:33 AM

You don't need an IF .. THEN to do this. A merge of the dataset with a subset of itself (subset where location='Remote') would work, as in:

data have;
  input id location :$6.  emp_status :$12.  year;
datalines;
1	Metro	Unemployed	2001
1	Remote	Employed	2002
1	Metro	Employed	2003
2	Metro	Employed	2003
2	Metro	Employed	2003
3	Remote	Employed	2003
3	Remote	Employed	2003
3	Remote	Employed	2003
run;

data want (drop=_:);
  merge have 
        have (rename=(location=_loc) where=(_loc='Remote') keep=id location);
  by id;
  location=coalescec(_loc,location);
run;

This is a merge of have with a subset of itself. If I did not apply the rename (to the 'Remote' subset), then the subset location value would not apply to the latter records within those id's that have more total records than "Remote" records. So a rename of location to _loc is used. Since _LOC is not in the complete set, it is NOT replaced by new incoming values. So any instance of _LOC in an id group will be avaliable for all records in the group.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Registration is open

SAS Training: Just a Click Away