BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Etoo12121
Obsidian | Level 7

Hi all,

 

I have the below dataset and trying to create a new variable  (Location2).

 

looking at every row for each person,
if there is only one row of data per person, then location2 = location
if there are multiple rows of data per person and location is consistent for each person, then location2 = location
if there are multiple rows of data per person but location is not consistent, then location2 = 'Both'

 

I've tried using a do-loop but it isn't quite working as I want.

 

PersondateLocationLocation2
100120190629DCDC
100720190124MarylandBoth
100720191105DCBoth
236420191203MarylandMD
236420190405MarylandMD
333120190101MarylandMD
657920190325DCDC
657920190404DCDC
657920190416DCDC
657920190504DCDC
657920190717DCDC
657920190807DCDC
657920191101DCDC
103420190219MarylandBoth
103420191019DCBoth
103520190301MarylandBoth
103520191201DCBoth
1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

You did not provide any rule for converting Maryland into MD.

proc sql;
  create table want as
    select *
          , case when (count(distinct location)<2) then location else 'Both' end as location2
    from have
    group by person
  ;
quit;

Results:

Obs    Person          date    Location    expected    location2

  1     1001     2019-06-29    DC            DC        DC
  2     1007     2019-11-05    DC            Both      Both
  3     1007     2019-01-24    Maryland      Both      Both
  4     1034     2019-02-19    Maryland      Both      Both
  5     1034     2019-10-19    DC            Both      Both
  6     1035     2019-12-01    DC            Both      Both
  7     1035     2019-03-01    Maryland      Both      Both
  8     2364     2019-04-05    Maryland      MD        Maryland
  9     2364     2019-12-03    Maryland      MD        Maryland
 10     3331     2019-01-01    Maryland      MD        Maryland
 11     6579     2019-03-25    DC            DC        DC
 12     6579     2019-11-01    DC            DC        DC
 13     6579     2019-08-07    DC            DC        DC
 14     6579     2019-07-17    DC            DC        DC
 15     6579     2019-05-04    DC            DC        DC
 16     6579     2019-04-16    DC            DC        DC
 17     6579     2019-04-04    DC            DC        DC

View solution in original post

5 REPLIES 5
tarheel13
Rhodochrosite | Level 12

Can you please post your data in a data step? 

Tom
Super User Tom
Super User

You did not provide any rule for converting Maryland into MD.

proc sql;
  create table want as
    select *
          , case when (count(distinct location)<2) then location else 'Both' end as location2
    from have
    group by person
  ;
quit;

Results:

Obs    Person          date    Location    expected    location2

  1     1001     2019-06-29    DC            DC        DC
  2     1007     2019-11-05    DC            Both      Both
  3     1007     2019-01-24    Maryland      Both      Both
  4     1034     2019-02-19    Maryland      Both      Both
  5     1034     2019-10-19    DC            Both      Both
  6     1035     2019-12-01    DC            Both      Both
  7     1035     2019-03-01    Maryland      Both      Both
  8     2364     2019-04-05    Maryland      MD        Maryland
  9     2364     2019-12-03    Maryland      MD        Maryland
 10     3331     2019-01-01    Maryland      MD        Maryland
 11     6579     2019-03-25    DC            DC        DC
 12     6579     2019-11-01    DC            DC        DC
 13     6579     2019-08-07    DC            DC        DC
 14     6579     2019-07-17    DC            DC        DC
 15     6579     2019-05-04    DC            DC        DC
 16     6579     2019-04-16    DC            DC        DC
 17     6579     2019-04-04    DC            DC        DC
Etoo12121
Obsidian | Level 7
This worked perfectly
tarheel13
Rhodochrosite | Level 12

Here's the more convoluted way by storing min and max variables of each by group in new columns called min_loc and max_loc. When min_loc=max_loc, then you can set location2=location. When they're not equal, you can set it to "Both".

data have;
input person $4. location $20.;
datalines;
1001 DC	
1007 Maryland	
1007 DC	
2364 Maryland	
2364 Maryland	
3331 Maryland	
6579 DC	
6579 DC	
6579 DC	
6579 DC	
6579 DC	
6579 DC	
6579 DC	
1034 Maryland	
1034 DC	
1035 Maryland	
1035 DC
;
run;

data min;
	set have;
	by person;
	length min_loc  $20;
	retain min_loc ;
	if first.person then do;
		min_loc='';
	end;
	if first.person then min_loc=location;
run;

proc sort data=min;
	by person descending location;
run;

data max;
	set min;
	by person;
	length max_loc location2 $20;
	retain max_loc;
	if first.person then do;
		max_loc='';
	end;
	if first.person then max_loc=location;
	
	if min_loc ^= max_loc then location2="Both";
	if min_loc=max_loc then location2=location;
run;

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1154 views
  • 2 likes
  • 3 in conversation