BookmarkSubscribeRSS Feed
lturian
Calcite | Level 5

I'm trying to create a new column by grouping variables from a different column to a new output. I'm also trying to identify those variables just by their first 2 digits using a wildcard. When I run the following, the output just puts "1" in the new column for all rows, even if doesn't abide by the intended "if" statement. I'm not quite sure why it does that and fails to have the intended output in the new column.

Any help would be greatly appreciated!!

 

data test; 
	set test; 
	if HOSPSTCO=:'23' or '33' or '50' or '25' or '44' or '09' then HOSP_DIVISION=1; 
	else if HOSPSTCO=:'36' or '42' or '34' then HOSP_DIVISION=2; 
	else if HOSPSTCO=:'55' or '26' or '17' or '18' or '39' then HOSP_DIVISION=3; 
	else if HOSPSTCO=:'29' or '38' or '46' or '31' or '20' or '27' or '19' then HOSP_DIVISION=4; 
	else if HOSPSTCO=:'10' or '24' or '11' or '51' or '54' or '37' or '45' or '13' or '12' then HOSP_DIVISION=5; 
	else if HOSPSTCO=:'21' or '47' or '28' or '01' then HOSP_DIVISION=6; 
	else if HOSPSTCO=:'40' or '48' or '05' or '22' then HOSP_DIVISION=7; 
	else if HOSPSTCO=:'16' or '30' or '56' or '32' or '49' or '08' or '04' or '35' then HOSP_DIVISION=8; 
	else if HOSPSTCO=:'02' or '53' or '41' or '06' or '15' then HOSP_DIVISION=9; 
run; 
 
18 REPLIES 18
PaigeMiller
Diamond | Level 26

The proper syntax is:

 

if HOSPSTCO=:'23' or HOSPSTCO=:'33' or HOSPSTCO=:'50' or 
    HOSPSTCO=:'25' or HOSPSTCO=:'44' or HOSPSTCO=:'09' 
    then HOSP_DIVISION=1;

Alternative

if substr(hospstco,1,2) in ('23','33','50','25','44','09') then hosp_division=1;
--
Paige Miller
lturian
Calcite | Level 5
When I try this, it doesn't output any variable into HOSP_DIVISION. Every row just has "." instead of the desired output. I wonder if the problem lies with the HOSPSTCO data
PaigeMiller
Diamond | Level 26

@lturian wrote:
When I try this, it doesn't output any variable into HOSP_DIVISION. Every row just has "." instead of the desired output. I wonder if the problem lies with the HOSPSTCO data

Then LOOK AT the data in data set TEST with your own eyes.

--
Paige Miller
ballardw
Super User

The proper operator to use is IN .

 

 if substr(HOSPSTCO,1,2) in ('23' '33' '50' '25' '44' '09') then HOSP_DIVISION=1; 

Or use the SELECT When for doing lots of comparisons instead of If/then/else

 

data test2; 
	set test; 
   select (substr(HOSPSTCO,1,2));
   	When ('23' , '33' , '50' , '25' , '44' , '09' ) HOSP_DIVISION=1; 
   	When ('36' , '42' , '34' ) HOSP_DIVISION=2; 
   	When ('55' , '26' , '17' , '18' , '39' ) HOSP_DIVISION=3; 
   	When ('29' , '38' , '46' , '31' , '20' , '27' , '19' ) HOSP_DIVISION=4; 
   	When ('10' , '24' , '11' , '51' , '54' , '37' , '45' , '13' , '12' ) HOSP_DIVISION=5; 
   	When ('21' , '47' , '28' , '01' ) HOSP_DIVISION=6; 
   	When ('40' , '48' , '05' , '22' ) HOSP_DIVISION=7; 
   	When ('16' , '30' , '56' , '32' , '49' , '08' , '04' , '35' ) HOSP_DIVISION=8; 
   	When ('02' , '53' , '41' , '06' , '15' ) HOSP_DIVISION=9; 
      otherwise;
   end;
run; 

The otherwise is not required but there will be a warning if not present. It is to do what ever you might want if none of the values appear in the other When conditions. Such as perhaps put a warning to the log about an unexpected value or missing value.

 

 

Another alternative if you have a data set with the code values and categories would be to create a format to assign the division code and just use the format instead of adding another variable.

This takes advantage of an oddity in range comparisons of character values:

proc format;
value $hospdiv
'23' - '2399'= 'Division 1'
'33' - '3399'= 'Division 1'
'36' - '3699'= 'Division 2'
'42' - '4299'= 'Division 2'
;
run;

data junk;
  input x $;
  put x= $hospdiv.;
datalines;
23
231
2345
33
3356
36
367
3678
42
4244
;

I have several formats that are used to turn location codes into regions, nested region/subregion (multilabel formats), long and short descriptions.

Groups created with formats will be honored by most analysis, reporting and graphing procedures. They have a big advantage that if you need to make a small "tweek" you only need to modify Format code and do not have to go back to the data set to add yet another variable.

 

 

lturian
Calcite | Level 5
I tried both ways and for some reason I'm not getting anything in the new column. I'm just getting "." in all rows.
Reeza
Super User

Formats are your friend.

Untested but something like this is a lot cleaner and easier to maintain and test. My nickel at least.

 

proc format;
invalue  hosp_division
'23'  '33'  '50' '25'  '44' '09'  = 1
'36'  '42'  '34' = 2
'55'  '26'  '17'  '18'  '39' =3;
run;

data test2;*use a different name otherwise it's hard to track changes over time and harder to debug;
set test; 

hosp_division = input(substr(hospstco, 1, 2), hosp_division.);
run;



@lturian wrote:

I'm trying to create a new column by grouping variables from a different column to a new output. I'm also trying to identify those variables just by their first 2 digits using a wildcard. When I run the following, the output just puts "1" in the new column for all rows, even if doesn't abide by the intended "if" statement. I'm not quite sure why it does that and fails to have the intended output in the new column.

Any help would be greatly appreciated!!

 

data test; 
	set test; 
	if HOSPSTCO=:'23' or '33' or '50' or '25' or '44' or '09' then HOSP_DIVISION=1; 
	else if HOSPSTCO=:'36' or '42' or '34' then HOSP_DIVISION=2; 
	else if HOSPSTCO=:'55' or '26' or '17' or '18' or '39' then HOSP_DIVISION=3; 
	else if HOSPSTCO=:'29' or '38' or '46' or '31' or '20' or '27' or '19' then HOSP_DIVISION=4; 
	else if HOSPSTCO=:'10' or '24' or '11' or '51' or '54' or '37' or '45' or '13' or '12' then HOSP_DIVISION=5; 
	else if HOSPSTCO=:'21' or '47' or '28' or '01' then HOSP_DIVISION=6; 
	else if HOSPSTCO=:'40' or '48' or '05' or '22' then HOSP_DIVISION=7; 
	else if HOSPSTCO=:'16' or '30' or '56' or '32' or '49' or '08' or '04' or '35' then HOSP_DIVISION=8; 
	else if HOSPSTCO=:'02' or '53' or '41' or '06' or '15' then HOSP_DIVISION=9; 
run; 
 

 

PaigeMiller
Diamond | Level 26

All good replies.

 

And if in the real-world example there are cases where you sometimes want the first two characters of HOSPSTCO and other cases where you want the first three (or four or ... ) characters to match, then I think all of the solutions fail except the IF/THEN construct. But maybe I'm wrong, maybe there's a way to get the SUBSTR or FORMAT solution to work there, I just can think of it right now.

--
Paige Miller
Reeza
Super User
I think this code is the hospital modified FIPS State/County code, which is why this solution should work overall. But it definitely may not generalize to all IF/THEN questions.

https://www.hcup-us.ahrq.gov/db/vars/hospstco/kidnote.jsp

Astounding
PROC Star

@Reeza 

 

Just curious whether you tried to replace this:

 

hosp_division = input(substr(hospstco, 1, 2), hosp_division.);

Is it possible this would work?

 

hosp_division = input(hospstco, hosp_division2.);
Tom
Super User Tom
Super User

@Astounding wrote:

@Reeza 

 

Just curious whether you tried to replace this:

 

hosp_division = input(substr(hospstco, 1, 2), hosp_division.);

Is it possible this would work?

 

hosp_division = input(hospstco, hosp_division2.);

That will work.

lturian
Calcite | Level 5
I do like formats. But for some reason I'm not getting anything the formatted variables in the new column. I'm just getting "." in all rows.
ballardw
Super User

@lturian wrote:
I do like formats. But for some reason I'm not getting anything the formatted variables in the new column. I'm just getting "." in all rows.

42

 

 

Values? Format / informat definition? Code used?

lturian
Calcite | Level 5

I used the invalue format you provided: 

 

proc format;
	invalue FIPS
	'23' '33' '50' '25' '44' '09' = 1
	'36' '42' '34' = 2
	'55' '26' '17' '18' '39' = 3
	'29' '38' '46' '31' '20' '27' '19' = 4
	'10' '24' '11' '51' '54' '37' '45' '13' '12' = 5
	'21' '47' '28' '01' = 6
	'40' '48' '05' '22' = 7
	'16' '30' '56' '32' '49' '08' '04' '35' = 8
	'02' '53' '41' '06' '15' = 9
	;
run;

data test2;
	set test;
HOSP_DIVISION = input(substr(HOSPSTCO,1,2), FIPS.);
run;
Tom
Super User Tom
Super User

@lturian wrote:

I used the invalue format you provided: 

 

proc format;
	invalue FIPS
	'23' '33' '50' '25' '44' '09' = 1
	'36' '42' '34' = 2
	'55' '26' '17' '18' '39' = 3
	'29' '38' '46' '31' '20' '27' '19' = 4
	'10' '24' '11' '51' '54' '37' '45' '13' '12' = 5
	'21' '47' '28' '01' = 6
	'40' '48' '05' '22' = 7
	'16' '30' '56' '32' '49' '08' '04' '35' = 8
	'02' '53' '41' '06' '15' = 9
	;
run;

data test2;
	set test;
HOSP_DIVISION = input(substr(HOSPSTCO,1,2), FIPS.);
run;

And the result was good? bad? ugly? What?

Are you sure the variable HOSPSTCO is defined as character? 

hackathon24-white-horiz.png

The 2025 SAS Hackathon Kicks Off on June 11!

Watch the live Hackathon Kickoff to get all the essential information about the SAS Hackathon—including how to join, how to participate, and expert tips for success.

YouTube LinkedIn

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 18 replies
  • 2789 views
  • 9 likes
  • 8 in conversation