BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Caetreviop543
Obsidian | Level 7

This is kind of basic, but I'm having trouble creating a new variable with three different conditions, and can find very little documentation. For example:

 

data new;

set old;

 

if var1='T' and var2='T' and var3='T' then newvar='cat1';

if var1='T' and var2='F' and var3='F' then newvar='cat2';

if var1='T' and var2='T' and var3='F' then newvar='cat3';

if var1='F' and var2='F' and var3='F' then newvar='cat4';

 

run;

 

It seems straightforward, but the output will only incorrectly code one or two categories. Originally I was using if-then-else statements (which also didn't work), but I read that when coding for multiple conditions, each statement should only start with if. I've also tried putting brackets around the categories to delineate them. I'm working in linux, if that makes a difference. 

 

Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
jhammouda
Obsidian | Level 7

Hi epstewart1110

 

Based on your data you should have gotten the following frequencies for the first 3 categories:

JA:             73043

JASN:      186270

JASNAP:  204289

 

If you add the frequencies for these 3 categories you get 463602.  Which is exactly the count you got for category JA in your first table.

 

This means that the SAS code you had has lumped all these three categories in JA.  The code truncated JASN to JA and JASNAP to JA.

 

The reason is that if you do not define the length of the new variable "inter", which apparently you did not, SAS will decide its length based on the first assignment it encounters.  The first assignment was inter="JA", so SAS decides to give the inter variable a length of 2 characters, and whatever you assign to it, only the first 2 characters will be kept. You can easily check that the length of the inter variable is 2.  The code you have is fine, just add the following statement right before the first if.

length inter $ 20; * can choose any length to accommodate your longest string for the value of this variable;

 

Good Luck

View solution in original post

13 REPLIES 13
Reeza
Super User

You're either missing cases in your logic or your data is bad somehow. 

 

data new;

set old;

 

if var1='T' and var2='T' and var3='T' then newvar='cat1';

else if var1='T' and var2='F' and var3='F' then newvar='cat2';

else if var1='T' and var2='T' and var3='F' then newvar='cat3';

else if var1='F' and var2='F' and var3='F' then newvar='cat4';

 else newvar=catt(var1, var2, var3);

run;

proc freq data=new;
table newvar;
run;

Can you run the above code and post the output from the FREQ? It will help identify where it's going wrong. You appear to be missing all cases where VAR3 = T and other values are false for example. 

 

You have 3 values and 2 options for each so you have 2*2*2 = 8 possible different combinations here but you've only accounted for 4. If your data only has 4 you're fine though.

 

TTT

TTF

TFT

FTT

FFT

FTF

TFF

FFF

 


@Caetreviop543 wrote:

This is kind of basic, but I'm having trouble creating a new variable with three different conditions, and can find very little documentation. For example:

 

data new;

set old;

 

if var1='T' and var2='T' and var3='T' then newvar='cat1';

if var1='T' and var2='F' and var3='F' then newvar='cat2';

if var1='T' and var2='T' and var3='F' then newvar='cat3';

if var1='F' and var2='F' and var3='F' then newvar='cat4';

 

run;

 

It seems straightforward, but the output will only incorrectly code one or two categories. Originally I was using if-then-else statements (which also didn't work), but I read that when coding for multiple conditions, each statement should only start with if. I've also tried putting brackets around the categories to delineate them. I'm working in linux, if that makes a difference. 

 

Thanks!


 

PaigeMiller
Diamond | Level 26

@Caetreviop543 wrote:

This is kind of basic, but I'm having trouble creating a new variable with three different conditions, and can find very little documentation. For example:

 

data new;

set old;

 

if var1='T' and var2='T' and var3='T' then newvar='cat1';

if var1='T' and var2='F' and var3='F' then newvar='cat2';

if var1='T' and var2='T' and var3='F' then newvar='cat3';

if var1='F' and var2='F' and var3='F' then newvar='cat4';

 

run;

 

It seems straightforward, but the output will only incorrectly code one or two categories. Originally I was using if-then-else statements (which also didn't work), but I read that when coding for multiple conditions, each statement should only start with if. I've also tried putting brackets around the categories to delineate them. I'm working in linux, if that makes a difference. 

 

Thanks!


You don't have to start each statement with IF. There are statements that can start with ELSE and statements that start with ELSE IF (two words). All work when used properly.

 

Now all you say is that it doesn't work, but we don't know what the inputs are and we don't know what the desired output is. Could you please share that information?

 

 

--
Paige Miller
Caetreviop543
Obsidian | Level 7

The actual variables and code are slightly different. I've attached the code and output. Interestingly the program only coded categories where all conditions were 'T'...

PaigeMiller
Diamond | Level 26

Some of us will not download Microsoft Office documents as they can be a security risk.

 

Paste your code into the window that appears when you click on the running man icon.

 

We also need to see your data, included in your reply and not via an attachment.

 

We need to see the desired output.

 

Lastly, SAS is case sensitive, is it possible that in the data the letter F is a lower case f?

--
Paige Miller
Caetreviop543
Obsidian | Level 7
data savepath.temp;
set savepath.bearinew;

if epc_ja='t' and epc_sn='f' and epc_ap='f' then inter='JA';
else if epc_ja='t' and epc_sn='t' and epc_ap='f' then inter='JASN';
else if epc_ja='t' and epc_sn='t' and epc_ap='t' then inter='JASNAP';
else if epc_sn='t' and epc_ja='f' and epc_ap='f' then inter='SN';
else if epc_sn='t' and epc_ja='f' and epc_ap='t' then inter='SNAP';
else if epc_ap='t' and epc_ja='f' and epc_sn='f' then inter='AP';
else if epc_ap='t' and epc_ja='t' and epc_sn='f' then inter='APJA';

run;

proc freq data=savepath.temp;
tables inter;
run;

proc freq data=savepath.temp;
tables epc_ja*epc_ap*epc_sn/missing;
run;

                           The SAS System                         15:01 Friday, January 10, 2020   1

                                                         The FREQ Procedure

                                                                       Cumulative    Cumulative
                                     inter    Frequency     Percent     Frequency      Percent
                                     ----------------------------------------------------------
                                     AP         324135       33.92        324135        33.92  
                                     JA         463602       48.51        787737        82.43  
                                     SN         167924       17.57        955661       100.00  

                                                     Frequency Missing = 145891
                                                           The SAS System                         15:01 Friday, January 10, 2020   2

                                                         The FREQ Procedure

                                                    Table 1 of epc_ap by epc_sn
                                                      Controlling for epc_ja=f

                                                epc_ap     epc_sn

                                                Frequency|
                                                Percent  |
                                                Row Pct  |
                                                Col Pct  |f       |t       |  Total
                                                ---------+--------+--------+
                                                f        | 145891 |  80283 | 226174
                                                         |  28.89 |  15.90 |  44.79
                                                         |  64.50 |  35.50 |
                                                         |  43.28 |  47.81 |
                                                ---------+--------+--------+
                                                t        | 191158 |  87641 | 278799
                                                         |  37.86 |  17.36 |  55.21
                                                         |  68.56 |  31.44 |
                                                         |  56.72 |  52.19 |
                                                ---------+--------+--------+
                                                Total      337049   167924   504973
                                                            66.75    33.25   100.00


                                                    Table 2 of epc_ap by epc_sn
                                                      Controlling for epc_ja=t

                                                epc_ap     epc_sn

                                                Frequency|
                                                Percent  |
                                                Row Pct  |
                                                Col Pct  |f       |t       |  Total
                                                ---------+--------+--------+
                                                f        |  73043 | 186270 | 259313
                                                         |  12.24 |  31.22 |  43.47
                                                         |  28.17 |  71.83 |
                                                         |  35.45 |  47.69 |
                                                ---------+--------+--------+
                                                t        | 132977 | 204289 | 337266
                                                         |  22.29 |  34.24 |  56.53
                                                         |  39.43 |  60.57 |
                                                         |  64.55 |  52.31 |
                                                ---------+--------+--------+
                                                Total      206020   390559   596579
                                                            34.53    65.47   100.00
1                                                          The SAS System                             





NOTE: SAS initialization used:
      real time           0.07 seconds
      cpu time            0.02 seconds
      
1          
2 
NOTE: Libref SAVEPATH was successfully assigned as follows: 
      Engine:        V9 
 
4          proc sort data=savepath.bearinew;
5          by epc_institution_name epc_ja;
6          run;
7          
8          proc freq data=savepath.bearinew;
9          tables epc_randomization_unit_id;
10         by epc_institution_name epc_ja;
11         run;
12         
13         proc freq data=savepath.bearinew;
2                                                          The SAS System                             15:01 Friday, January 10, 2020

14         tables cds_trigger_vals cds_samevisit_suppress_vals;
15         run;
16         
17         proc sort data=savepath.bearinew;
18         by epc_sn;
19         run;
20         
21         proc freq data=savepath.bearinew;
22         tables sn_denom_vals sn_samevisit_suppress_vals;
23         by epc_sn;
24         run;*/
25         
26         
27         data savepath.temp;
28         set savepath.bearinew;
29         
30         if epc_ja='t' and epc_sn='f' and epc_ap='f' then inter='JA';
31         else if epc_ja='t' and epc_sn='t' and epc_ap='f' then inter='JASN';
32         else if epc_ja='t' and epc_sn='t' and epc_ap='t' then inter='JASNAP';
33         else if epc_sn='t' and epc_ja='f' and epc_ap='f' then inter='SN';
34         else if epc_sn='t' and epc_ja='f' and epc_ap='t' then inter='SNAP';
35         else if epc_ap='t' and epc_ja='f' and epc_sn='f' then inter='AP';
36         else if epc_ap='t' and epc_ja='t' and epc_sn='f' then inter='APJA';
37         
38         run;

NOTE: There were 1101552 observations read from the data set SAVEPATH.BEARINEW.
NOTE: The data set SAVEPATH.TEMP has 1101552 observations and 171 variables.
NOTE: DATA statement used (Total process time):
      real time           6.40 seconds
      cpu time            2.72 seconds
      

39         
40         proc freq data=savepath.temp;
41         tables inter;
42         run;

NOTE: There were 1101552 observations read from the data set SAVEPATH.TEMP.
NOTE: The PROCEDURE FREQ printed page 1.
NOTE: PROCEDURE FREQ used (Total process time):
      real time           2.13 seconds
      cpu time            0.91 seconds
      

43         
44         proc freq data=savepath.temp;
45         tables epc_ja*epc_ap*epc_sn/missing;
46         run;

NOTE: There were 1101552 observations read from the data set SAVEPATH.TEMP.
NOTE: The PROCEDURE FREQ printed page 2.
NOTE: PROCEDURE FREQ used (Total process time):
      real time           1.91 seconds
      cpu time            0.99 seconds
      

47         
3                                                          The SAS System                             15:01 Friday, January 10, 2020


NOTE: SAS Institute Inc., SAS Campus Drive, Cary, NC USA 27513-2414
NOTE: The SAS System used:
      real time           10.54 seconds
      cpu time            4.65 seconds
      

Ok, I've posted the code, output, and log, but I can't give access to the data 😕 It's a protected study in a secure server. 

Reeza
Super User

Your issues is partly length. I would suggest doing the following and checking your results.

FYI - this problem would NOT occur with your initial 'test' case. 

 

data savepath.temp;
set savepath.bearinew;

length inter $10.;

if epc_ja='t' and epc_sn='f' and epc_ap='f' then inter='JA';
else if epc_ja='t' and epc_sn='t' and epc_ap='f' then inter='JASN';
else if epc_ja='t' and epc_sn='t' and epc_ap='t' then inter='JASNAP';
else if epc_sn='t' and epc_ja='f' and epc_ap='f' then inter='SN';
else if epc_sn='t' and epc_ja='f' and epc_ap='t' then inter='SNAP';
else if epc_ap='t' and epc_ja='f' and epc_sn='f' then inter='AP';
else if epc_ap='t' and epc_ja='t' and epc_sn='f' then inter='APJA';
else inter= catt(epc_ja, epc_sn, epc, ap);

run;

proc freq data=savepath.temp;
tables inter;
run;

@Caetreviop543 wrote:

data savepath.temp;
set savepath.bearinew;

if epc_ja='t' and epc_sn='f' and epc_ap='f' then inter='JA';
else if epc_ja='t' and epc_sn='t' and epc_ap='f' then inter='JASN';
else if epc_ja='t' and epc_sn='t' and epc_ap='t' then inter='JASNAP';
else if epc_sn='t' and epc_ja='f' and epc_ap='f' then inter='SN';
else if epc_sn='t' and epc_ja='f' and epc_ap='t' then inter='SNAP';
else if epc_ap='t' and epc_ja='f' and epc_sn='f' then inter='AP';
else if epc_ap='t' and epc_ja='t' and epc_sn='f' then inter='APJA';

run;

proc freq data=savepath.temp;
tables inter;
run;

proc freq data=savepath.temp;
tables epc_ja*epc_ap*epc_sn/missing;
run;

                           The SAS System                         15:01 Friday, January 10, 2020   1

                                                         The FREQ Procedure

                                                                       Cumulative    Cumulative
                                     inter    Frequency     Percent     Frequency      Percent
                                     ----------------------------------------------------------
                                     AP         324135       33.92        324135        33.92  
                                     JA         463602       48.51        787737        82.43  
                                     SN         167924       17.57        955661       100.00  

                                                     Frequency Missing = 145891
                                                           The SAS System                         15:01 Friday, January 10, 2020   2

                                                         The FREQ Procedure

                                                    Table 1 of epc_ap by epc_sn
                                                      Controlling for epc_ja=f

                                                epc_ap     epc_sn

                                                Frequency|
                                                Percent  |
                                                Row Pct  |
                                                Col Pct  |f       |t       |  Total
                                                ---------+--------+--------+
                                                f        | 145891 |  80283 | 226174
                                                         |  28.89 |  15.90 |  44.79
                                                         |  64.50 |  35.50 |
                                                         |  43.28 |  47.81 |
                                                ---------+--------+--------+
                                                t        | 191158 |  87641 | 278799
                                                         |  37.86 |  17.36 |  55.21
                                                         |  68.56 |  31.44 |
                                                         |  56.72 |  52.19 |
                                                ---------+--------+--------+
                                                Total      337049   167924   504973
                                                            66.75    33.25   100.00


                                                    Table 2 of epc_ap by epc_sn
                                                      Controlling for epc_ja=t

                                                epc_ap     epc_sn

                                                Frequency|
                                                Percent  |
                                                Row Pct  |
                                                Col Pct  |f       |t       |  Total
                                                ---------+--------+--------+
                                                f        |  73043 | 186270 | 259313
                                                         |  12.24 |  31.22 |  43.47
                                                         |  28.17 |  71.83 |
                                                         |  35.45 |  47.69 |
                                                ---------+--------+--------+
                                                t        | 132977 | 204289 | 337266
                                                         |  22.29 |  34.24 |  56.53
                                                         |  39.43 |  60.57 |
                                                         |  64.55 |  52.31 |
                                                ---------+--------+--------+
                                                Total      206020   390559   596579
                                                            34.53    65.47   100.00


PROC MIGRATE will preserve current SAS file attributes and is 
recommended for converting all your SAS libraries from any 
SAS 8 release to SAS 9.  For details and examples, please see
http://support.sas.com/rnd/migration/index.html


This message is contained in the SAS news file, and is presented upon
initialization.  Edit the file "news" in the "misc/base" directory to
display site-specific news and information in the program log.
The command line option "-nonews" will prevent this display.




NOTE: SAS initialization used:
      real time           0.07 seconds
      cpu time            0.02 seconds
      
1          
2          libname savepath "PATH";
NOTE: Libref SAVEPATH was successfully assigned as follows: 
      Engine:        V9 
      Physical Name: 
3          /*
4          proc sort data=savepath.bearinew;
5          by epc_institution_name epc_ja;
6          run;
7          
8          proc freq data=savepath.bearinew;
9          tables epc_randomization_unit_id;
10         by epc_institution_name epc_ja;
11         run;
12         
13         proc freq data=savepath.bearinew;
2                                                          The SAS System                             15:01 Friday, January 10, 2020

14         tables cds_trigger_vals cds_samevisit_suppress_vals;
15         run;
16         
17         proc sort data=savepath.bearinew;
18         by epc_sn;
19         run;
20         
21         proc freq data=savepath.bearinew;
22         tables sn_denom_vals sn_samevisit_suppress_vals;
23         by epc_sn;
24         run;*/
25         
26         
27         data savepath.temp;
28         set savepath.bearinew;
29         
30         if epc_ja='t' and epc_sn='f' and epc_ap='f' then inter='JA';
31         else if epc_ja='t' and epc_sn='t' and epc_ap='f' then inter='JASN';
32         else if epc_ja='t' and epc_sn='t' and epc_ap='t' then inter='JASNAP';
33         else if epc_sn='t' and epc_ja='f' and epc_ap='f' then inter='SN';
34         else if epc_sn='t' and epc_ja='f' and epc_ap='t' then inter='SNAP';
35         else if epc_ap='t' and epc_ja='f' and epc_sn='f' then inter='AP';
36         else if epc_ap='t' and epc_ja='t' and epc_sn='f' then inter='APJA';
37         
38         run;

NOTE: There were 1101552 observations read from the data set SAVEPATH.BEARINEW.
NOTE: The data set SAVEPATH.TEMP has 1101552 observations and 171 variables.
NOTE: DATA statement used (Total process time):
      real time           6.40 seconds
      cpu time            2.72 seconds
      

39         
40         proc freq data=savepath.temp;
41         tables inter;
42         run;

NOTE: There were 1101552 observations read from the data set SAVEPATH.TEMP.
NOTE: The PROCEDURE FREQ printed page 1.
NOTE: PROCEDURE FREQ used (Total process time):
      real time           2.13 seconds
      cpu time            0.91 seconds
      

43         
44         proc freq data=savepath.temp;
45         tables epc_ja*epc_ap*epc_sn/missing;
46         run;

NOTE: There were 1101552 observations read from the data set SAVEPATH.TEMP.
NOTE: The PROCEDURE FREQ printed page 2.
NOTE: PROCEDURE FREQ used (Total process time):
      real time           1.91 seconds
      cpu time            0.99 seconds
      

47         
3                                                          The SAS System                             15:01 Friday, January 10, 2020


NOTE: SAS Institute Inc., SAS Campus Drive, Cary, NC USA 27513-2414
NOTE: The SAS System used:
      real time           10.54 seconds
      cpu time            4.65 seconds
      

Ok, I've posted the code, output, and log, but I can't give access to the data 😕 It's a protected study in a secure server. 


 

Reeza
Super User
You may want to delete the first few lines of your log that includes your SITE number as well as institution.
Caetreviop543
Obsidian | Level 7

Thanks for letting me know.

Caetreviop543
Obsidian | Level 7

Is it also possible for you to delete your post with the old log?

jhammouda
Obsidian | Level 7

Hi epstewart1110

 

Based on your data you should have gotten the following frequencies for the first 3 categories:

JA:             73043

JASN:      186270

JASNAP:  204289

 

If you add the frequencies for these 3 categories you get 463602.  Which is exactly the count you got for category JA in your first table.

 

This means that the SAS code you had has lumped all these three categories in JA.  The code truncated JASN to JA and JASNAP to JA.

 

The reason is that if you do not define the length of the new variable "inter", which apparently you did not, SAS will decide its length based on the first assignment it encounters.  The first assignment was inter="JA", so SAS decides to give the inter variable a length of 2 characters, and whatever you assign to it, only the first 2 characters will be kept. You can easily check that the length of the inter variable is 2.  The code you have is fine, just add the following statement right before the first if.

length inter $ 20; * can choose any length to accommodate your longest string for the value of this variable;

 

Good Luck

Caetreviop543
Obsidian | Level 7

Thanks, that worked!

unison
Lapis Lazuli | Level 10

This logic works as expected (see data new step):

data old;
	input (var1-var3) ($) @@;
	datalines;
T T T T T F T F F
T F T F T T F T F
F F T F F F
;
run;

data expected;
	input (var1-var3) ($) newvar :$4. @@;
	datalines;
T T T cat1 T T F cat3 T F F cat2
T F T cat? F T T cat? F T F cat?
F F T cat? F F F cat4
;
run;

data new;
	set old;

	if var1='T' and var2='T' and var3='T' then
		newvar='cat1';
	else if var1='T' and var2='F' and var3='F' then
		newvar='cat2';
	else if var1='T' and var2='T' and var3='F' then
		newvar='cat3';
	else if var1='F' and var2='F' and var3='F' then
		newvar='cat4';
	else newvar='cat?';
run;

proc print data=old;
proc print data=expected;
proc print data=new;
run;

proc compare base=expected compare=new;
run;
-unison
Reeza
Super User

@unison unfortunately this is a case where the small test case will not generalize to the actual use case because the length of the new variable is different. And in SAS new variables default to 8, or the length when it's first encountered, which is likely too short. So even if it is getting coding correctly you won't see the full length of the variable. 

 

@Caetreviop543 I removed the relevant sections. Does the solution proposed work though? If it doesn't the issue lies with your data then, not the code.

 

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 13 replies
  • 30567 views
  • 1 like
  • 5 in conversation