- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
This is kind of basic, but I'm having trouble creating a new variable with three different conditions, and can find very little documentation. For example:
data new;
set old;
if var1='T' and var2='T' and var3='T' then newvar='cat1';
if var1='T' and var2='F' and var3='F' then newvar='cat2';
if var1='T' and var2='T' and var3='F' then newvar='cat3';
if var1='F' and var2='F' and var3='F' then newvar='cat4';
run;
It seems straightforward, but the output will only incorrectly code one or two categories. Originally I was using if-then-else statements (which also didn't work), but I read that when coding for multiple conditions, each statement should only start with if. I've also tried putting brackets around the categories to delineate them. I'm working in linux, if that makes a difference.
Thanks!
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Based on your data you should have gotten the following frequencies for the first 3 categories:
JA: 73043
JASN: 186270
JASNAP: 204289
If you add the frequencies for these 3 categories you get 463602. Which is exactly the count you got for category JA in your first table.
This means that the SAS code you had has lumped all these three categories in JA. The code truncated JASN to JA and JASNAP to JA.
The reason is that if you do not define the length of the new variable "inter", which apparently you did not, SAS will decide its length based on the first assignment it encounters. The first assignment was inter="JA", so SAS decides to give the inter variable a length of 2 characters, and whatever you assign to it, only the first 2 characters will be kept. You can easily check that the length of the inter variable is 2. The code you have is fine, just add the following statement right before the first if.
length inter $ 20; * can choose any length to accommodate your longest string for the value of this variable;
Good Luck
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You're either missing cases in your logic or your data is bad somehow.
data new;
set old;
if var1='T' and var2='T' and var3='T' then newvar='cat1';
else if var1='T' and var2='F' and var3='F' then newvar='cat2';
else if var1='T' and var2='T' and var3='F' then newvar='cat3';
else if var1='F' and var2='F' and var3='F' then newvar='cat4';
else newvar=catt(var1, var2, var3);
run;
proc freq data=new;
table newvar;
run;
Can you run the above code and post the output from the FREQ? It will help identify where it's going wrong. You appear to be missing all cases where VAR3 = T and other values are false for example.
You have 3 values and 2 options for each so you have 2*2*2 = 8 possible different combinations here but you've only accounted for 4. If your data only has 4 you're fine though.
TTT
TTF
TFT
FTT
FFT
FTF
TFF
FFF
@Caetreviop543 wrote:
This is kind of basic, but I'm having trouble creating a new variable with three different conditions, and can find very little documentation. For example:
data new;
set old;
if var1='T' and var2='T' and var3='T' then newvar='cat1';
if var1='T' and var2='F' and var3='F' then newvar='cat2';
if var1='T' and var2='T' and var3='F' then newvar='cat3';
if var1='F' and var2='F' and var3='F' then newvar='cat4';
run;
It seems straightforward, but the output will only incorrectly code one or two categories. Originally I was using if-then-else statements (which also didn't work), but I read that when coding for multiple conditions, each statement should only start with if. I've also tried putting brackets around the categories to delineate them. I'm working in linux, if that makes a difference.
Thanks!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Caetreviop543 wrote:
This is kind of basic, but I'm having trouble creating a new variable with three different conditions, and can find very little documentation. For example:
data new;
set old;
if var1='T' and var2='T' and var3='T' then newvar='cat1';
if var1='T' and var2='F' and var3='F' then newvar='cat2';
if var1='T' and var2='T' and var3='F' then newvar='cat3';
if var1='F' and var2='F' and var3='F' then newvar='cat4';
run;
It seems straightforward, but the output will only incorrectly code one or two categories. Originally I was using if-then-else statements (which also didn't work), but I read that when coding for multiple conditions, each statement should only start with if. I've also tried putting brackets around the categories to delineate them. I'm working in linux, if that makes a difference.
Thanks!
You don't have to start each statement with IF. There are statements that can start with ELSE and statements that start with ELSE IF (two words). All work when used properly.
Now all you say is that it doesn't work, but we don't know what the inputs are and we don't know what the desired output is. Could you please share that information?
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The actual variables and code are slightly different. I've attached the code and output. Interestingly the program only coded categories where all conditions were 'T'...
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Some of us will not download Microsoft Office documents as they can be a security risk.
Paste your code into the window that appears when you click on the running man icon.
We also need to see your data, included in your reply and not via an attachment.
We need to see the desired output.
Lastly, SAS is case sensitive, is it possible that in the data the letter F is a lower case f?
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data savepath.temp;
set savepath.bearinew;
if epc_ja='t' and epc_sn='f' and epc_ap='f' then inter='JA';
else if epc_ja='t' and epc_sn='t' and epc_ap='f' then inter='JASN';
else if epc_ja='t' and epc_sn='t' and epc_ap='t' then inter='JASNAP';
else if epc_sn='t' and epc_ja='f' and epc_ap='f' then inter='SN';
else if epc_sn='t' and epc_ja='f' and epc_ap='t' then inter='SNAP';
else if epc_ap='t' and epc_ja='f' and epc_sn='f' then inter='AP';
else if epc_ap='t' and epc_ja='t' and epc_sn='f' then inter='APJA';
run;
proc freq data=savepath.temp;
tables inter;
run;
proc freq data=savepath.temp;
tables epc_ja*epc_ap*epc_sn/missing;
run;
The SAS System 15:01 Friday, January 10, 2020 1
The FREQ Procedure
Cumulative Cumulative
inter Frequency Percent Frequency Percent
----------------------------------------------------------
AP 324135 33.92 324135 33.92
JA 463602 48.51 787737 82.43
SN 167924 17.57 955661 100.00
Frequency Missing = 145891
The SAS System 15:01 Friday, January 10, 2020 2
The FREQ Procedure
Table 1 of epc_ap by epc_sn
Controlling for epc_ja=f
epc_ap epc_sn
Frequency|
Percent |
Row Pct |
Col Pct |f |t | Total
---------+--------+--------+
f | 145891 | 80283 | 226174
| 28.89 | 15.90 | 44.79
| 64.50 | 35.50 |
| 43.28 | 47.81 |
---------+--------+--------+
t | 191158 | 87641 | 278799
| 37.86 | 17.36 | 55.21
| 68.56 | 31.44 |
| 56.72 | 52.19 |
---------+--------+--------+
Total 337049 167924 504973
66.75 33.25 100.00
Table 2 of epc_ap by epc_sn
Controlling for epc_ja=t
epc_ap epc_sn
Frequency|
Percent |
Row Pct |
Col Pct |f |t | Total
---------+--------+--------+
f | 73043 | 186270 | 259313
| 12.24 | 31.22 | 43.47
| 28.17 | 71.83 |
| 35.45 | 47.69 |
---------+--------+--------+
t | 132977 | 204289 | 337266
| 22.29 | 34.24 | 56.53
| 39.43 | 60.57 |
| 64.55 | 52.31 |
---------+--------+--------+
Total 206020 390559 596579
34.53 65.47 100.00
1 The SAS System
NOTE: SAS initialization used:
real time 0.07 seconds
cpu time 0.02 seconds
1
2
NOTE: Libref SAVEPATH was successfully assigned as follows:
Engine: V9
4 proc sort data=savepath.bearinew;
5 by epc_institution_name epc_ja;
6 run;
7
8 proc freq data=savepath.bearinew;
9 tables epc_randomization_unit_id;
10 by epc_institution_name epc_ja;
11 run;
12
13 proc freq data=savepath.bearinew;
2 The SAS System 15:01 Friday, January 10, 2020
14 tables cds_trigger_vals cds_samevisit_suppress_vals;
15 run;
16
17 proc sort data=savepath.bearinew;
18 by epc_sn;
19 run;
20
21 proc freq data=savepath.bearinew;
22 tables sn_denom_vals sn_samevisit_suppress_vals;
23 by epc_sn;
24 run;*/
25
26
27 data savepath.temp;
28 set savepath.bearinew;
29
30 if epc_ja='t' and epc_sn='f' and epc_ap='f' then inter='JA';
31 else if epc_ja='t' and epc_sn='t' and epc_ap='f' then inter='JASN';
32 else if epc_ja='t' and epc_sn='t' and epc_ap='t' then inter='JASNAP';
33 else if epc_sn='t' and epc_ja='f' and epc_ap='f' then inter='SN';
34 else if epc_sn='t' and epc_ja='f' and epc_ap='t' then inter='SNAP';
35 else if epc_ap='t' and epc_ja='f' and epc_sn='f' then inter='AP';
36 else if epc_ap='t' and epc_ja='t' and epc_sn='f' then inter='APJA';
37
38 run;
NOTE: There were 1101552 observations read from the data set SAVEPATH.BEARINEW.
NOTE: The data set SAVEPATH.TEMP has 1101552 observations and 171 variables.
NOTE: DATA statement used (Total process time):
real time 6.40 seconds
cpu time 2.72 seconds
39
40 proc freq data=savepath.temp;
41 tables inter;
42 run;
NOTE: There were 1101552 observations read from the data set SAVEPATH.TEMP.
NOTE: The PROCEDURE FREQ printed page 1.
NOTE: PROCEDURE FREQ used (Total process time):
real time 2.13 seconds
cpu time 0.91 seconds
43
44 proc freq data=savepath.temp;
45 tables epc_ja*epc_ap*epc_sn/missing;
46 run;
NOTE: There were 1101552 observations read from the data set SAVEPATH.TEMP.
NOTE: The PROCEDURE FREQ printed page 2.
NOTE: PROCEDURE FREQ used (Total process time):
real time 1.91 seconds
cpu time 0.99 seconds
47
3 The SAS System 15:01 Friday, January 10, 2020
NOTE: SAS Institute Inc., SAS Campus Drive, Cary, NC USA 27513-2414
NOTE: The SAS System used:
real time 10.54 seconds
cpu time 4.65 seconds
Ok, I've posted the code, output, and log, but I can't give access to the data 😕 It's a protected study in a secure server.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Your issues is partly length. I would suggest doing the following and checking your results.
FYI - this problem would NOT occur with your initial 'test' case.
data savepath.temp;
set savepath.bearinew;
length inter $10.;
if epc_ja='t' and epc_sn='f' and epc_ap='f' then inter='JA';
else if epc_ja='t' and epc_sn='t' and epc_ap='f' then inter='JASN';
else if epc_ja='t' and epc_sn='t' and epc_ap='t' then inter='JASNAP';
else if epc_sn='t' and epc_ja='f' and epc_ap='f' then inter='SN';
else if epc_sn='t' and epc_ja='f' and epc_ap='t' then inter='SNAP';
else if epc_ap='t' and epc_ja='f' and epc_sn='f' then inter='AP';
else if epc_ap='t' and epc_ja='t' and epc_sn='f' then inter='APJA';
else inter= catt(epc_ja, epc_sn, epc, ap);
run;
proc freq data=savepath.temp;
tables inter;
run;
@Caetreviop543 wrote:
data savepath.temp; set savepath.bearinew; if epc_ja='t' and epc_sn='f' and epc_ap='f' then inter='JA'; else if epc_ja='t' and epc_sn='t' and epc_ap='f' then inter='JASN'; else if epc_ja='t' and epc_sn='t' and epc_ap='t' then inter='JASNAP'; else if epc_sn='t' and epc_ja='f' and epc_ap='f' then inter='SN'; else if epc_sn='t' and epc_ja='f' and epc_ap='t' then inter='SNAP'; else if epc_ap='t' and epc_ja='f' and epc_sn='f' then inter='AP'; else if epc_ap='t' and epc_ja='t' and epc_sn='f' then inter='APJA'; run; proc freq data=savepath.temp; tables inter; run; proc freq data=savepath.temp; tables epc_ja*epc_ap*epc_sn/missing; run;
The SAS System 15:01 Friday, January 10, 2020 1 The FREQ Procedure Cumulative Cumulative inter Frequency Percent Frequency Percent ---------------------------------------------------------- AP 324135 33.92 324135 33.92 JA 463602 48.51 787737 82.43 SN 167924 17.57 955661 100.00 Frequency Missing = 145891 The SAS System 15:01 Friday, January 10, 2020 2 The FREQ Procedure Table 1 of epc_ap by epc_sn Controlling for epc_ja=f epc_ap epc_sn Frequency| Percent | Row Pct | Col Pct |f |t | Total ---------+--------+--------+ f | 145891 | 80283 | 226174 | 28.89 | 15.90 | 44.79 | 64.50 | 35.50 | | 43.28 | 47.81 | ---------+--------+--------+ t | 191158 | 87641 | 278799 | 37.86 | 17.36 | 55.21 | 68.56 | 31.44 | | 56.72 | 52.19 | ---------+--------+--------+ Total 337049 167924 504973 66.75 33.25 100.00 Table 2 of epc_ap by epc_sn Controlling for epc_ja=t epc_ap epc_sn Frequency| Percent | Row Pct | Col Pct |f |t | Total ---------+--------+--------+ f | 73043 | 186270 | 259313 | 12.24 | 31.22 | 43.47 | 28.17 | 71.83 | | 35.45 | 47.69 | ---------+--------+--------+ t | 132977 | 204289 | 337266 | 22.29 | 34.24 | 56.53 | 39.43 | 60.57 | | 64.55 | 52.31 | ---------+--------+--------+ Total 206020 390559 596579 34.53 65.47 100.00
PROC MIGRATE will preserve current SAS file attributes and is recommended for converting all your SAS libraries from any SAS 8 release to SAS 9. For details and examples, please see http://support.sas.com/rnd/migration/index.html This message is contained in the SAS news file, and is presented upon initialization. Edit the file "news" in the "misc/base" directory to display site-specific news and information in the program log. The command line option "-nonews" will prevent this display. NOTE: SAS initialization used: real time 0.07 seconds cpu time 0.02 seconds 1 2 libname savepath "PATH"; NOTE: Libref SAVEPATH was successfully assigned as follows: Engine: V9 Physical Name: 3 /* 4 proc sort data=savepath.bearinew; 5 by epc_institution_name epc_ja; 6 run; 7 8 proc freq data=savepath.bearinew; 9 tables epc_randomization_unit_id; 10 by epc_institution_name epc_ja; 11 run; 12 13 proc freq data=savepath.bearinew; 2 The SAS System 15:01 Friday, January 10, 2020 14 tables cds_trigger_vals cds_samevisit_suppress_vals; 15 run; 16 17 proc sort data=savepath.bearinew; 18 by epc_sn; 19 run; 20 21 proc freq data=savepath.bearinew; 22 tables sn_denom_vals sn_samevisit_suppress_vals; 23 by epc_sn; 24 run;*/ 25 26 27 data savepath.temp; 28 set savepath.bearinew; 29 30 if epc_ja='t' and epc_sn='f' and epc_ap='f' then inter='JA'; 31 else if epc_ja='t' and epc_sn='t' and epc_ap='f' then inter='JASN'; 32 else if epc_ja='t' and epc_sn='t' and epc_ap='t' then inter='JASNAP'; 33 else if epc_sn='t' and epc_ja='f' and epc_ap='f' then inter='SN'; 34 else if epc_sn='t' and epc_ja='f' and epc_ap='t' then inter='SNAP'; 35 else if epc_ap='t' and epc_ja='f' and epc_sn='f' then inter='AP'; 36 else if epc_ap='t' and epc_ja='t' and epc_sn='f' then inter='APJA'; 37 38 run; NOTE: There were 1101552 observations read from the data set SAVEPATH.BEARINEW. NOTE: The data set SAVEPATH.TEMP has 1101552 observations and 171 variables. NOTE: DATA statement used (Total process time): real time 6.40 seconds cpu time 2.72 seconds 39 40 proc freq data=savepath.temp; 41 tables inter; 42 run; NOTE: There were 1101552 observations read from the data set SAVEPATH.TEMP. NOTE: The PROCEDURE FREQ printed page 1. NOTE: PROCEDURE FREQ used (Total process time): real time 2.13 seconds cpu time 0.91 seconds 43 44 proc freq data=savepath.temp; 45 tables epc_ja*epc_ap*epc_sn/missing; 46 run; NOTE: There were 1101552 observations read from the data set SAVEPATH.TEMP. NOTE: The PROCEDURE FREQ printed page 2. NOTE: PROCEDURE FREQ used (Total process time): real time 1.91 seconds cpu time 0.99 seconds 47 3 The SAS System 15:01 Friday, January 10, 2020 NOTE: SAS Institute Inc., SAS Campus Drive, Cary, NC USA 27513-2414 NOTE: The SAS System used: real time 10.54 seconds cpu time 4.65 seconds
Ok, I've posted the code, output, and log, but I can't give access to the data 😕 It's a protected study in a secure server.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks for letting me know.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Is it also possible for you to delete your post with the old log?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Based on your data you should have gotten the following frequencies for the first 3 categories:
JA: 73043
JASN: 186270
JASNAP: 204289
If you add the frequencies for these 3 categories you get 463602. Which is exactly the count you got for category JA in your first table.
This means that the SAS code you had has lumped all these three categories in JA. The code truncated JASN to JA and JASNAP to JA.
The reason is that if you do not define the length of the new variable "inter", which apparently you did not, SAS will decide its length based on the first assignment it encounters. The first assignment was inter="JA", so SAS decides to give the inter variable a length of 2 characters, and whatever you assign to it, only the first 2 characters will be kept. You can easily check that the length of the inter variable is 2. The code you have is fine, just add the following statement right before the first if.
length inter $ 20; * can choose any length to accommodate your longest string for the value of this variable;
Good Luck
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks, that worked!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
This logic works as expected (see data new step):
data old;
input (var1-var3) ($) @@;
datalines;
T T T T T F T F F
T F T F T T F T F
F F T F F F
;
run;
data expected;
input (var1-var3) ($) newvar :$4. @@;
datalines;
T T T cat1 T T F cat3 T F F cat2
T F T cat? F T T cat? F T F cat?
F F T cat? F F F cat4
;
run;
data new;
set old;
if var1='T' and var2='T' and var3='T' then
newvar='cat1';
else if var1='T' and var2='F' and var3='F' then
newvar='cat2';
else if var1='T' and var2='T' and var3='F' then
newvar='cat3';
else if var1='F' and var2='F' and var3='F' then
newvar='cat4';
else newvar='cat?';
run;
proc print data=old;
proc print data=expected;
proc print data=new;
run;
proc compare base=expected compare=new;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@unison unfortunately this is a case where the small test case will not generalize to the actual use case because the length of the new variable is different. And in SAS new variables default to 8, or the length when it's first encountered, which is likely too short. So even if it is getting coding correctly you won't see the full length of the variable.
@Caetreviop543 I removed the relevant sections. Does the solution proposed work though? If it doesn't the issue lies with your data then, not the code.