Hi folks,
I am trying to do the following;
1- remove children <2 in variable 'arrivalage'
2- removing those who don't have either of these diagnosis codes ('251', 'J441', 'J45', 'J459', 'J46', 'J46X', 'R062', '7001', '7002', '7003', '7004') in any of the variables 'diag_01 to diag_06' or treatment codes ('25', '40') in any of the variables treat_01 to treat_09'.
Am I right in writing below code;
Data control1; set Sheffield; if aekey NE ' ' then visit=1; *with 2-16 year olds with wheese/asthma diagnosis, investigation and treatment codes in the rest of Yorkshire & Humber; if arrivalage <2 OR diag_01 - diag_06 NOT in ('251', 'J441', 'J45', 'J459', 'J46', 'J46X', 'R062', '7001', '7002', '7003', '7004') or treat_01 -treat_09 NOT in ('25', '40') then delete; if procodet = 'RAE' then group1=1; else group1=0; run;
diag_01 - diag_06 NOT in ('251', 'J441', 'J45', 'J459', 'J46', 'J46X', 'R062', '7001', '7002', '7003', '7004')
diag_01 - diag_06 NOT in
will not work, it is not valid SAS syntax
You want
diag_01 NOT in ('251', 'J441', 'J45', 'J459', 'J46', 'J46X', 'R062', '7001', '7002', '7003', '7004')
and diag_02 NOT in ('251', 'J441', 'J45', 'J459', 'J46', 'J46X', 'R062', '7001', '7002', '7003', '7004')
and ...
You can type (or copy/paste) the rest of the lines. You could also write an ARRAY to handle this without having to specifically code each of diag_01 through diag_06.
Thanks but should it not have 'OR' instead of 'AND'? in other words, I am trying to include all observations with any of these variables having any of these codes.
diag_01 NOT in ('251', 'J441', 'J45', 'J459', 'J46', 'J46X', 'R062', '7001', '7002', '7003', '7004') OR diag_02 NOT in ('251', 'J441', 'J45', 'J459', 'J46', 'J46X', 'R062', '7001', '7002', '7003', '7004') OR ...
If you want good usable code answer, please provide some usable sample data for us to work with 🙂
@sks521 wrote:
Thanks but should it not have 'OR' instead of 'AND'? in other words, I am trying to include all observations with any of these variables having any of these codes.
diag_01 NOT in ('251', 'J441', 'J45', 'J459', 'J46', 'J46X', 'R062', '7001', '7002', '7003', '7004') OR diag_02 NOT in ('251', 'J441', 'J45', 'J459', 'J46', 'J46X', 'R062', '7001', '7002', '7003', '7004') OR ...
I guess I will leave that up to you, since you know what you want better than I know what you want.
Here is an example data set;
arrivalage diag_01 diag_02 diag_03 diag_04 diag_05 diag_06 treat_01 treat_02 treat_03 treat_04 treat_05 treat_06 procodet
1 34 41 REF
2 521 J441 221 222 RAE
1 J45 291 43 RXA
1 38 27 RWA
2 0325L 222 21 REF
That's not usable data that you've posted. Please follow the instructions here to post sample data:
Here are instructions on how to provide sample data as a data step:
https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...
I have given it a try and come up with this sample
data WORK.SHEFFIELD; infile datalines dsd truncover; input aekey:$8. procodet:$8. sex:32. ethnos:$8. activage:32. arrivalage:32. pgpprac:$8. sitetret:$8. aearrivalmode:32. aeattenddisp:32. aerefsource:32. aedepttype:32. aearrivaldate:32. aearrivaltime:32. diag_scheme:32. diag_01:$8. diag_02:$8. diag_03:$8. d iag_04:$8. diag_05:$8. diag_06:$8. invest_01:$8. invest_02:$8. invest_03:$8. invest_04:$8. invest_05:$8. invest_06:$8. invest_07:$8. invest_08:$8. invest_09:$8. invest_10:$8. invest_11:$8. invest_12:$8. treat_01:32. treat_02:32. treat_03:32. treat_04:32. t reat_05:$8. treat_06:$8. treat_07:$8. treat_08:$8. treat_09:$8. imd15decile_overall:32.; datalines; e81e201e RXF 1 A 2 73 da4863e9 2 3 1 1 19518 . 1014 1 24 24 . 34 41 . . 5221b6af RXF 2 J 6 82 37d3d778 2 3 1 1 19537 . 1359 1 1 24 . 221 222 . . c5f9d619 RCB 1 A . 16 b6a4f270 RCB55 2 3 1 1 19165 . 1602 1 24 . 221 11 . . 42bdbdc7 RWY 2 A 16 16 6868ed2c RWY01 2 3 1 1 19104 . 231 1 05 15L 1 1 . 511 . . . b43c9c60 RXF 2 A 15 21 b9e9dcb3 2 1 1 1 19526 . 1423 1 24 . 21 511 . . dd8c3208 RWA 1 A 16 16 f6b0b1b3 RWA00 2 2 1 1 19051 . 1123 1 03 25L 3 24 . 291 43 21 12 . c810915d RJL 1 A 16 16 V81999 RJL32 2 3 1 1 19013 . 1601 1 24 . 222 21 236 42 . 4c880879 RP5 2 A 16 16 8e29e8d2 RP5DR 2 2 8 1 19094 . 2115 1 38 24 . 99 . . . 4ee99a96 RP5 2 A 16 16 8e29e8d2 RP5DR 2 5 1 1 19221 . 2029 1 1 . 222 51 . . ab1c30f4 RCD 2 A 16 16 eade35a6 RCD00 9 3 8 1 19090 . 1933 1 1 . 222 . . . ;;;;
data
@sks521 wrote:
I have given it a try and come up with this sample
data WORK.SHEFFIELD; infile datalines dsd truncover; input aekey:$8. procodet:$8. sex:32. ethnos:$8. activage:32. arrivalage:32. pgpprac:$8. sitetret:$8. aearrivalmode:32. aeattenddisp:32. aerefsource:32. aedepttype:32. aearrivaldate:32. aearrivaltime:32. diag_scheme:32. diag_01:$8. diag_02:$8. diag_03:$8. d iag_04:$8. diag_05:$8. diag_06:$8. invest_01:$8. invest_02:$8. invest_03:$8. invest_04:$8. invest_05:$8. invest_06:$8. invest_07:$8. invest_08:$8. invest_09:$8. invest_10:$8. invest_11:$8. invest_12:$8. treat_01:32. treat_02:32. treat_03:32. treat_04:32. t reat_05:$8. treat_06:$8. treat_07:$8. treat_08:$8. treat_09:$8. imd15decile_overall:32.; datalines; e81e201e RXF 1 A 2 73 da4863e9 2 3 1 1 19518 . 1014 1 24 24 . 34 41 . . 5221b6af RXF 2 J 6 82 37d3d778 2 3 1 1 19537 . 1359 1 1 24 . 221 222 . . c5f9d619 RCB 1 A . 16 b6a4f270 RCB55 2 3 1 1 19165 . 1602 1 24 . 221 11 . . 42bdbdc7 RWY 2 A 16 16 6868ed2c RWY01 2 3 1 1 19104 . 231 1 05 15L 1 1 . 511 . . . b43c9c60 RXF 2 A 15 21 b9e9dcb3 2 1 1 1 19526 . 1423 1 24 . 21 511 . . dd8c3208 RWA 1 A 16 16 f6b0b1b3 RWA00 2 2 1 1 19051 . 1123 1 03 25L 3 24 . 291 43 21 12 . c810915d RJL 1 A 16 16 V81999 RJL32 2 3 1 1 19013 . 1601 1 24 . 222 21 236 42 . 4c880879 RP5 2 A 16 16 8e29e8d2 RP5DR 2 2 8 1 19094 . 2115 1 38 24 . 99 . . . 4ee99a96 RP5 2 A 16 16 8e29e8d2 RP5DR 2 5 1 1 19221 . 2029 1 1 . 222 51 . . ab1c30f4 RCD 2 A 16 16 eade35a6 RCD00 9 3 8 1 19090 . 1933 1 1 . 222 . . . ;;;;data
Try a data set with only the variables referenced in your original question, make sure it executes properly as pasted (the above doesn't, everything after aekey is missing or blank, your diag_04 variable doesn't exist because the name is split across lines as is treat_0r, and make sure to include some of the code values you are concerned about.
I could get some of the data to read but I don't think the datalines are correct for the given code.
It would also be a good idea to indicate what the final result for your example data should be.
Not sure if I follow what you mean! Now when I re-run the code to produce sample data, I get following log; Looks like there is something wrong but I can't tell what?
NOTE: There were 30 observations read from the data set WORK.SHEFFIELD. NOTE: PROCEDURE PRINT used (Total process time): real time 0.91 seconds cpu time 0.51 seconds 11 %let dataSetName = sheffield; 12 *number of observations you want to keep; 13 %let obsKeep = 10; 14 15 16 ****************************************************** 17 DO NOT CHANGE ANYTHING BELOW THIS LINE 18 ******************************************************; 19 20 %let source_path = 20 ! https://gist.githubusercontent.com/statgeek/bcc55940dd825a13b9c8ca40a904cba9/raw/865d2cf18f5 20 ! 150b8e887218dde0fc3951d0ff15b/data2datastep.sas; 21 22 filename reprex url "&source_path"; 23 %include reprex; 184 filename reprex; NOTE: Fileref REPREX has been deassigned. 185 186 option linesize=max; 187 %data2datastep(dsn=&dataSetName, obs=&obsKeep); NOTE: No rows were selected. NOTE: No rows were selected. NOTE: PROCEDURE SQL used (Total process time): real time 0.09 seconds cpu time 0.01 seconds DATA2DATASTEP DSN SHEFFIELD DATA2DATASTEP FILE create_WORK_SHEFFIELD_data.sas DATA2DATASTEP FMT YES DATA2DATASTEP FMTLIST DATA2DATASTEP INPUTLIST aekey:$8. procodet:$8. sex:32. ethnos:$8. activage:32. arrivalage:32. pgpprac:$8. sitetret:$8. aearrivalmode:32. aeattenddisp:32. aerefsource:32. aedepttype:32. aearrivaldate:32. aearrivaltime:32. diag_scheme:32. diag_01:$8. diag_02:$8. diag_03:$8. diag_04:$8. diag_05:$8. diag_06:$8. invest_01:$8. invest_02:$8. invest_03:$8. invest_04:$8. invest_05:$8. invest_06:$8. invest_07:$8. invest_08:$8. invest_09:$8. invest_10:$8. invest_11:$8. invest_12:$8. treat_01:32. treat_02:32. treat_03:32. treat_04:32. treat_05:$8. treat_06:$8. treat_07:$8. treat_08:$8. treat_09:$8. imd15decile_overall:32. DATA2DATASTEP LBL YES DATA2DATASTEP LIB WORK DATA2DATASTEP LS 256 DATA2DATASTEP MEMLABEL DATA2DATASTEP MSGTYPE NOTE DATA2DATASTEP OBS 10 DATA2DATASTEP OUTLIB WORK DATA2DATASTEP SQLEXITCODE 0 DATA2DATASTEP SQLOBS 0 DATA2DATASTEP SQLOOPS 17 DATA2DATASTEP SQLRC 0 DATA2DATASTEP SQLXOBS 0 DATA2DATASTEP SQLXOPENERRS 0 DATA2DATASTEP VARLIST aekey procodet sex ethnos activage arrivalage pgpprac sitetret aearrivalmode aeattenddisp aerefsource aedepttype aearrivaldate aearrivaltime diag_scheme diag_01 diag_02 diag_03 diag_04 diag_05 diag_06 invest_01 invest_02 invest_03 invest_04 invest_05 invest_06 invest_07 invest_08 invest_09 invest_10 invest_11 invest_12 treat_01 treat_02 treat_03 treat_04 treat_05 treat_06 treat_07 treat_08 treat_09 imd15decile_overall NOTE: The quoted string currently being processed has become more than 262 characters long. You might have unbalanced quotation marks. WARNING: Apparent symbolic reference LBLLIST not resolved. data WORK.SHEFFIELD; infile datalines dsd truncover; input aekey:$8. procodet:$8. sex:32. ethnos:$8. activage:32. arrivalage:32. pgpprac:$8. sitetret:$8. aearrivalmode:32. aeattenddisp:32. aerefsource:32. aedepttype:32. aearrivaldate:32. aearrivaltime:32. diag_scheme:32. diag_01:$8. diag_02:$8. diag_03:$8. d iag_04:$8. diag_05:$8. diag_06:$8. invest_01:$8. invest_02:$8. invest_03:$8. invest_04:$8. invest_05:$8. invest_06:$8. invest_07:$8. invest_08:$8. invest_09:$8. invest_10:$8. invest_11:$8. invest_12:$8. treat_01:32. treat_02:32. treat_03:32. treat_04:32. t reat_05:$8. treat_06:$8. treat_07:$8. treat_08:$8. treat_09:$8. imd15decile_overall:32.; datalines; e81e201e RXF 1 A 2 73 da4863e9 2 3 1 1 19518 1014 1 24 24 34 41 . . 3 5221b6af RXF 2 J 6 82 37d3d778 2 3 1 1 19537 1359 1 1 24 221 222 . . 3 c5f9d619 RCB 1 A . 16 b6a4f270 RCB55 2 3 1 1 19165 1602 1 24 221 11 . . 3 42bdbdc7 RWY 2 A 16 16 6868ed2c RWY01 2 3 1 1 19104 231 1 05 15L 1 1 511 . . . 9 b43c9c60 RXF 2 A 15 21 b9e9dcb3 2 1 1 1 19526 1423 1 24 21 511 . . 4 dd8c3208 RWA 1 A 16 16 f6b0b1b3 RWA00 2 2 1 1 19051 1123 1 03 25L 3 24 291 43 21 12 3 c810915d RJL 1 A 16 16 V81999 RJL32 2 3 1 1 19013 1601 1 24 222 21 236 42 8 4c880879 RP5 2 A 16 16 8e29e8d2 RP5DR 2 2 8 1 19094 2115 1 38 24 99 . . . 5 4ee99a96 RP5 2 A 16 16 8e29e8d2 RP5DR 2 5 1 1 19221 2029 1 1 222 51 . . 5 ab1c30f4 RCD 2 A 16 16 eade35a6 RCD00 9 3 8 1 19090 1933 1 1 222 . . . 5 ;;;; NOTE: There were 10 observations read from the data set WORK.SHEFFIELD. NOTE: DATA statement used (Total process time): real time 0.03 seconds cpu time 0.01 seconds
Also not sure what you mean by final results? I want a data set that contains observations having any of the mentioned codes for any of the mentioned variables (from diag_01 to diag_06 and treat_01 to treat_09).
Thanks
No. The
diag_01 - diag_06 NOT in ('251', 'J441', 'J45'....
part is not correct. You probably want to use arrays in a situation like this and check each variable separately. In the code below, the - will be interpreted as a minus sign, and diag_01-diag_06 will result in a numeric missing value, which is not desirable.
1) Pay attention: if var1-varn in (...) - sas assumes that you are making a subtraction, and as those variables are CHAR type, it makes a syntax error.
2) @PaigeMiller used AND between conditions. You want - if no variable has any of those codes then delete. In other words, if any variable has any of those codes then write to output.
You can code either:
if diag_1 not in <list> and diag_2 not in <list> and ... then delete;
or use:
if diag_1 in <list> or diag_2 in <list > or ... then output;
3) You can either copy/paste the list between the lines or use a macro as in:
%let mylist = ('251', 'J441', 'J45', 'J459', 'J46', 'J46X', 'R062', '7001', '7002', '7003', '7004');
data want;
set have;
array diag daig_1 - diag_6;
flag = 0; /* initialized value assumes non code exist */
do i=1 to dim(diag);
if diag(i) in &mylist then flag=1;
end;
if flag = 1 then output; /* some diagnosis are found */
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.