BookmarkSubscribeRSS Feed
sks521
Quartz | Level 8

Hi folks,

 

I am trying to do the following;

 

1- remove children <2 in variable 'arrivalage'

2- removing those who don't have either of these diagnosis codes ('251', 'J441', 'J45', 'J459', 'J46', 'J46X', 'R062', '7001', '7002', '7003', '7004') in any of the variables 'diag_01 to diag_06' or treatment codes ('25', '40') in any of the variables treat_01 to treat_09'.

 

Am I right in writing below code;

 

Data control1;
set Sheffield;
if aekey NE ' ' then visit=1;
*with 2-16 year olds with wheese/asthma diagnosis, investigation and treatment codes in the rest of Yorkshire & Humber;
if arrivalage <2 OR diag_01 - diag_06 NOT in ('251', 'J441', 'J45', 'J459', 'J46', 'J46X', 'R062', '7001', '7002', '7003', '7004') 
or treat_01 -treat_09 NOT in ('25', '40') then delete;
if procodet = 'RAE' then group1=1;
else group1=0;
run;
11 REPLIES 11
PaigeMiller
Diamond | Level 26
diag_01 - diag_06 NOT in ('251', 'J441', 'J45', 'J459', 'J46', 'J46X', 'R062', '7001', '7002', '7003', '7004')

diag_01 - diag_06 NOT in

 

will not work, it is not valid SAS syntax

 

You want 

 

diag_01 NOT in ('251', 'J441', 'J45', 'J459', 'J46', 'J46X', 'R062', '7001', '7002', '7003', '7004')
and diag_02 NOT in ('251', 'J441', 'J45', 'J459', 'J46', 'J46X', 'R062', '7001', '7002', '7003', '7004')
and ...

You can type (or copy/paste) the rest of the lines. You could also write an ARRAY to handle this without having to specifically code each of diag_01 through diag_06.

--
Paige Miller
sks521
Quartz | Level 8

Thanks but should it not have 'OR' instead of 'AND'? in other words, I am trying to include all observations with any of these variables having any of these codes.  

diag_01 NOT in ('251', 'J441', 'J45', 'J459', 'J46', 'J46X', 'R062', '7001', '7002', '7003', '7004')
OR diag_02 NOT in ('251', 'J441', 'J45', 'J459', 'J46', 'J46X', 'R062', '7001', '7002', '7003', '7004')
OR ...
PeterClemmensen
Tourmaline | Level 20

If you want good usable code answer, please provide some usable sample data for us to work with 🙂

PaigeMiller
Diamond | Level 26

@sks521 wrote:

Thanks but should it not have 'OR' instead of 'AND'? in other words, I am trying to include all observations with any of these variables having any of these codes.  

diag_01 NOT in ('251', 'J441', 'J45', 'J459', 'J46', 'J46X', 'R062', '7001', '7002', '7003', '7004')
OR diag_02 NOT in ('251', 'J441', 'J45', 'J459', 'J46', 'J46X', 'R062', '7001', '7002', '7003', '7004')
OR ...

I guess I will leave that up to you, since you know what you want better than I know what you want.

--
Paige Miller
sks521
Quartz | Level 8

Here is an example data set;

 

arrivalage  diag_01  diag_02  diag_03  diag_04  diag_05  diag_06  treat_01 treat_02 treat_03 treat_04 treat_05 treat_06 procodet 

1                                                                                                        34           41                                                                   REF

2               521          J441                                                                 221         222                                                                 RAE

1                                             J45                                                     291         43                                                                   RXA

1               38                                                                                     27                                                                                  RWA

2               0325L                                                                               222         21                                                                   REF

Reeza
Super User

That's not usable data that you've posted. Please follow the instructions here to post sample data:

 

Here are instructions on how to provide sample data as a data step:
https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...

sks521
Quartz | Level 8

I have given it a try and come up with this sample

data WORK.SHEFFIELD;
  infile datalines dsd truncover;

input aekey:$8. procodet:$8. sex:32. ethnos:$8. activage:32. arrivalage:32. pgpprac:$8. sitetret:$8. aearrivalmode:32. aeattenddisp:32. aerefsource:32. aedepttype:32. aearrivaldate:32. aearrivaltime:32. diag_scheme:32. diag_01:$8. diag_02:$8. diag_03:$8. d
iag_04:$8. diag_05:$8. diag_06:$8. invest_01:$8. invest_02:$8. invest_03:$8. invest_04:$8. invest_05:$8. invest_06:$8. invest_07:$8. invest_08:$8. invest_09:$8. invest_10:$8. invest_11:$8. invest_12:$8. treat_01:32. treat_02:32. treat_03:32. treat_04:32. t
reat_05:$8. treat_06:$8. treat_07:$8. treat_08:$8. treat_09:$8. imd15decile_overall:32.;
datalines;
e81e201e RXF 1 A 2 73 da4863e9   2 3 1 1 19518 . 1014 1             24 24                   . 34 41 .           .
5221b6af RXF 2 J 6 82 37d3d778   2 3 1 1 19537 . 1359 1             1 24                   . 221 222 .           .
c5f9d619 RCB 1 A . 16 b6a4f270 RCB55 2 3 1 1 19165 . 1602 1             24                     . 221 11 .           .
42bdbdc7 RWY 2 A 16 16 6868ed2c RWY01 2 3 1 1 19104 . 231 1 05 15L           1 1                   . 511 . .           .
b43c9c60 RXF 2 A 15 21 b9e9dcb3   2 1 1 1 19526 . 1423 1             24                     . 21 511 .           .
dd8c3208 RWA 1 A 16 16 f6b0b1b3 RWA00 2 2 1 1 19051 . 1123 1 03 25L           3 24                   . 291 43 21 12         .
c810915d RJL 1 A 16 16 V81999 RJL32 2 3 1 1 19013 . 1601 1             24                     . 222 21 236 42         .
4c880879 RP5 2 A 16 16 8e29e8d2 RP5DR 2 2 8 1 19094 . 2115 1 38           24                     . 99 . .           .
4ee99a96 RP5 2 A 16 16 8e29e8d2 RP5DR 2 5 1 1 19221 . 2029 1             1                     . 222 51 .           .
ab1c30f4 RCD 2 A 16 16 eade35a6 RCD00 9 3 8 1 19090 . 1933 1             1                     . 222 . .           .
;;;;

data

ballardw
Super User

@sks521 wrote:

I have given it a try and come up with this sample

data WORK.SHEFFIELD;
  infile datalines dsd truncover;

input aekey:$8. procodet:$8. sex:32. ethnos:$8. activage:32. arrivalage:32. pgpprac:$8. sitetret:$8. aearrivalmode:32. aeattenddisp:32. aerefsource:32. aedepttype:32. aearrivaldate:32. aearrivaltime:32. diag_scheme:32. diag_01:$8. diag_02:$8. diag_03:$8. d
iag_04:$8. diag_05:$8. diag_06:$8. invest_01:$8. invest_02:$8. invest_03:$8. invest_04:$8. invest_05:$8. invest_06:$8. invest_07:$8. invest_08:$8. invest_09:$8. invest_10:$8. invest_11:$8. invest_12:$8. treat_01:32. treat_02:32. treat_03:32. treat_04:32. t
reat_05:$8. treat_06:$8. treat_07:$8. treat_08:$8. treat_09:$8. imd15decile_overall:32.;
datalines;
e81e201e RXF 1 A 2 73 da4863e9   2 3 1 1 19518 . 1014 1             24 24                   . 34 41 .           .
5221b6af RXF 2 J 6 82 37d3d778   2 3 1 1 19537 . 1359 1             1 24                   . 221 222 .           .
c5f9d619 RCB 1 A . 16 b6a4f270 RCB55 2 3 1 1 19165 . 1602 1             24                     . 221 11 .           .
42bdbdc7 RWY 2 A 16 16 6868ed2c RWY01 2 3 1 1 19104 . 231 1 05 15L           1 1                   . 511 . .           .
b43c9c60 RXF 2 A 15 21 b9e9dcb3   2 1 1 1 19526 . 1423 1             24                     . 21 511 .           .
dd8c3208 RWA 1 A 16 16 f6b0b1b3 RWA00 2 2 1 1 19051 . 1123 1 03 25L           3 24                   . 291 43 21 12         .
c810915d RJL 1 A 16 16 V81999 RJL32 2 3 1 1 19013 . 1601 1             24                     . 222 21 236 42         .
4c880879 RP5 2 A 16 16 8e29e8d2 RP5DR 2 2 8 1 19094 . 2115 1 38           24                     . 99 . .           .
4ee99a96 RP5 2 A 16 16 8e29e8d2 RP5DR 2 5 1 1 19221 . 2029 1             1                     . 222 51 .           .
ab1c30f4 RCD 2 A 16 16 eade35a6 RCD00 9 3 8 1 19090 . 1933 1             1                     . 222 . .           .
;;;;

data


Try a data set with only the variables referenced in your original question, make sure it executes properly as pasted (the above doesn't, everything after aekey is missing or blank, your diag_04 variable doesn't exist because the name is split across lines as is treat_0r, and make sure to include some of the code values  you are concerned about.

I could get some of the data to read but I don't think the datalines are correct for the given code.

It would also be a good idea to indicate what the final result for your example data should be.

sks521
Quartz | Level 8

Not sure if I follow what you mean! Now when I re-run the code to produce sample data, I get following log; Looks like there is something wrong but I can't tell what?

NOTE: There were 30 observations read from the data set WORK.SHEFFIELD.
NOTE: PROCEDURE PRINT used (Total process time):
      real time           0.91 seconds
      cpu time            0.51 seconds


11   %let dataSetName = sheffield;
12   *number of observations you want to keep;
13   %let obsKeep = 10;
14
15
16   ******************************************************
17   DO NOT CHANGE ANYTHING BELOW THIS LINE
18   ******************************************************;
19
20   %let source_path =
20 ! https://gist.githubusercontent.com/statgeek/bcc55940dd825a13b9c8ca40a904cba9/raw/865d2cf18f5
20 ! 150b8e887218dde0fc3951d0ff15b/data2datastep.sas;
21
22   filename reprex url "&source_path";
23   %include reprex;
184  filename reprex;
NOTE: Fileref REPREX has been deassigned.
185
186  option linesize=max;
187  %data2datastep(dsn=&dataSetName, obs=&obsKeep);
NOTE: No rows were selected.
NOTE: No rows were selected.
NOTE: PROCEDURE SQL used (Total process time):
      real time           0.09 seconds
      cpu time            0.01 seconds


DATA2DATASTEP DSN SHEFFIELD
DATA2DATASTEP FILE create_WORK_SHEFFIELD_data.sas
DATA2DATASTEP FMT YES
DATA2DATASTEP FMTLIST
DATA2DATASTEP INPUTLIST aekey:$8. procodet:$8. sex:32. ethnos:$8. activage:32. arrivalage:32. pgpprac:$8. sitetret:$8. aearrivalmode:32. aeattenddisp:32. aerefsource:32. aedepttype:32. aearrivaldate:32. aearrivaltime:32. diag_scheme:32. diag_01:$8.
diag_02:$8. diag_03:$8. diag_04:$8. diag_05:$8. diag_06:$8. invest_01:$8. invest_02:$8. invest_03:$8. invest_04:$8. invest_05:$8. invest_06:$8. invest_07:$8. invest_08:$8. invest_09:$8. invest_10:$8. invest_11:$8. invest_12:$8. treat_01:32. treat_02:32.
treat_03:32. treat_04:32. treat_05:$8. treat_06:$8. treat_07:$8. treat_08:$8. treat_09:$8. imd15decile_overall:32.
DATA2DATASTEP LBL YES
DATA2DATASTEP LIB WORK
DATA2DATASTEP LS 256
DATA2DATASTEP MEMLABEL
DATA2DATASTEP MSGTYPE NOTE
DATA2DATASTEP OBS 10
DATA2DATASTEP OUTLIB WORK
DATA2DATASTEP SQLEXITCODE 0
DATA2DATASTEP SQLOBS 0
DATA2DATASTEP SQLOOPS 17
DATA2DATASTEP SQLRC 0
DATA2DATASTEP SQLXOBS 0
DATA2DATASTEP SQLXOPENERRS 0
DATA2DATASTEP VARLIST aekey procodet sex ethnos activage arrivalage pgpprac sitetret aearrivalmode aeattenddisp aerefsource aedepttype aearrivaldate aearrivaltime diag_scheme diag_01 diag_02 diag_03 diag_04 diag_05 diag_06 invest_01 invest_02 invest_03
invest_04 invest_05 invest_06 invest_07 invest_08 invest_09 invest_10 invest_11 invest_12 treat_01 treat_02 treat_03 treat_04 treat_05 treat_06 treat_07 treat_08 treat_09 imd15decile_overall
NOTE: The quoted string currently being processed has become more than 262 characters long.  You might have unbalanced quotation marks.
WARNING: Apparent symbolic reference LBLLIST not resolved.

data WORK.SHEFFIELD;
  infile datalines dsd truncover;

input aekey:$8. procodet:$8. sex:32. ethnos:$8. activage:32. arrivalage:32. pgpprac:$8. sitetret:$8. aearrivalmode:32. aeattenddisp:32. aerefsource:32. aedepttype:32. aearrivaldate:32. aearrivaltime:32. diag_scheme:32. diag_01:$8. diag_02:$8. diag_03:$8. d
iag_04:$8. diag_05:$8. diag_06:$8. invest_01:$8. invest_02:$8. invest_03:$8. invest_04:$8. invest_05:$8. invest_06:$8. invest_07:$8. invest_08:$8. invest_09:$8. invest_10:$8. invest_11:$8. invest_12:$8. treat_01:32. treat_02:32. treat_03:32. treat_04:32. t
reat_05:$8. treat_06:$8. treat_07:$8. treat_08:$8. treat_09:$8. imd15decile_overall:32.;
datalines;
e81e201e RXF 1 A 2 73 da4863e9   2 3 1 1 19518 1014 1             24 24                     34 41 . .           3
5221b6af RXF 2 J 6 82 37d3d778   2 3 1 1 19537 1359 1             1 24                     221 222 . .           3
c5f9d619 RCB 1 A . 16 b6a4f270 RCB55 2 3 1 1 19165 1602 1             24                       221 11 . .           3
42bdbdc7 RWY 2 A 16 16 6868ed2c RWY01 2 3 1 1 19104 231 1 05 15L           1 1                     511 . . .           9
b43c9c60 RXF 2 A 15 21 b9e9dcb3   2 1 1 1 19526 1423 1             24                       21 511 . .           4
dd8c3208 RWA 1 A 16 16 f6b0b1b3 RWA00 2 2 1 1 19051 1123 1 03 25L           3 24                     291 43 21 12           3
c810915d RJL 1 A 16 16 V81999 RJL32 2 3 1 1 19013 1601 1             24                       222 21 236 42           8
4c880879 RP5 2 A 16 16 8e29e8d2 RP5DR 2 2 8 1 19094 2115 1 38           24                       99 . . .           5
4ee99a96 RP5 2 A 16 16 8e29e8d2 RP5DR 2 5 1 1 19221 2029 1             1                       222 51 . .           5
ab1c30f4 RCD 2 A 16 16 eade35a6 RCD00 9 3 8 1 19090 1933 1             1                       222 . . .           5
;;;;
NOTE: There were 10 observations read from the data set WORK.SHEFFIELD.
NOTE: DATA statement used (Total process time):
      real time           0.03 seconds
      cpu time            0.01 seconds

 Also not sure what you mean by final results? I want a data set that contains observations having any of the mentioned codes for any of the mentioned variables (from diag_01 to diag_06 and treat_01 to treat_09).

 

Thanks 

PeterClemmensen
Tourmaline | Level 20

No. The 

 

diag_01 - diag_06 NOT in ('251', 'J441', 'J45'....

  part is not correct. You probably want to use arrays in a situation like this and check each variable separately. In the code below, the - will be interpreted as a minus sign, and diag_01-diag_06 will result in a numeric missing value, which is not desirable. 

Shmuel
Garnet | Level 18

1) Pay attention: if var1-varn in (...)  - sas assumes that you are making a subtraction, and as those variables are CHAR type, it makes a  syntax error.

2) @PaigeMiller used AND between conditions. You want - if no variable has any of those codes then delete. In other words, if any variable has any of those codes then write to output. 

   You can code either:

        

if diag_1 not in <list> and diag_2 not in <list> and ... then delete;

 or use:

if diag_1 in <list> or diag_2 in <list > or ... then output;

3) You can either copy/paste the list between the lines or use a macro as in:

 

%let mylist = ('251', 'J441', 'J45', 'J459', 'J46', 'J46X', 'R062', '7001', '7002', '7003', '7004');

data want;
  set have;
        array diag daig_1 - diag_6;
        flag = 0;  /* initialized value assumes non code exist */
        do i=1 to dim(diag);
             if diag(i) in &mylist then flag=1;
        end;
        if flag = 1 then output; /* some diagnosis are found */
run;

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 1982 views
  • 0 likes
  • 6 in conversation