BookmarkSubscribeRSS Feed
sks521
Quartz | Level 8

Hi folks,

 

I am trying to do the following;

 

1- remove children <2 in variable 'arrivalage'

2- removing those who don't have either of these diagnosis codes ('251', 'J441', 'J45', 'J459', 'J46', 'J46X', 'R062', '7001', '7002', '7003', '7004') in any of the variables 'diag_01 to diag_06' or treatment codes ('25', '40') in any of the variables treat_01 to treat_09'.

 

Am I right in writing below code;

 

Data control1;
set Sheffield;
if aekey NE ' ' then visit=1;
*with 2-16 year olds with wheese/asthma diagnosis, investigation and treatment codes in the rest of Yorkshire & Humber;
if arrivalage <2 OR diag_01 - diag_06 NOT in ('251', 'J441', 'J45', 'J459', 'J46', 'J46X', 'R062', '7001', '7002', '7003', '7004') 
or treat_01 -treat_09 NOT in ('25', '40') then delete;
if procodet = 'RAE' then group1=1;
else group1=0;
run;
11 REPLIES 11
PaigeMiller
Diamond | Level 26
diag_01 - diag_06 NOT in ('251', 'J441', 'J45', 'J459', 'J46', 'J46X', 'R062', '7001', '7002', '7003', '7004')

diag_01 - diag_06 NOT in

 

will not work, it is not valid SAS syntax

 

You want 

 

diag_01 NOT in ('251', 'J441', 'J45', 'J459', 'J46', 'J46X', 'R062', '7001', '7002', '7003', '7004')
and diag_02 NOT in ('251', 'J441', 'J45', 'J459', 'J46', 'J46X', 'R062', '7001', '7002', '7003', '7004')
and ...

You can type (or copy/paste) the rest of the lines. You could also write an ARRAY to handle this without having to specifically code each of diag_01 through diag_06.

--
Paige Miller
sks521
Quartz | Level 8

Thanks but should it not have 'OR' instead of 'AND'? in other words, I am trying to include all observations with any of these variables having any of these codes.  

diag_01 NOT in ('251', 'J441', 'J45', 'J459', 'J46', 'J46X', 'R062', '7001', '7002', '7003', '7004')
OR diag_02 NOT in ('251', 'J441', 'J45', 'J459', 'J46', 'J46X', 'R062', '7001', '7002', '7003', '7004')
OR ...
PeterClemmensen
Tourmaline | Level 20

If you want good usable code answer, please provide some usable sample data for us to work with 🙂

PaigeMiller
Diamond | Level 26

@sks521 wrote:

Thanks but should it not have 'OR' instead of 'AND'? in other words, I am trying to include all observations with any of these variables having any of these codes.  

diag_01 NOT in ('251', 'J441', 'J45', 'J459', 'J46', 'J46X', 'R062', '7001', '7002', '7003', '7004')
OR diag_02 NOT in ('251', 'J441', 'J45', 'J459', 'J46', 'J46X', 'R062', '7001', '7002', '7003', '7004')
OR ...

I guess I will leave that up to you, since you know what you want better than I know what you want.

--
Paige Miller
sks521
Quartz | Level 8

Here is an example data set;

 

arrivalage  diag_01  diag_02  diag_03  diag_04  diag_05  diag_06  treat_01 treat_02 treat_03 treat_04 treat_05 treat_06 procodet 

1                                                                                                        34           41                                                                   REF

2               521          J441                                                                 221         222                                                                 RAE

1                                             J45                                                     291         43                                                                   RXA

1               38                                                                                     27                                                                                  RWA

2               0325L                                                                               222         21                                                                   REF

Reeza
Super User

That's not usable data that you've posted. Please follow the instructions here to post sample data:

 

Here are instructions on how to provide sample data as a data step:
https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...

sks521
Quartz | Level 8

I have given it a try and come up with this sample

data WORK.SHEFFIELD;
  infile datalines dsd truncover;

input aekey:$8. procodet:$8. sex:32. ethnos:$8. activage:32. arrivalage:32. pgpprac:$8. sitetret:$8. aearrivalmode:32. aeattenddisp:32. aerefsource:32. aedepttype:32. aearrivaldate:32. aearrivaltime:32. diag_scheme:32. diag_01:$8. diag_02:$8. diag_03:$8. d
iag_04:$8. diag_05:$8. diag_06:$8. invest_01:$8. invest_02:$8. invest_03:$8. invest_04:$8. invest_05:$8. invest_06:$8. invest_07:$8. invest_08:$8. invest_09:$8. invest_10:$8. invest_11:$8. invest_12:$8. treat_01:32. treat_02:32. treat_03:32. treat_04:32. t
reat_05:$8. treat_06:$8. treat_07:$8. treat_08:$8. treat_09:$8. imd15decile_overall:32.;
datalines;
e81e201e RXF 1 A 2 73 da4863e9   2 3 1 1 19518 . 1014 1             24 24                   . 34 41 .           .
5221b6af RXF 2 J 6 82 37d3d778   2 3 1 1 19537 . 1359 1             1 24                   . 221 222 .           .
c5f9d619 RCB 1 A . 16 b6a4f270 RCB55 2 3 1 1 19165 . 1602 1             24                     . 221 11 .           .
42bdbdc7 RWY 2 A 16 16 6868ed2c RWY01 2 3 1 1 19104 . 231 1 05 15L           1 1                   . 511 . .           .
b43c9c60 RXF 2 A 15 21 b9e9dcb3   2 1 1 1 19526 . 1423 1             24                     . 21 511 .           .
dd8c3208 RWA 1 A 16 16 f6b0b1b3 RWA00 2 2 1 1 19051 . 1123 1 03 25L           3 24                   . 291 43 21 12         .
c810915d RJL 1 A 16 16 V81999 RJL32 2 3 1 1 19013 . 1601 1             24                     . 222 21 236 42         .
4c880879 RP5 2 A 16 16 8e29e8d2 RP5DR 2 2 8 1 19094 . 2115 1 38           24                     . 99 . .           .
4ee99a96 RP5 2 A 16 16 8e29e8d2 RP5DR 2 5 1 1 19221 . 2029 1             1                     . 222 51 .           .
ab1c30f4 RCD 2 A 16 16 eade35a6 RCD00 9 3 8 1 19090 . 1933 1             1                     . 222 . .           .
;;;;

data

ballardw
Super User

@sks521 wrote:

I have given it a try and come up with this sample

data WORK.SHEFFIELD;
  infile datalines dsd truncover;

input aekey:$8. procodet:$8. sex:32. ethnos:$8. activage:32. arrivalage:32. pgpprac:$8. sitetret:$8. aearrivalmode:32. aeattenddisp:32. aerefsource:32. aedepttype:32. aearrivaldate:32. aearrivaltime:32. diag_scheme:32. diag_01:$8. diag_02:$8. diag_03:$8. d
iag_04:$8. diag_05:$8. diag_06:$8. invest_01:$8. invest_02:$8. invest_03:$8. invest_04:$8. invest_05:$8. invest_06:$8. invest_07:$8. invest_08:$8. invest_09:$8. invest_10:$8. invest_11:$8. invest_12:$8. treat_01:32. treat_02:32. treat_03:32. treat_04:32. t
reat_05:$8. treat_06:$8. treat_07:$8. treat_08:$8. treat_09:$8. imd15decile_overall:32.;
datalines;
e81e201e RXF 1 A 2 73 da4863e9   2 3 1 1 19518 . 1014 1             24 24                   . 34 41 .           .
5221b6af RXF 2 J 6 82 37d3d778   2 3 1 1 19537 . 1359 1             1 24                   . 221 222 .           .
c5f9d619 RCB 1 A . 16 b6a4f270 RCB55 2 3 1 1 19165 . 1602 1             24                     . 221 11 .           .
42bdbdc7 RWY 2 A 16 16 6868ed2c RWY01 2 3 1 1 19104 . 231 1 05 15L           1 1                   . 511 . .           .
b43c9c60 RXF 2 A 15 21 b9e9dcb3   2 1 1 1 19526 . 1423 1             24                     . 21 511 .           .
dd8c3208 RWA 1 A 16 16 f6b0b1b3 RWA00 2 2 1 1 19051 . 1123 1 03 25L           3 24                   . 291 43 21 12         .
c810915d RJL 1 A 16 16 V81999 RJL32 2 3 1 1 19013 . 1601 1             24                     . 222 21 236 42         .
4c880879 RP5 2 A 16 16 8e29e8d2 RP5DR 2 2 8 1 19094 . 2115 1 38           24                     . 99 . .           .
4ee99a96 RP5 2 A 16 16 8e29e8d2 RP5DR 2 5 1 1 19221 . 2029 1             1                     . 222 51 .           .
ab1c30f4 RCD 2 A 16 16 eade35a6 RCD00 9 3 8 1 19090 . 1933 1             1                     . 222 . .           .
;;;;

data


Try a data set with only the variables referenced in your original question, make sure it executes properly as pasted (the above doesn't, everything after aekey is missing or blank, your diag_04 variable doesn't exist because the name is split across lines as is treat_0r, and make sure to include some of the code values  you are concerned about.

I could get some of the data to read but I don't think the datalines are correct for the given code.

It would also be a good idea to indicate what the final result for your example data should be.

sks521
Quartz | Level 8

Not sure if I follow what you mean! Now when I re-run the code to produce sample data, I get following log; Looks like there is something wrong but I can't tell what?

NOTE: There were 30 observations read from the data set WORK.SHEFFIELD.
NOTE: PROCEDURE PRINT used (Total process time):
      real time           0.91 seconds
      cpu time            0.51 seconds


11   %let dataSetName = sheffield;
12   *number of observations you want to keep;
13   %let obsKeep = 10;
14
15
16   ******************************************************
17   DO NOT CHANGE ANYTHING BELOW THIS LINE
18   ******************************************************;
19
20   %let source_path =
20 ! https://gist.githubusercontent.com/statgeek/bcc55940dd825a13b9c8ca40a904cba9/raw/865d2cf18f5
20 ! 150b8e887218dde0fc3951d0ff15b/data2datastep.sas;
21
22   filename reprex url "&source_path";
23   %include reprex;
184  filename reprex;
NOTE: Fileref REPREX has been deassigned.
185
186  option linesize=max;
187  %data2datastep(dsn=&dataSetName, obs=&obsKeep);
NOTE: No rows were selected.
NOTE: No rows were selected.
NOTE: PROCEDURE SQL used (Total process time):
      real time           0.09 seconds
      cpu time            0.01 seconds


DATA2DATASTEP DSN SHEFFIELD
DATA2DATASTEP FILE create_WORK_SHEFFIELD_data.sas
DATA2DATASTEP FMT YES
DATA2DATASTEP FMTLIST
DATA2DATASTEP INPUTLIST aekey:$8. procodet:$8. sex:32. ethnos:$8. activage:32. arrivalage:32. pgpprac:$8. sitetret:$8. aearrivalmode:32. aeattenddisp:32. aerefsource:32. aedepttype:32. aearrivaldate:32. aearrivaltime:32. diag_scheme:32. diag_01:$8.
diag_02:$8. diag_03:$8. diag_04:$8. diag_05:$8. diag_06:$8. invest_01:$8. invest_02:$8. invest_03:$8. invest_04:$8. invest_05:$8. invest_06:$8. invest_07:$8. invest_08:$8. invest_09:$8. invest_10:$8. invest_11:$8. invest_12:$8. treat_01:32. treat_02:32.
treat_03:32. treat_04:32. treat_05:$8. treat_06:$8. treat_07:$8. treat_08:$8. treat_09:$8. imd15decile_overall:32.
DATA2DATASTEP LBL YES
DATA2DATASTEP LIB WORK
DATA2DATASTEP LS 256
DATA2DATASTEP MEMLABEL
DATA2DATASTEP MSGTYPE NOTE
DATA2DATASTEP OBS 10
DATA2DATASTEP OUTLIB WORK
DATA2DATASTEP SQLEXITCODE 0
DATA2DATASTEP SQLOBS 0
DATA2DATASTEP SQLOOPS 17
DATA2DATASTEP SQLRC 0
DATA2DATASTEP SQLXOBS 0
DATA2DATASTEP SQLXOPENERRS 0
DATA2DATASTEP VARLIST aekey procodet sex ethnos activage arrivalage pgpprac sitetret aearrivalmode aeattenddisp aerefsource aedepttype aearrivaldate aearrivaltime diag_scheme diag_01 diag_02 diag_03 diag_04 diag_05 diag_06 invest_01 invest_02 invest_03
invest_04 invest_05 invest_06 invest_07 invest_08 invest_09 invest_10 invest_11 invest_12 treat_01 treat_02 treat_03 treat_04 treat_05 treat_06 treat_07 treat_08 treat_09 imd15decile_overall
NOTE: The quoted string currently being processed has become more than 262 characters long.  You might have unbalanced quotation marks.
WARNING: Apparent symbolic reference LBLLIST not resolved.

data WORK.SHEFFIELD;
  infile datalines dsd truncover;

input aekey:$8. procodet:$8. sex:32. ethnos:$8. activage:32. arrivalage:32. pgpprac:$8. sitetret:$8. aearrivalmode:32. aeattenddisp:32. aerefsource:32. aedepttype:32. aearrivaldate:32. aearrivaltime:32. diag_scheme:32. diag_01:$8. diag_02:$8. diag_03:$8. d
iag_04:$8. diag_05:$8. diag_06:$8. invest_01:$8. invest_02:$8. invest_03:$8. invest_04:$8. invest_05:$8. invest_06:$8. invest_07:$8. invest_08:$8. invest_09:$8. invest_10:$8. invest_11:$8. invest_12:$8. treat_01:32. treat_02:32. treat_03:32. treat_04:32. t
reat_05:$8. treat_06:$8. treat_07:$8. treat_08:$8. treat_09:$8. imd15decile_overall:32.;
datalines;
e81e201e RXF 1 A 2 73 da4863e9   2 3 1 1 19518 1014 1             24 24                     34 41 . .           3
5221b6af RXF 2 J 6 82 37d3d778   2 3 1 1 19537 1359 1             1 24                     221 222 . .           3
c5f9d619 RCB 1 A . 16 b6a4f270 RCB55 2 3 1 1 19165 1602 1             24                       221 11 . .           3
42bdbdc7 RWY 2 A 16 16 6868ed2c RWY01 2 3 1 1 19104 231 1 05 15L           1 1                     511 . . .           9
b43c9c60 RXF 2 A 15 21 b9e9dcb3   2 1 1 1 19526 1423 1             24                       21 511 . .           4
dd8c3208 RWA 1 A 16 16 f6b0b1b3 RWA00 2 2 1 1 19051 1123 1 03 25L           3 24                     291 43 21 12           3
c810915d RJL 1 A 16 16 V81999 RJL32 2 3 1 1 19013 1601 1             24                       222 21 236 42           8
4c880879 RP5 2 A 16 16 8e29e8d2 RP5DR 2 2 8 1 19094 2115 1 38           24                       99 . . .           5
4ee99a96 RP5 2 A 16 16 8e29e8d2 RP5DR 2 5 1 1 19221 2029 1             1                       222 51 . .           5
ab1c30f4 RCD 2 A 16 16 eade35a6 RCD00 9 3 8 1 19090 1933 1             1                       222 . . .           5
;;;;
NOTE: There were 10 observations read from the data set WORK.SHEFFIELD.
NOTE: DATA statement used (Total process time):
      real time           0.03 seconds
      cpu time            0.01 seconds

 Also not sure what you mean by final results? I want a data set that contains observations having any of the mentioned codes for any of the mentioned variables (from diag_01 to diag_06 and treat_01 to treat_09).

 

Thanks 

PeterClemmensen
Tourmaline | Level 20

No. The 

 

diag_01 - diag_06 NOT in ('251', 'J441', 'J45'....

  part is not correct. You probably want to use arrays in a situation like this and check each variable separately. In the code below, the - will be interpreted as a minus sign, and diag_01-diag_06 will result in a numeric missing value, which is not desirable. 

Shmuel
Garnet | Level 18

1) Pay attention: if var1-varn in (...)  - sas assumes that you are making a subtraction, and as those variables are CHAR type, it makes a  syntax error.

2) @PaigeMiller used AND between conditions. You want - if no variable has any of those codes then delete. In other words, if any variable has any of those codes then write to output. 

   You can code either:

        

if diag_1 not in <list> and diag_2 not in <list> and ... then delete;

 or use:

if diag_1 in <list> or diag_2 in <list > or ... then output;

3) You can either copy/paste the list between the lines or use a macro as in:

 

%let mylist = ('251', 'J441', 'J45', 'J459', 'J46', 'J46X', 'R062', '7001', '7002', '7003', '7004');

data want;
  set have;
        array diag daig_1 - diag_6;
        flag = 0;  /* initialized value assumes non code exist */
        do i=1 to dim(diag);
             if diag(i) in &mylist then flag=1;
        end;
        if flag = 1 then output; /* some diagnosis are found */
run;

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 622 views
  • 0 likes
  • 6 in conversation