I have run this array but getting an error message as follows; (these are all character variables listed as treat_01 to treat_12)
285 data CONTROL1 (drop = _:) ; 286 set VISITS ; 287 array dd TREAT: ; ERROR: All variables in array list must be the same type, i.e., all numeric or character. ERROR: All variables in array list must be the same type, i.e., all numeric or character. ERROR: All variables in array list must be the same type, i.e., all numeric or character. ERROR: All variables in array list must be the same type, i.e., all numeric or character. ERROR: All variables in array list must be the same type, i.e., all numeric or character. 288 do over dd ; 289 if prxmatch ("m/J21|J0|J38|J37/oi", dd) then do ; 290 * if find (dd, "J21") or find (dd, "J0") or find (dd, "J38") or find (dd, "J37") then do ; 291 _exclude = 1 ; 292 leave ; 293 end ; 294 end ; 295 if not _exclude ; 296 run ; NOTE: Numeric values have been converted to character values at the places given by: (Line):(Column). 289:41 NOTE: The SAS System stopped processing this step because of errors. WARNING: The data set WORK.CONTROL1 may be incomplete. When this step was stopped there were 0 observations and 30 variables. WARNING: Data set WORK.CONTROL1 was not replaced because this step was stopped. NOTE: DATA statement used (Total process time): real time 0.01 seconds cpu time 0.01 seconds
The SAS log tells you plainly that different TREAT* variables have both character and numeric data types. From the standpoint of process organization and the nature of the task, this is called "dirty data": They should be all character.
The proper solution is to convert all the numeric TREAT variables to character; then the code will run error-free.
A palliative solution is to use the CATX trick offered by @ChrisNZ, as unlike an array, the collective colon (:) modifier reference doesn't suggest the same data type across all TREAT: variables in the context of using it with the CATX function, since the latter converts numeric variables to the character type automatically:
data WANT ;
set HAVE ;
if prxmatch ("m/\b26|\bA0/oi", catx (' ', of TREAT:)) = 0
and
prxmatch ("m/\bJ44|\bJ45|\b251/oi", catx (' ', of TREAT:)) = 0
;
run ;
Kind regards
Paul D.
Thanks but can I check in if this step will get rid of all observations with any diagnoses other than the ones specified (asthma and gastroenteritis) here please?
Thanks
S
Hi Paul,
These are all character variables and I have defined them as character variables while importing data into SAS. How come I am getting errors.
And also I have tried to use the syntax you suggested but still getting errors, this time using diagnosis variables;
*Deleting observations with croup diagnosis; *USING ARRAY FROM THE SAS SUPPORT COMMUNITY; data CONTROL1 (drop = _:) ; set VISITS ; array dd diag: ; do over dd ; if prxmatch ("m/J05/oi", catx (' ', of diag:)) then do ; * if find (dd, "J21") or find (dd, "J0") or find (dd, "J38") or find (dd, "J37") then do ; _exclude = 1 ; leave ; end ; end ; if not _exclude ; run ;
I am pasting sample data for my data set for you play with;
input AEKEY:$32. patient_id:$32. link_method:$18. procodet:$3. postcode_flag:$5. sex:$1. ethnos:$2. activage:$32. ARRIVALAGE:$32. pgpprac:$4. sitetret:$5. aearrivalmode:32. aeattenddisp:32. aerefsource:32. aedepttype:32. aearrivaldate:32. aearrivaltime:32. diag_scheme:32. diag_01:$6. diag_02:$6. diag_03:$6. diag_04:$6. diag_05:$6. diag_06:$6. diag_07:$6. diag_08:$6. diag_09:$6. diag_10:$6. diag_11:$6. diag_12:$6. invest_01:$4. invest_02:$4. invest_03:$4. invest_04:$4. invest_05:$4. invest_06:$4. invest_07:$ 4. invest_08:$4. invest_09:$4. invest_10:$4. invest_11:$4. invest_12:$4. treat_01:$4. treat_02:$4. treat_03:$4. treat_04:$4. treat_05:$4. treat_06:$4. treat_07:$4. treat_08:$4. treat_09:$4. treat_10:$4. treat_11:$4. treat_12:$4. imd15decile_overall:32.; datalines; e81e201e78d19d8c3f5122ce7d96c25a 97e023526b0dc2b5e9e08eed37775ab8 NHSNO RXF VALID 1 A 002 73 da48 2 3 1 1 19518 1014 1 24 24 34 41 3 5221b6afed59cee127663fe106688fd0 b76acf7d6e16213737184da41a1794a1 NHSNO RXF VALID 2 J 006 82 37d3 2 3 1 1 19537 1359 1 01 24 221 222 3 c5f9d619392b7317bff4355d7e0c3829 8e19bf9b06f32ea3538b417a69254e3f NHSNO RCB VALID 1 A 16 b6a4 RCB55 2 3 1 1 19165 1602 1 24 221 011 3 42bdbdc79100370c0f7b92e8219c950f 2a44ce4db8a93eb5a111d0900a818351 NHSNO RWY VALID 2 A 016 16 6868 RWY01 2 3 1 1 19104 231 1 05 15L 01 01 511 9 b43c9c6090a4a0eb69b5bd88f8b76e8f 1e46f877744f9cceec87405da4716053 NHSNO RXF VALID 2 A 015 21 b9e9 2 1 1 1 19526 1423 1 24 21 511 4 dd8c32086cfd590049fdeb9817083967 f21138c84b7ea596bb1aeb4a7cbb1e32 NHSNO RWA VALID 1 A 016 16 f6b0 RWA00 2 2 1 1 19051 1123 1 03 25L 03 24 291 43 21 12 3 c810915d6201106c5e9caec870183a21 23f76559afb9135cb92fbcf9072d9957 FULL ID NHSNO LINK RJL VALID 1 A 016 16 V819 RJL32 2 3 1 1 19013 1601 1 24 222 21 236 042 8 4c8808798cb8278f03540122edb7f1b1 1152df925fcc5d927f5e811485338fc0 NHSNO RP5 VALID 2 A 016 16 8e29 RP5DR 2 2 8 1 19094 2115 1 38 24 99 5 4ee99a96e5cdee139f6c7b4a3ec31734 1152df925fcc5d927f5e811485338fc0 NHSNO RP5 VALID 2 A 016 16 8e29 RP5DR 2 5 1 1 19221 2029 1 01 222 51 5 ab1c30f4ae71acebfc5dc9cff812ae53 3ab4b9ec7420ffa8988cfa496116359c NHSNO RCD VALID 2 A 016 16 eade RCD00 9 3 8 1 19090 1933 1 01 222 5 002c5a7a483c7f9a281393e9d5886edd 36c1b661c6472d7c520013558fab9993 NHSNO RXF VALID 1 A 012 41 e7c4 2 12 1 4 19512 952 1 38 24 24 27 2 dcc81124ce35647223c9d78c59bfa4c9 0bb5d959bd1e72c06a523e9179f2325a NHSNO RXF VALID 2 A 007 20 51c5 2 2 0 1 19521 1126 1 10 24 21 222 2 7fe0f97fad130b4cb0f2582fad3a3fd2 64679614526ef9ac7bf460b5b09495ca NHSNO RXF VALID 2 A 008 88 8d1c 1 1 3 1 19536 1755 1 24 24 21 222 10 9c346f3bf96e86c5c11e4c97c8db930d 6cc798e3ad901302d5df814f3fe7334a NHSNO RXF VALID 1 R 002 73 1e3b 1 1 0 1 19523 1238 1 01 21 222 3 d1787cf846fb617e63153dbc7c901e87 12b080b9cc4ba31655ebfdd5c0ee3c27 NHSNO RXF VALID 2 J 011 67 2a39 2 4 0 1 19513 1449 1 24 21 222 8 4f0e6017b9c6da5650d3972a61681599 12b080b9cc4ba31655ebfdd5c0ee3c27 NHSNO RXF VALID 2 L 013 67 2a39 2 3 7 1 19516 1133 1 24 24 222 8 de5e1b9898373a106a4a16ad01cb25c3 50d356d07b3e92e1258e0b35e0cfcc09 NHSNO RP5 VALID 2 A 016 16 50a3 RP5DR 1 2 1 1 19188 1352 1 01 02 25 12 2 3f4118a4e4c224a7478e4b33bb3e6105 50d356d07b3e92e1258e0b35e0cfcc09 NHSNO RP5 VALID 2 A 016 16 50a3 RP5DR 1 2 8 1 19220 1110 1 02 99 2 cdea90793c93c0f14a37e72146351a22 52712651674edb68a015eedfeaa65f48 NHSNO RXF VALID 2 A 004 29 332b 2 3 1 1 19539 710 1 24 24 21 222 9 96a372e7c7f98c481da3fb70a6f22ce1 78240b57dcf604ff84263a80e7e15a2b NHSNO RXF VALID 2 A 009 27 5a1d 2 6 1 1 19533 1306 1 03 03 222 8 ;;;;
It can neither be INPUTed nor played with. Note that your data after field #18 diag_scheme looks like utter junk, and you don't have enough datalines fields to read the TREAT variables in. GIGO.
I understand what you mean and I did realise when I looked at the source data as most of these variables don't hold any values, and almost half of the population doesn't have any values for any of these variables but keeping them in is something I couldn't avoid as some analysis does involve reporting on observations that did not have any dignoses.
What do you think I should do? get rid of some of these variables altogether?
Thanks
First and foremost, make sure you input ALL the variables from your data source correctly, no matter whether they have missing values or not. If you don't, you have no data to rely upon. In the data sample you've shown, the TREAT variables are simply absent. Hence, you can't use them to make any programmatic decisions.
I thought I did bring in ALL variables correctly, please see the syntax and let me know if I have missed out on anything;
data S_AE; infile 'I:\CHCSheffield\20190401ACE-original\ACE_AE_20190401_rev.CSV' dsd truncover lrecl=32767 FIRSTOBS=2; length AEKEY $32 patient_id $32 link_method $18 procodet $3 postcode_flag $5 sex $1 ethnos $2 activage $32 ARRIVALAGE $32 pgpprac $4 sitetret $5 aearrivalmode 3 aeattenddisp 3 aerefsource 3 aedepttype 4 aearrivaldate 4 aearrivaltime 4 diag_scheme 3 diag_01-diag_12 $6 invest_01-invest_12 $4 treat_01-treat_12 $4 imd15decile_overall 3 ; informat aearrivaldate yymmdd10. ; input AEKEY patient_id link_method procodet postcode_flag sex ethnos activage ARRIVALAGE pgpprac sitetret aearrivalmode aeattenddisp aerefsource aedepttype aearrivaldate aearrivaltime diag_scheme diag_01-diag_12 invest_01-invest_12 treat_01-treat_12 imd15decile_overall ; run;
Attach you CSV file, then I'll see.
I couldn't attach as too large
Subset the file to just first 100 records. That should be more than enough.
Just apply a separate condition with the colon modifier for each spec condition:
data have ;
input (diag1-diag3) (:$6.) ;
cards ;
251 J441 450XY
J45 J451 J458
J469 J46 J46X
R062 J44.1 J441
J469 45XX J46X
262 263 262248
K52 P783 A00
A02 A03 A04
A05 A06 A07
A08 A09 A084
A080 A083 A059
251 251228 J441
J450 J45 J451
J458 J459 J46
J46X R062 P783
run ;
data want ;
set have ;
array dd diag: ;
do over dd ;
if dd in: ("26" "A0" ) then flag_gastro = 1 ;
if dd in: ("J44" "J45" "251") then flag_asthma = 1 ;
end ;
* if flag_gastro or flag_asthma ;
* drop flag_: ;
run ;
If you want only the output observations with either gastro or asthma codes found, uncomment the two penultimate lines.
Kind regards
Paul D.
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.