Capturing data - Page 2

sks521 · Posted 09-03-2019 04:56 AM

I have run this array but getting an error message as follows; (these are all character variables listed as treat_01 to treat_12)

285  data CONTROL1 (drop = _:) ;
286    set VISITS ;
287    array dd TREAT: ;
ERROR: All variables in array list must be the same type, i.e., all numeric or character.
ERROR: All variables in array list must be the same type, i.e., all numeric or character.
ERROR: All variables in array list must be the same type, i.e., all numeric or character.
ERROR: All variables in array list must be the same type, i.e., all numeric or character.
ERROR: All variables in array list must be the same type, i.e., all numeric or character.
288    do over dd ;
289      if prxmatch ("m/J21|J0|J38|J37/oi", dd) then do ;
290    * if find (dd, "J21") or find (dd, "J0") or find (dd, "J38") or find (dd, "J37") then do ;
291        _exclude = 1 ;
292        leave ;
293      end ;
294    end ;
295    if not _exclude ;
296  run ;

NOTE: Numeric values have been converted to character values at the places given by:
      (Line):(Column).
      289:41
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.CONTROL1 may be incomplete.  When this step was stopped there were 0
         observations and 30 variables.
WARNING: Data set WORK.CONTROL1 was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

hashman · Posted 09-03-2019 06:38 AM

@sks521:

The SAS log tells you plainly that different TREAT* variables have both character and numeric data types. From the standpoint of process organization and the nature of the task, this is called "dirty data": They should be all character.

The proper solution is to convert all the numeric TREAT variables to character; then the code will run error-free.

A palliative solution is to use the CATX trick offered by @ChrisNZ, as unlike an array, the collective colon (:) modifier reference doesn't suggest the same data type across all TREAT: variables in the context of using it with the CATX function, since the latter converts numeric variables to the character type automatically:

data WANT ;                                                                     
  set HAVE ;                                                                                
  if prxmatch ("m/\b26|\bA0/oi", catx (' ', of TREAT:)) = 0
  and                                       
  prxmatch ("m/\bJ44|\bJ45|\b251/oi", catx (' ', of TREAT:)) = 0
  ;                                       
run ;

Kind regards

Paul D.

sks521 · Posted 09-03-2019 06:54 AM

Thanks but can I check in if this step will get rid of all observations with any diagnoses other than the ones specified (asthma and gastroenteritis) here please?

Thanks

S

sks521 · Posted 09-03-2019 10:20 AM

Hi Paul,

These are all character variables and I have defined them as character variables while importing data into SAS. How come I am getting errors.

And also I have tried to use the syntax you suggested but still getting errors, this time using diagnosis variables;

*Deleting observations with croup diagnosis;
*USING ARRAY FROM THE SAS SUPPORT COMMUNITY;
data CONTROL1 (drop = _:) ;                                                                     
  set VISITS ;                                                                                
  array dd diag: ;                                                                          
  do over dd ;                                                                              
    if prxmatch ("m/J05/oi", catx (' ', of diag:)) then do ;                                       
  * if find (dd, "J21") or find (dd, "J0") or find (dd, "J38") or find (dd, "J37") then do ;
      _exclude = 1 ;                                                                        
      leave ;                                                                               
    end ;                                                                                   
  end ;                                                                                     
  if not _exclude ;                                                                         
run ;

sks521 · Posted 09-03-2019 10:32 AM

I am pasting sample data for my data set for you play with;

input AEKEY:$32. patient_id:$32. link_method:$18. procodet:$3. postcode_flag:$5. sex:$1. ethnos:$2. activage:$32. ARRIVALAGE:$32. pgpprac:$4. sitetret:$5. aearrivalmode:32. aeattenddisp:32. aerefsource:32. aedepttype:32. aearrivaldate:32. aearrivaltime:32.
 diag_scheme:32. diag_01:$6. diag_02:$6. diag_03:$6. diag_04:$6. diag_05:$6. diag_06:$6. diag_07:$6. diag_08:$6. diag_09:$6. diag_10:$6. diag_11:$6. diag_12:$6. invest_01:$4. invest_02:$4. invest_03:$4. invest_04:$4. invest_05:$4. invest_06:$4. invest_07:$
4. invest_08:$4. invest_09:$4. invest_10:$4. invest_11:$4. invest_12:$4. treat_01:$4. treat_02:$4. treat_03:$4. treat_04:$4. treat_05:$4. treat_06:$4. treat_07:$4. treat_08:$4. treat_09:$4. treat_10:$4. treat_11:$4. treat_12:$4. imd15decile_overall:32.;
datalines;
e81e201e78d19d8c3f5122ce7d96c25a 97e023526b0dc2b5e9e08eed37775ab8 NHSNO RXF VALID 1 A 002 73 da48   2 3 1 1 19518 1014 1                         24 24                     34 41                     3
5221b6afed59cee127663fe106688fd0 b76acf7d6e16213737184da41a1794a1 NHSNO RXF VALID 2 J 006 82 37d3   2 3 1 1 19537 1359 1                         01 24                     221 222                     3
c5f9d619392b7317bff4355d7e0c3829 8e19bf9b06f32ea3538b417a69254e3f NHSNO RCB VALID 1 A   16 b6a4 RCB55 2 3 1 1 19165 1602 1                         24                       221 011                     3
42bdbdc79100370c0f7b92e8219c950f 2a44ce4db8a93eb5a111d0900a818351 NHSNO RWY VALID 2 A 016 16 6868 RWY01 2 3 1 1 19104 231 1 05 15L                       01 01                     511                       9
b43c9c6090a4a0eb69b5bd88f8b76e8f 1e46f877744f9cceec87405da4716053 NHSNO RXF VALID 2 A 015 21 b9e9   2 1 1 1 19526 1423 1                         24                       21 511                     4
dd8c32086cfd590049fdeb9817083967 f21138c84b7ea596bb1aeb4a7cbb1e32 NHSNO RWA VALID 1 A 016 16 f6b0 RWA00 2 2 1 1 19051 1123 1 03 25L                       03 24                     291 43 21 12                 3
c810915d6201106c5e9caec870183a21 23f76559afb9135cb92fbcf9072d9957 FULL ID NHSNO LINK RJL VALID 1 A 016 16 V819 RJL32 2 3 1 1 19013 1601 1                         24                       222 21 236 042                 8
4c8808798cb8278f03540122edb7f1b1 1152df925fcc5d927f5e811485338fc0 NHSNO RP5 VALID 2 A 016 16 8e29 RP5DR 2 2 8 1 19094 2115 1 38                       24                       99                       5
4ee99a96e5cdee139f6c7b4a3ec31734 1152df925fcc5d927f5e811485338fc0 NHSNO RP5 VALID 2 A 016 16 8e29 RP5DR 2 5 1 1 19221 2029 1                         01                       222 51                     5
ab1c30f4ae71acebfc5dc9cff812ae53 3ab4b9ec7420ffa8988cfa496116359c NHSNO RCD VALID 2 A 016 16 eade RCD00 9 3 8 1 19090 1933 1                         01                       222                       5
002c5a7a483c7f9a281393e9d5886edd 36c1b661c6472d7c520013558fab9993 NHSNO RXF VALID 1 A 012 41 e7c4   2 12 1 4 19512 952 1 38                       24 24                     27                       2
dcc81124ce35647223c9d78c59bfa4c9 0bb5d959bd1e72c06a523e9179f2325a NHSNO RXF VALID 2 A 007 20 51c5   2 2 0 1 19521 1126 1                         10 24                     21 222                     2
7fe0f97fad130b4cb0f2582fad3a3fd2 64679614526ef9ac7bf460b5b09495ca NHSNO RXF VALID 2 A 008 88 8d1c   1 1 3 1 19536 1755 1                         24 24                     21 222                     10
9c346f3bf96e86c5c11e4c97c8db930d 6cc798e3ad901302d5df814f3fe7334a NHSNO RXF VALID 1 R 002 73 1e3b   1 1 0 1 19523 1238 1                         01                       21 222                     3
d1787cf846fb617e63153dbc7c901e87 12b080b9cc4ba31655ebfdd5c0ee3c27 NHSNO RXF VALID 2 J 011 67 2a39   2 4 0 1 19513 1449 1                         24                       21 222                     8
4f0e6017b9c6da5650d3972a61681599 12b080b9cc4ba31655ebfdd5c0ee3c27 NHSNO RXF VALID 2 L 013 67 2a39   2 3 7 1 19516 1133 1                         24 24                     222                       8
de5e1b9898373a106a4a16ad01cb25c3 50d356d07b3e92e1258e0b35e0cfcc09 NHSNO RP5 VALID 2 A 016 16 50a3 RP5DR 1 2 1 1 19188 1352 1                         01 02                     25 12                     2
3f4118a4e4c224a7478e4b33bb3e6105 50d356d07b3e92e1258e0b35e0cfcc09 NHSNO RP5 VALID 2 A 016 16 50a3 RP5DR 1 2 8 1 19220 1110 1                         02                       99                       2
cdea90793c93c0f14a37e72146351a22 52712651674edb68a015eedfeaa65f48 NHSNO RXF VALID 2 A 004 29 332b   2 3 1 1 19539 710 1                         24 24                     21 222                     9
96a372e7c7f98c481da3fb70a6f22ce1 78240b57dcf604ff84263a80e7e15a2b NHSNO RXF VALID 2 A 009 27 5a1d   2 6 1 1 19533 1306 1                         03 03                     222                       8
;;;;

hashman · Posted 09-03-2019 03:31 PM

@sks521;

It can neither be INPUTed nor played with. Note that your data after field #18 diag_scheme looks like utter junk, and you don't have enough datalines fields to read the TREAT variables in. GIGO.

sks521 · Posted 09-04-2019 04:49 AM

I understand what you mean and I did realise when I looked at the source data as most of these variables don't hold any values, and almost half of the population doesn't have any values for any of these variables but keeping them in is something I couldn't avoid as some analysis does involve reporting on observations that did not have any dignoses.

What do you think I should do? get rid of some of these variables altogether?

Thanks

hashman · Posted 09-04-2019 06:09 AM

@sks521;

First and foremost, make sure you input ALL the variables from your data source correctly, no matter whether they have missing values or not. If you don't, you have no data to rely upon. In the data sample you've shown, the TREAT variables are simply absent. Hence, you can't use them to make any programmatic decisions.

sks521 · Posted 09-04-2019 06:34 AM

I thought I did bring in ALL variables correctly, please see the syntax and let me know if I have missed out on anything;

data S_AE;
infile 'I:\CHCSheffield\20190401ACE-original\ACE_AE_20190401_rev.CSV' dsd truncover lrecl=32767 FIRSTOBS=2;
length AEKEY  $32  patient_id $32  link_method $18 procodet $3 postcode_flag $5 sex $1 ethnos $2  activage $32 ARRIVALAGE $32 pgpprac $4 sitetret $5 aearrivalmode 3 
aeattenddisp 3  aerefsource 3  aedepttype 4  aearrivaldate 4  aearrivaltime 4 diag_scheme 3 diag_01-diag_12 $6  invest_01-invest_12 $4 treat_01-treat_12 $4
  imd15decile_overall 3
;
informat
  aearrivaldate
    yymmdd10.
;
input
 AEKEY   patient_id   link_method  procodet  postcode_flag  sex  ethnos   activage  ARRIVALAGE  pgpprac  sitetret  aearrivalmode  
aeattenddisp   aerefsource   aedepttype   aearrivaldate   aearrivaltime  diag_scheme  diag_01-diag_12   invest_01-invest_12 treat_01-treat_12 
  imd15decile_overall 
;
run;

hashman · Posted 09-04-2019 06:48 AM

@sks521;

Attach you CSV file, then I'll see.

sks521 · Posted 09-04-2019 07:00 AM

I couldn't attach as too large

The attachment ace_ae_20190401_rev.csv is too large and has been removed. The maximum file size is: 5,242,880 bytes.

hashman · Posted 09-04-2019 07:23 AM

@sks521;

Subset the file to just first 100 records. That should be more than enough.

hashman · Posted 09-02-2019 04:10 PM

@sks521:

Just apply a separate condition with the colon modifier for each spec condition:

data have ;                                             
  input (diag1-diag3) (:$6.) ;                          
  cards ;                                               
251  J441   450XY                                       
J45  J451   J458                                        
J469 J46    J46X                                        
R062 J44.1  J441                                        
J469 45XX   J46X                                        
262  263    262248                                      
K52  P783   A00                                         
A02  A03    A04                                         
A05  A06    A07                                         
A08  A09    A084                                        
A080 A083   A059                                        
251  251228 J441                                        
J450 J45    J451                                        
J458 J459   J46                                         
J46X R062   P783                                        
run ;                                                   
                                                        
data want ;                                             
  set have ;                                            
  array dd diag: ;                                      
  do over dd ;                                          
    if dd in: ("26"  "A0"       ) then flag_gastro = 1 ;
    if dd in: ("J44" "J45" "251") then flag_asthma = 1 ;
  end ;                                                 
* if flag_gastro or flag_asthma ;                       
* drop flag_: ;                                         
run ;

If you want only the output observations with either gastro or asthma codes found, uncomment the two penultimate lines.

Kind regards

Paul D.

Re: Capturing data

Re: Capturing data

Re: Capturing data

Re: Capturing data

Re: Capturing data

Re: Capturing data

Re: Capturing data

Re: Capturing data

Re: Capturing data

Re: Capturing data

Re: Capturing data

Re: Capturing data

Re: Capturing data

SAS Innovate 2026 Registration is Open

SAS Training: Just a Click Away