I'd like to count first three digits of Qcodes by Facility name. I don't need other codes for now. Below shows my attempt to substract first three digits of Q codes then count by facility name.
proc import datafile="....\patient_nodups.csv"
out=have
dbms=csv replace;
getnames=yes;
run;
data qcodes; set Patients_NODUPS;
array qcodes {*} Other_Diagnosis_Code_1-Other_Diagnosis_Code_24;
do i = 1 to dim(qcodes);
if substr(left(qcodes),1,3){i} in ('Q:') then output qcodes;
end;
run;
Thanks for being such a guradian angel.
Sound like you really want something like the following:
proc import datafile="/folders/myfolders/patient_nodups.csv" out=have dbms=csv replace; getnames=yes; run; proc sql noprint; select catx(' ','length',name, '$6;') into :lengths separated by ' ' from dictionary.columns where libname=upcase('work') and memname=upcase('have') and substr(name,1,5) eq 'Other' ; quit; data qcodes (keep=facility_name qcode); &lengths.; set have; array qcodes{*} $ Other_Diagnosis_Code_24-Other_Diagnosis_Code_1; do i = 1 to dim(qcodes); if first(qcodes(i)) eq 'Q' then do; qcode=substr(qcodes(i),1,3); output; end; end; run; proc sort data=qcodes; by facility_name; run; proc freq data=qcodes; by facility_name; tables qcode/out=want (drop=percent); run; proc transpose data=want out=want (drop=_:); by facility_name; var count; id qcode; run;
Art, CEO, AnalystFinder.com
Not clear what you are trying to achieve. If you want to eliminate any of the other_diagnostic_codes that don't start with "Q", then the following will work:
proc import datafile="/folders/myfolders/patient_nodups.csv" out=have dbms=csv replace; getnames=yes; run; proc sql noprint; select catx(' ','length',name, '$6;') into :lengths separated by ' ' from dictionary.columns where libname=upcase('work') and memname=upcase('have') and substr(name,1,5) eq 'Other' ; quit; data qcodes; &lengths.; set have; array qcodes{*} $ Other_Diagnosis_Code_24-Other_Diagnosis_Code_1; do i = 1 to dim(qcodes); if first(qcodes(i)) ne 'Q' then call missing(qcodes(i)); end; call sortc(of qcodes(*)); run;
The reason the proc sql code is there is because proc import set different lengths for each of the other_diagnostic_code variables and the proc sql code standardizes them all to have a length of 6.
Art, CEO, AnalystFinder.com
Thank you very much.
The code:
options nolabel;
proc sql noprint;
select catx(' ','length',name, '$6;')
into :lengths separated by ' '
from dictionary.columns
where libname=upcase('work') and
memname=upcase('patients_nodups') and
substr(name,1,3) eq 'Other'
;
quit;
data qcodes;
&lengths.;
set patients_nodups;
array qcodes{*} $ Other_Diagnosis_Code_24-Other_Diagnosis_Code_1;
do i = 1 to dim(qcodes);
if first(qcodes(i)) ne 'Q' then call missing(qcodes(i));
end;
call sortc(of qcodes(*));
run;
worked fine with warning:
WARNING: Multiple lengths were specified for the variable Other_Diagnosis_Code_1 by input data
set(s). This can cause truncation of data.
thru
WARNING: Multiple lengths were specified for the variable Other_Diagnosis_Code_24 by input data
set(s). This can cause truncation of data.
I changed
substr(name,1,5) to substr(name,1,3)
in order to substract first three digits of q codes. However, I have 5 digit Q codes in the output data. The reason why is that I'd like to count number of distinct 3 digit qcodes by facility name for the next step. I apologize for ambiguity in my question. Hope image helps.
Am I missing something here?
I just tried below code wondering if I'm misplacinf end;. But no success yet.
data qcodes; set have;
array qcodes {*} Other_Diagnosis_Code_1-Other_Diagnosis_Code_24;
do i = 1 to dim(qcodes);
end;
if (substr(left(qcodes),1,3)) in ('Q:') then output qcodes;
run;
Log:
NOTE: Numeric values have been converted to character values at the places given by:
(Line):(Column).
1060:29
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.QCODES may be incomplete. When this step was stopped there were 0
observations and 36 variables.
WARNING: Data set WORK.QCODES was not replaced because this step was stopped.
If I've figured out what you are after, here's a way to get all the Q codes for each facility.
data qcodes;
array codes {24} $ 4 Other_diagnosis_code_1 - Other_diagnosis_code_24;
set have (keep=facility other_diagnosis_code_: );
do i=1 to 24;
if codes{i} =: 'Q' then do;
qcode = codes{i};
output;
end;
end;
keep facility qcode;
run;
proc freq data=want;
tables facility * qcode / missing list;
run;
Defining the array with length of $4 for each element ahead of time means you get Q plus the three digits that follow.
If I understand correctly, try this:
data qcodes;
set Patients_NODUPS;
array qcodes {*} Other_Diagnosis_Code_1-Other_Diagnosis_Code_24;
do i = 1 to dim(qcodes);
if substr(left(qcodes),1,1){i}='Q' then do;
output;
return;
end;
end;
run;
Sound like you really want something like the following:
proc import datafile="/folders/myfolders/patient_nodups.csv" out=have dbms=csv replace; getnames=yes; run; proc sql noprint; select catx(' ','length',name, '$6;') into :lengths separated by ' ' from dictionary.columns where libname=upcase('work') and memname=upcase('have') and substr(name,1,5) eq 'Other' ; quit; data qcodes (keep=facility_name qcode); &lengths.; set have; array qcodes{*} $ Other_Diagnosis_Code_24-Other_Diagnosis_Code_1; do i = 1 to dim(qcodes); if first(qcodes(i)) eq 'Q' then do; qcode=substr(qcodes(i),1,3); output; end; end; run; proc sort data=qcodes; by facility_name; run; proc freq data=qcodes; by facility_name; tables qcode/out=want (drop=percent); run; proc transpose data=want out=want (drop=_:); by facility_name; var count; id qcode; run;
Art, CEO, AnalystFinder.com
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.