I'd like to count first three digits of Qcodes by Facility name. I don't need other codes for now. Below shows my attempt to substract first three digits of Q codes then count by facility name.
proc import datafile="....\patient_nodups.csv"
out=have
dbms=csv replace;
getnames=yes;
run;
data qcodes; set Patients_NODUPS;
array qcodes {*} Other_Diagnosis_Code_1-Other_Diagnosis_Code_24;
do i = 1 to dim(qcodes);
if substr(left(qcodes),1,3){i} in ('Q:') then output qcodes;
end;
run;
Thanks for being such a guradian angel.
Sound like you really want something like the following:
proc import datafile="/folders/myfolders/patient_nodups.csv" out=have dbms=csv replace; getnames=yes; run; proc sql noprint; select catx(' ','length',name, '$6;') into :lengths separated by ' ' from dictionary.columns where libname=upcase('work') and memname=upcase('have') and substr(name,1,5) eq 'Other' ; quit; data qcodes (keep=facility_name qcode); &lengths.; set have; array qcodes{*} $ Other_Diagnosis_Code_24-Other_Diagnosis_Code_1; do i = 1 to dim(qcodes); if first(qcodes(i)) eq 'Q' then do; qcode=substr(qcodes(i),1,3); output; end; end; run; proc sort data=qcodes; by facility_name; run; proc freq data=qcodes; by facility_name; tables qcode/out=want (drop=percent); run; proc transpose data=want out=want (drop=_:); by facility_name; var count; id qcode; run;
Art, CEO, AnalystFinder.com
Not clear what you are trying to achieve. If you want to eliminate any of the other_diagnostic_codes that don't start with "Q", then the following will work:
proc import datafile="/folders/myfolders/patient_nodups.csv" out=have dbms=csv replace; getnames=yes; run; proc sql noprint; select catx(' ','length',name, '$6;') into :lengths separated by ' ' from dictionary.columns where libname=upcase('work') and memname=upcase('have') and substr(name,1,5) eq 'Other' ; quit; data qcodes; &lengths.; set have; array qcodes{*} $ Other_Diagnosis_Code_24-Other_Diagnosis_Code_1; do i = 1 to dim(qcodes); if first(qcodes(i)) ne 'Q' then call missing(qcodes(i)); end; call sortc(of qcodes(*)); run;
The reason the proc sql code is there is because proc import set different lengths for each of the other_diagnostic_code variables and the proc sql code standardizes them all to have a length of 6.
Art, CEO, AnalystFinder.com
Thank you very much.
The code:
options nolabel;
proc sql noprint;
select catx(' ','length',name, '$6;')
into :lengths separated by ' '
from dictionary.columns
where libname=upcase('work') and
memname=upcase('patients_nodups') and
substr(name,1,3) eq 'Other'
;
quit;
data qcodes;
&lengths.;
set patients_nodups;
array qcodes{*} $ Other_Diagnosis_Code_24-Other_Diagnosis_Code_1;
do i = 1 to dim(qcodes);
if first(qcodes(i)) ne 'Q' then call missing(qcodes(i));
end;
call sortc(of qcodes(*));
run;
worked fine with warning:
WARNING: Multiple lengths were specified for the variable Other_Diagnosis_Code_1 by input data
set(s). This can cause truncation of data.
thru
WARNING: Multiple lengths were specified for the variable Other_Diagnosis_Code_24 by input data
set(s). This can cause truncation of data.
I changed
substr(name,1,5) to substr(name,1,3)
in order to substract first three digits of q codes. However, I have 5 digit Q codes in the output data. The reason why is that I'd like to count number of distinct 3 digit qcodes by facility name for the next step. I apologize for ambiguity in my question. Hope image helps.
Am I missing something here?
I just tried below code wondering if I'm misplacinf end;. But no success yet.
data qcodes; set have;
array qcodes {*} Other_Diagnosis_Code_1-Other_Diagnosis_Code_24;
do i = 1 to dim(qcodes);
end;
if (substr(left(qcodes),1,3)) in ('Q:') then output qcodes;
run;
Log:
NOTE: Numeric values have been converted to character values at the places given by:
(Line):(Column).
1060:29
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.QCODES may be incomplete. When this step was stopped there were 0
observations and 36 variables.
WARNING: Data set WORK.QCODES was not replaced because this step was stopped.
If I've figured out what you are after, here's a way to get all the Q codes for each facility.
data qcodes;
array codes {24} $ 4 Other_diagnosis_code_1 - Other_diagnosis_code_24;
set have (keep=facility other_diagnosis_code_: );
do i=1 to 24;
if codes{i} =: 'Q' then do;
qcode = codes{i};
output;
end;
end;
keep facility qcode;
run;
proc freq data=want;
tables facility * qcode / missing list;
run;
Defining the array with length of $4 for each element ahead of time means you get Q plus the three digits that follow.
If I understand correctly, try this:
data qcodes;
set Patients_NODUPS;
array qcodes {*} Other_Diagnosis_Code_1-Other_Diagnosis_Code_24;
do i = 1 to dim(qcodes);
if substr(left(qcodes),1,1){i}='Q' then do;
output;
return;
end;
end;
run;
Sound like you really want something like the following:
proc import datafile="/folders/myfolders/patient_nodups.csv" out=have dbms=csv replace; getnames=yes; run; proc sql noprint; select catx(' ','length',name, '$6;') into :lengths separated by ' ' from dictionary.columns where libname=upcase('work') and memname=upcase('have') and substr(name,1,5) eq 'Other' ; quit; data qcodes (keep=facility_name qcode); &lengths.; set have; array qcodes{*} $ Other_Diagnosis_Code_24-Other_Diagnosis_Code_1; do i = 1 to dim(qcodes); if first(qcodes(i)) eq 'Q' then do; qcode=substr(qcodes(i),1,3); output; end; end; run; proc sort data=qcodes; by facility_name; run; proc freq data=qcodes; by facility_name; tables qcode/out=want (drop=percent); run; proc transpose data=want out=want (drop=_:); by facility_name; var count; id qcode; run;
Art, CEO, AnalystFinder.com
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.