This is probably simple, but I'm having a hard time knowing the best strategy. Say I have a data set like this:
ID Persontype Med Weap
001 Staff No No
001 Staff No No
001 Staff Yes Yes
002 Staff No No
003 Staff No Yes
003 Other No No
003 Volunteer Yes No
003 Staff No No
003 Staff Yes Yes
I'm trying to build a new file that has one record per ID (there can be any number of rows with the same ID) that will put the value of the rows as the new column with a number
to distinguish, e.g.,
ID Persontype1 Med1 Weap1 Persontype2 Med2 Weap2 Persontype3 Med3 Weap3 Persontype4 Med4 Weap4 Persontype5 Med5 Weap5
001 Staff No No Staff No No Staff Yes Yes
002 Staff No No
003 Staff No Yes Other No No Volunteer Yes No Staff No No Staff Yes Yes
Any assistance would be appreciated!
There are a number of methods to accomplish what you want. I, personally, like the following one:
data have;
input (ID Persontype Med Weap) ($);
cards;
001 Staff No No
001 Staff No No
001 Staff Yes Yes
002 Staff No No
003 Staff No Yes
003 Other No No
003 Volunteer Yes No
003 Staff No No
003 Staff Yes Yes
;
proc sql noprint;
select max(obs) into :obs
from ( select count(*) as obs
from have
group by id )
;
quit;
proc summary nway data=have missing;
class ID;
output out = want(drop=_type_ _freq_)
idgroup(out[&obs](Persontype Med Weap)=);
run;
The arrangement you have has better qualities but you asked.
As long as you have 100 or less obs per BY GROUP level you can use PROC SUMMARY to transpose character and numeric variables in one step.
data have;
input (ID Persontype Med Weap)($);
cards;
001 Staff No No
001 Staff No No
001 Staff Yes Yes
002 Staff No No
003 Staff No Yes
003 Other No No
003 Volunteer Yes No
003 Staff No No
003 Staff Yes Yes
;;;;
run;
proc sql noprint;
select max(obs) into: obs
from (select count(*) as obs from have group by id);
quit;
run;
%put NOTE: OBS=&obs;
proc summary data=have;
by id;
output out=transposed(drop=_type_)
idgroup(out[&obs](Persontype--Weap)=)
;
run;
proc print;
run;
That worked great, and I never would have thought of it! I may nix the idea, as the obs ended up being 63, which means way too many null columns, but I'm glad to know how to do it, and can apply it in other places.
If you would like to read a bit more about the PROC SUMMARY approach, take a look at ...
Transposing Data Using PROC SUMMARY'S IDGROUP Option
Thank you - I will share with my coworkers!
FWIW: I "stole" the concept from Mike and Data_Null as they were really the originators and I had meant to include that reference but forgot.
I love call execute:
data have; input (ID Persontype Med Weap) ($); cards; 001 Staff No No 001 Staff No No 001 Staff Yes Yes 002 Staff No No 003 Staff No Yes 003 Other No No 003 Volunteer Yes No 003 Staff No No 003 Staff Yes Yes ; run; data _null_; set have end=last; by id notsorted; if _n_ eq 1 then call execute('data want;'); if first.id then do;count=0; call execute('id="'||id||'";');end; count+1; call execute('persontype'||strip(count)||'="'||persontype||'";'); call execute('med'||strip(count)||'="'||med||'";'); call execute('weap'||strip(count)||'="'||weap||'";'); if last.id then call execute('output;'); if last then call execute('run;'); run;
Ksharp
Hi ... though I'd still use PROC SUMMARY to do this, another
approach that writes/executes SAS code is FILE/PUT/%INCLUDE
(no parentheses or concatenations).
Just curious, maybe someone can offer an opinion ...
since CALL EXECUTE is just being used to create a data step for
subsequent execution, is there any advantage to CALL EXECUTE
over FILE/PUT/%INCLUDE in this situation?
data have;
input (ID Persontype Med Weap) (:$3. :$9. 2*:$3.);
cards;
001 Staff No No
001 Staff No No
001 Staff Yes Yes
002 Staff No No
003 Staff No Yes
003 Other No No
003 Volunteer Yes No
003 Staff No No
003 Staff Yes Yes
;
run;
filename xxx temp;
data _null_;
file xxx;
put 'data want;' ;
do until(last);
do count=1 by 1 until (last.id);
set have end=last;
by id;
if first.id then put 'id="' id $3. '";' ;
put 'persontype' count '="' persontype $9. '";' /
'med' count '="' med $3. '";' /
'weap' count '="' weap $3. '";' ;
end;
put 'output;' ;
end;
put 'run;' ;
stop;
run;
%include xxx;
To me the advantages to FILE/PUT/%INC are
I see no advantage to CALL EXECUTE over FILE/PUT/INC. CALL EXECUTE does have it's place but for me after more than a few lines of code it becomes onerous.
Your example program converts all the data to character strings and then "reads" it back in just as the CALL EXECUTE example from ksharp. That's fine with all character or integer data but when you have floating point data you will have a data loss problem. My modification of your program address the numeric data
problem in a very simplistic way. For a complete treatment see this paper http://support.sas.com/resources/papers/proceedings11/064-2011.pdfhttp://support.http://support.sas.com/resources/papers/proceedings11/064-2011.pdfsas.com/resources/papers/proceedings11/064-2011.pdf
data have;
input (ID Persontype Med Weap) (:$3. :$12. 2*:$3.);
y = rannor(10);
z = ranuni(10);
cards;
001 Staff No No
001 Staff No No
001 Staff Yes Yes
002 Staff No No
003 Staff No Yes
003 "Other" No No
003 Volunteer's Yes No
003 Staff No No
003 Staff Yes Yes
;
run;
proc print;
run;
filename FT33F001 temp;
data _null_;
file FT33F001;
put 'data want;';
do until(last);
do count=1 by 1 until (last.id);
set have end=last;
by id;
if first.id then put +3 'do id=' id $quote5. ';';
array tv
do i = 1 to dim(tv);
name = vname(tv);
value = quote(tv);
put +6 name $10.-r count '= ' value ';';
end;
array tn
do i = 1 to dim(tn);
name = vname(tn);
value = quote(put(tn,binary64.));
put +6 name $10.-r count '= input(' value ',binary64.);';
end;
end;
put +6 'output;';
put +6 'end;';
end;
put +3 'run;';
stop;
run;
%include FT33F001 / source2;
proc print;
run;
Hi. Mike.
there are no any advantage to CALL EXECUTEover FILE/PUT/%INCLUDE,
Just be more convenient. Although%include is faster than call execute for execution.
But I still like call execute which not need to make a file and to execute it by %include,
because it is very convenient. You just need to run one time and then get the result.
Ksharp
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.