Hi,
I have more than 200 tables like the sample table below and need to generate a final table with the frequency in percentage of records (7 in the sample table). The simple sample table is like
data temp; input activity $ location $ item $; datalines; A IC V A ID R B IR W C IC R A IR W C IC V B ID V ; RUN;
What I want is like
I use proc freq to get the percentage, but I am struggling to put them all into one final big table with the Filename variable as the dataset name.
Thank you in advance for your suggestion.
How about this example? I only use two data sets, temp1 and temp2, and in fact the data and percents are identical, but the point is to illustrate the method.
data temp1;
input activity $ location $ item $;
datalines;
A IC V
A ID R
B IR W
C IC R
A IR W
C IC V
B ID V
;
RUN;
data temp2;
input activity $ location $ item $;
datalines;
A IC V
A ID R
B IR W
C IC R
A IR W
C IC V
B ID V
;
RUN;
data all;
/* Note: if you really have 200 data sets, just use temp1-temp200 below */
/* or whatever the real names are */
set temp1-temp2 indsname=indsname;
file=propcase(scan(indsname,2,'.'));
run;
proc freq data=all;
by file;
tables activity/out=act;
tables location/out=loc;
tables item/out=item;
run;
data all2;
length characteristic activity location item $30;
set act loc item;
if not missing(activity) then activity=catx('~',"Activity",activity);
if not missing(location) then location=catx('~','Location',location);
if not missing(item) then item=catx('~','Item',item);
characteristic=coalescec(activity,location,item);
run;
proc report data=all2 split='~';
columns ("File~Name" file) ("Percent" percent),characteristic;
define file/group ' ';
define characteristic/across ' ';
define percent/sum ' ' format=8.2;
run;
Please note that it is always a good idea to make the table easily readable by your intended audience. This includes using capital letters and English words. Using per_ic, which is not an English word and is not capitalized, is not as clear and readable as Percent Location IC which uses English words and has capitalized words. Note also File Name has capitalized file names and English words (filename is not really an English word). Furthermore, if R V W for Items have actual names or values, these should appear in the table as well. Activity should not be A B C but rather the actual English word names that A B C represent. And so on. Make the table readable!!
How about this example? I only use two data sets, temp1 and temp2, and in fact the data and percents are identical, but the point is to illustrate the method.
data temp1;
input activity $ location $ item $;
datalines;
A IC V
A ID R
B IR W
C IC R
A IR W
C IC V
B ID V
;
RUN;
data temp2;
input activity $ location $ item $;
datalines;
A IC V
A ID R
B IR W
C IC R
A IR W
C IC V
B ID V
;
RUN;
data all;
/* Note: if you really have 200 data sets, just use temp1-temp200 below */
/* or whatever the real names are */
set temp1-temp2 indsname=indsname;
file=propcase(scan(indsname,2,'.'));
run;
proc freq data=all;
by file;
tables activity/out=act;
tables location/out=loc;
tables item/out=item;
run;
data all2;
length characteristic activity location item $30;
set act loc item;
if not missing(activity) then activity=catx('~',"Activity",activity);
if not missing(location) then location=catx('~','Location',location);
if not missing(item) then item=catx('~','Item',item);
characteristic=coalescec(activity,location,item);
run;
proc report data=all2 split='~';
columns ("File~Name" file) ("Percent" percent),characteristic;
define file/group ' ';
define characteristic/across ' ';
define percent/sum ' ' format=8.2;
run;
Please note that it is always a good idea to make the table easily readable by your intended audience. This includes using capital letters and English words. Using per_ic, which is not an English word and is not capitalized, is not as clear and readable as Percent Location IC which uses English words and has capitalized words. Note also File Name has capitalized file names and English words (filename is not really an English word). Furthermore, if R V W for Items have actual names or values, these should appear in the table as well. Activity should not be A B C but rather the actual English word names that A B C represent. And so on. Make the table readable!!
@PaigeMiller Thank you for the solution. It works very neatly. There is an error to request that I need to do proc sort by file before proceeding to the step of proc freq. The order in the final table is like temp1, temp10.........temp2, temp 20....... Any suggestions to make the order like temp 1, temp2, temp3......? Thank you so much.
Try this one :
proc sort data=all out=want sortseq=linguistic(numeric_collation=on); by file; run;
@CHL0320 wrote:
@PaigeMiller Thank you for the solution. It works very neatly. There is an error to request that I need to do proc sort by file before proceeding to the step of proc freq. The order in the final table is like temp1, temp10.........temp2, temp 20....... Any suggestions to make the order like temp 1, temp2, temp3......? Thank you so much.
This raises another question. Is the intended audience for your table output familiar with what is in data set temp1 and temp2 and ... and temp200? Maybe, but I think this is unlikely. Meaningful descriptions of these data sets ought to be used in the table instead of file1 file2 etc. (The goal isn't to get the programming to work; the goal is to create a table that people will understand, and programming is a part of the steps needed to obtain that table)
data temp1;
input activity $ location $ item $;
datalines;
A IC V
A ID R
B IR W
C IC V
A IR W
C IC V
B ID V
;
RUN;
data temp2;
input activity $ location $ item $;
datalines;
A IC V
A ID R
B IR W
C IC R
A IR W
A IC W
B ID V
;
RUN;
data temp3;
input activity $ location $ item $;
datalines;
A IC R
B ID R
B IR W
C IC R
A IR W
C IC V
B ID V
;
RUN;
proc sql noprint;
select compbl(memname || compress(' (in=ds_' || memname || ')')) into :tlist separated by ' '
from dictionary.tables where libname='WORK' and upcase(memname) like 'TEMP%';
quit;
%put confirm this looks right: &tlist;
data all (rename=(lastfn=fname));
set &tlist end=last;
length lastfn fname $32;
array T {*} ds_:;
do i=1 to dim(T);
if T[i] then do;
fname=substr(vname(T[i]),4);
leave;
end;
end;
lastfn=lag(fname);
length per_A per_B per_C per_IC per_ID per_IR per_V per_R per_W 8;
format per_: percent8.2;
array p1 {*} per_A per_B per_C;
array p2 {*} per_IC per_ID per_IR;
array p3 {*} per_V per_R per_W;
if fname^=lastfn then do;
if _N_>1 then do;
do i=1 to dim(p1);
p1[i]=p1[i]/nrecs;
end;
do i=1 to dim(p2);
p2[i]=p2[i]/nrecs;
end;
do i=1 to dim(p3);
p3[i]=p3[i]/nrecs;
end;
output;
end;
call missing(of p1[*], of p2[*], of p3[*]);
nrecs=0;
end;
nrecs+1;
do i=1 to dim(p1);
p1[i]+(activity=scan(vname(p1[i]),2,'_'));
end;
do i=1 to dim(p2);
p2[i]+(location=scan(vname(p2[i]),2,'_'));
end;
do i=1 to dim(p3);
p3[i]+(item=scan(vname(p3[i]),2,'_'));
end;
if last then do;
lastfn=fname;
do i=1 to dim(p1);
p1[i]=p1[i]/nrecs;
end;
do i=1 to dim(p2);
p2[i]=p2[i]/nrecs;
end;
do i=1 to dim(p3);
p3[i]=p3[i]/nrecs;
end;
output;
end;
keep lastfn per_:;
run;
proc print data=all heading=v width=min; run;
run;
Thank you. The values in the final table are the percentage of observation. The value is (the count of the original value in the column/total record number)*100.
NRECS is a variable that is not defined before you use it in this statement
p1[i]=p1[i]/nrecs;
The original question said there are over 200 data sets, how do you up-scale this code to that case? My code handles this case easily, as the comment in my code indicates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.