CHL0320
Obsidian | Level 7

Hi, 

I have more than 200 tables like the sample below, and I need to generate one final table showing, for each table, the percentage of records falling in each category (the sample table has 7 records). A simple sample table is:

data temp;
input activity $ location $ item $;
datalines;
A IC V
A ID R
B IR W
C IC R
A IR W
C IC V
B ID V
;
RUN;

What I want looks like this:

(attached image: desired final table, one row per source dataset with a Filename column and percentage columns for each Activity, Location, and Item value)

 

I use PROC FREQ to get the percentages, but I am struggling to combine them all into one final table that has a Filename variable holding the source dataset name.
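For reference, the step I run on each table today is roughly this (just a sketch on the sample table; the output dataset names are placeholders):

proc freq data=temp;
    /* one request per TABLES statement so each OUT= data set is kept */
    tables activity / out=act_freq;    /* OUT= contains COUNT and PERCENT */
    tables location / out=loc_freq;
    tables item     / out=item_freq;
run;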

Thank you in advance for your suggestion.

ACCEPTED SOLUTION
PaigeMiller
Diamond | Level 26

How about this example? I use only two data sets, temp1 and temp2, and in fact the data and percentages are identical, but the point is to illustrate the method.

 

 

data temp1;
input activity $ location $ item $;
datalines;
A IC V
A ID R
B IR W
C IC R
A IR W
C IC V
B ID V
;
RUN;

data temp2;
input activity $ location $ item $;
datalines;
A IC V
A ID R
B IR W
C IC R
A IR W
C IC V
B ID V
;
RUN;

data all;
    /* Note: if you really have 200 data sets, just use temp1-temp200 below */
    /* or whatever the real names are */
    set temp1-temp2 indsname=indsname;   /* INDSNAME= names a variable holding the libref.member name of the contributing data set */
    file=propcase(scan(indsname,2,'.')); /* keep only the member name, e.g. Temp1 */
run;
proc freq data=all;
    by file;
    tables activity/out=act;
    tables location/out=loc;
    tables item/out=item;
run;
data all2;
    length characteristic activity location item $30;
    set act loc item;                                     /* stack the three frequency tables */
    if not missing(activity) then activity=catx('~','Activity',activity);
    if not missing(location) then location=catx('~','Location',location);
    if not missing(item) then item=catx('~','Item',item);
    characteristic=coalescec(activity,location,item);     /* one labeled category per row */
run;

proc report data=all2 split='~';
    columns ("File~Name" file) ("Percent" percent),characteristic;
    define file/group ' ';
    define characteristic/across ' ';
    define percent/sum ' ' format=8.2;
run;
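In case the intermediate data sets are unfamiliar: each OUT= data set from PROC FREQ above holds the BY variable, the table variable, COUNT, and PERCENT (already computed as count/total*100), and PERCENT is what PROC REPORT sums. A quick check, assuming the sample data:

proc print data=act noobs;
run;
/* Expected shape (values rounded, for the sample data):
   file    activity    COUNT    PERCENT
   Temp1   A           3        42.86
   Temp1   B           2        28.57
   Temp1   C           2        28.57
   Temp2   A           3        42.86
   ...                                  */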

 

 

(attached image: PROC REPORT output with File Name rows Temp1 and Temp2 and Percent columns across the Activity, Location, and Item values)

 

Please note that it is always a good idea to make the table easily readable by your intended audience. That includes using capital letters and real English words. A heading like per_ic, which is not an English word and is not capitalized, is not as clear or readable as Percent Location IC. Note also that File Name uses capitalized English words, while filename is not really an English word. Furthermore, if R, V and W for Item have actual names or values, those should appear in the table as well; Activity should not be A, B, C but rather the actual names that A, B and C represent. And so on. Make the table readable!!
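If the codes do have longer names, one way to carry them into the table (a minimal sketch only; the decoded names and format names below are invented, since the real ones are not in the question) is to define the decodes once with PROC FORMAT and recode with PUT before the PROC FREQ step, so the readable text is what flows through to the report:

proc format;
    /* hypothetical decodes: substitute the real names for A/B/C and R/V/W */
    value $actfmt  'A'='Assemble'  'B'='Build'   'C'='Clean';
    value $itemfmt 'R'='Red'       'V'='Violet'  'W'='White';
run;

data all;
    set temp1-temp2 indsname=indsname;
    file=propcase(scan(indsname,2,'.'));
    /* recode so the readable text (not the code) reaches PROC FREQ and the report;
       add a LENGTH statement if any real name is longer than the original 8 characters */
    activity=put(activity,$actfmt.);
    item=put(item,$itemfmt.);
run;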

 

--
Paige Miller


10 REPLIES
CHL0320
Obsidian | Level 7

@PaigeMiller Thank you for the solution. It works very neatly. I did get an error requiring me to PROC SORT by FILE before the PROC FREQ step. After sorting, the order in the final table is temp1, temp10, ..., temp2, temp20, .... Any suggestion for making the order temp1, temp2, temp3, ...? Thank you so much.

Ksharp
Super User

Try this one:

/* NUMERIC_COLLATION=ON compares the digits in the names numerically, so temp2 sorts before temp10 */
proc sort data=all out=want sortseq=linguistic(numeric_collation=on);
by file;
run;
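One way to wire this into the earlier steps (a sketch only, keeping the plain PROC SORT BY FILE before PROC FREQ that was already added) is to apply the linguistic sort to the final reporting data set and tell PROC REPORT to keep that order with ORDER=DATA:

proc sort data=all2 out=all2_sorted sortseq=linguistic(numeric_collation=on);
    by file;
run;

proc report data=all2_sorted split='~';
    columns ("File~Name" file) ("Percent" percent),characteristic;
    define file/group order=data ' ';     /* keep the temp1, temp2, ..., temp10 order from the sort */
    define characteristic/across ' ';
    define percent/sum ' ' format=8.2;
run;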
PaigeMiller
Diamond | Level 26

@CHL0320 wrote:

@PaigeMiller Thank you for the solution. It works very neatly. I did get an error requiring me to PROC SORT by FILE before the PROC FREQ step. After sorting, the order in the final table is temp1, temp10, ..., temp2, temp20, .... Any suggestion for making the order temp1, temp2, temp3, ...? Thank you so much.


This raises another question. Is the intended audience for your table output familiar with what is in data set temp1 and temp2 and ... and temp200? Maybe, but I think this is unlikely. Meaningful descriptions of these data sets ought to be used in the table instead of names like temp1, temp2, etc. (The goal isn't just to get the programming to work; the goal is to create a table that people will understand, and programming is only part of the steps needed to obtain that table.)
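For example (a sketch only; the descriptions and the format name are invented, since only the original poster knows what each data set holds), a character format can translate each member name into a meaningful label at the point where FILE is created:

proc format;
    /* hypothetical descriptions: replace with what each data set actually contains */
    value $filedesc
        'Temp1' = 'Site 1 activity log'
        'Temp2' = 'Site 2 activity log'
        other   = 'Unknown source';
run;

data all;
    set temp1-temp2 indsname=indsname;
    file=put(propcase(scan(indsname,2,'.')), $filedesc.);   /* store the description rather than the raw name */
run;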

--
Paige Miller
quickbluefish
Barite | Level 11
I have somehow never seen the INDSNAME option before - very useful!
quickbluefish
Barite | Level 11

data temp1;
input activity $ location $ item $;
datalines;
A IC V
A ID R
B IR W
C IC V
A IR W
C IC V
B ID V
;
RUN;

data temp2;
input activity $ location $ item $;
datalines;
A IC V
A ID R
B IR W
C IC R
A IR W
A IC W
B ID V
;
RUN;

data temp3;
input activity $ location $ item $;
datalines;
A IC R
B ID R
B IR W
C IC R
A IR W
C IC V
B ID V
;
RUN;

proc sql noprint;
/* build a SET-statement list that resolves to something like:
   TEMP1 (in=ds_TEMP1) TEMP2 (in=ds_TEMP2) TEMP3 (in=ds_TEMP3) */
select compbl(memname || compress(' (in=ds_' || memname || ')')) into :tlist separated by ' '
from dictionary.tables where libname='WORK' and upcase(memname) like 'TEMP%';
quit;

%put confirm this looks right: &tlist;

data all (rename=(lastfn=fname));
set &tlist end=last;
length lastfn fname $32;
array T {*} ds_:;
/* find which IN= flag is on to identify the contributing data set */
do i=1 to dim(T);
	if T[i] then do;
		fname=substr(vname(T[i]),4);   /* drop the 'ds_' prefix */
		leave;
	end;
end;
lastfn=lag(fname);   /* data set name on the previous record */
length per_A per_B per_C per_IC per_ID per_IR per_V per_R per_W 8;
format per_: percent8.2;               /* proportions display as percentages */
array p1 {*} per_A per_B per_C;        /* activity counters */
array p2 {*} per_IC per_ID per_IR;     /* location counters */
array p3 {*} per_V per_R per_W;        /* item counters */
/* when a new data set starts, convert the previous one's counts to proportions and output its row */
if fname^=lastfn then do;
	if _N_>1 then do;
		do i=1 to dim(p1);
			p1[i]=p1[i]/nrecs;
		end;
		do i=1 to dim(p2);
			p2[i]=p2[i]/nrecs;
		end;
		do i=1 to dim(p3);
			p3[i]=p3[i]/nrecs;
		end;
		output;
	end;
	call missing(of p1[*], of p2[*], of p3[*]);   /* reset the counters for the new data set */
	nrecs=0;
end;
nrecs+1;   /* running record count for the current data set */
/* tally each value: the code to match is the suffix of the counter's name, e.g. per_A -> 'A' */
do i=1 to dim(p1);
	p1[i]+(activity=scan(vname(p1[i]),2,'_'));
end;
do i=1 to dim(p2);
	p2[i]+(location=scan(vname(p2[i]),2,'_'));
end;
do i=1 to dim(p3);
	p3[i]+(item=scan(vname(p3[i]),2,'_'));
end;
if last then do;   /* flush the final data set's row */
	lastfn=fname;
	do i=1 to dim(p1);
		p1[i]=p1[i]/nrecs;
	end;
	do i=1 to dim(p2);
		p2[i]=p2[i]/nrecs;
	end;
	do i=1 to dim(p3);
		p3[i]=p3[i]/nrecs;
	end;
	output;
end;
keep lastfn per_:;
run;

proc print data=all heading=v width=min; run;

(attached image: PROC PRINT output, one row per TEMP data set with the per_ percentage columns)

 

 

CHL0320
Obsidian | Level 7

Thank you. The values in the final table are the percentages of observations: (count of records with that value in the column / total number of records) * 100. For example, in the 7-record sample, activity A appears 3 times, so the value is 3/7*100 = 42.86.

quickbluefish
Barite | Level 11
ah, misread your original post - I edited the code above.
PaigeMiller
Diamond | Level 26

NRECS is a variable that is not defined before you use it in this statement:

 

p1[i]=p1[i]/nrecs;

 

 

The original question said there are over 200 data sets; how do you scale this code up to that case? My code handles that case easily, as the comment in my code indicates.

--
Paige Miller
quickbluefish
Barite | Level 11
Good point - it should be declared in the LENGTH statement. The reason it works anyway is the IF _N_>1 THEN guard: the division is skipped on the first record, and by the time it does run, NRECS has already been accumulated by the sum statement.

It's the SQL step that deals with any input data sets starting with TEMP. I think it should work for any reasonable number of data sets (within the maximum length of a macro variable).
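Concretely, the change being discussed is a small one to the DATA step above (a sketch: add NRECS to the existing LENGTH statement; an explicit RETAIN is optional, since the sum statement NRECS+1 already retains it and initializes it to 0):

/* declare NRECS up front so it is clearly defined before the division */
length per_A per_B per_C per_IC per_ID per_IR per_V per_R per_W nrecs 8;
retain nrecs 0;   /* optional; nrecs+1 already implies RETAIN with an initial value of 0 */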
