BookmarkSubscribeRSS Feed
ematznick
Calcite | Level 5

I have a data set where an observation can be a part of multiple groups within a variable. The variable has four options coded as 'red', 'green', 'yellow', 'white'. Observations with multiple are coded as group, group, ...etc. for example 'red, green'. When I run proc freq, proc freq sees this as 5 groups. I want to be able to have the multiple observation count into each respective group so for the example 'red, green' it adds a count to red and one to green instead of being run as it's own group.

%macro disc_tab (dset, var,grp1,grp2,ct,ct1,ct2,ct3,ct4,ctc,name,ord);
proc sort data= &dset out=sort1&dset;
	by &grp1;
proc freq data= sort1&dset;
	table &var/out =outtot&var sparse;
	where &var ne '';
proc freq data=sort1&dset;
	table &var/out=out1&var sparse;
	where &var ne '';
	by &grp1;
proc sort data= &dset out=sort2&dset;
	by &grp2;
proc freq data=sort2&dset;
	table &var/out=out2&var sparse;
	where &var ne '';
	by &grp2;
data outtot&var;
set outtot&var;
	npc=trim(left(count))||"/&ct ("||trim(left(put(percent,5.1)))||')';
data out1&var;
set out1&var;
if subcoh =1 then npc=trim(left(count))||"/&ct1 ("||trim(left(put(percent,5.1)))||')';
if subcoh =2 then npc=trim(left(count))||"/&ct2 ("||trim(left(put(percent,5.1)))||')';
if subcoh =3 then npc=trim(left(count))||"/&ct3 ("||trim(left(put(percent,5.1)))||')';
if subcoh =4 then npc=trim(left(count(*)))||"/&ct4 ("||trim(left(put(percent,5.1)))||')';
data out2&var;
set out2&var;
if comb =0 then npc=trim(left(count))||"/&ct1 ("||trim(left(put(percent,5.1)))||')';
if comb =1 then npc=trim(left(count))||"/&ctc ("||trim(left(put(percent,5.1)))||')';


data out&var;
	set outtot&var out1&var out2&var;
proc SORT; 
	BY  &var;
PROC TRANSPOSE data=out&var out=trans&var;
   VAR npc; 
	BY  &var;
run;

Data table&var;
	length stat $300.;
	FORMAT stat var $300.;
	set trans&var;
	stat=&var;
	drop COL6 ;
	var=&name;
	ord=⩝

run;
%mend disc_tab;

%disc_tab(mh_analysis,mhterm2_display,subcoh,comb,&n_term2,&n1_term2,&n2_term2,&n3_term2,&n4_term2,&nc_term2,'color* [N (%)]',3);
11 REPLIES 11
PaigeMiller
Diamond | Level 26

This method you have described, if I am understanding properly, would cause the sum of the numbers in each group to greater than the number of original observations. Is that what you want?

--
Paige Miller
mkeintz
PROC Star

I don't think I understand your objective.  Can you show what the data looks like that you are submitting to the macro?

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
ballardw
Super User

Plus show what the expected result for example data looks like.

ematznick
Calcite | Level 5
Expected for mhterm2_display
elbow 63/102 (61.8%)
knee 51/102 (50.0%)
ankle 3/102 (2.9%)
wrist 3/102 (2.9%)
foot 4/102 (3.9%)
neck 15/102 (14.7%)
PaigeMiller
Diamond | Level 26

@ematznick wrote:
Expected for mhterm2_display
elbow 63/102 (61.8%)
knee 51/102 (50.0%)
ankle 3/102 (2.9%)
wrist 3/102 (2.9%)
foot 4/102 (3.9%)
neck 15/102 (14.7%)

This adds to more than 100%, and that was specifically my first question. Your original description talked only about counts, not percents, and if you want percents, what is the denominator 102? Your answer to my first question was that the sum would be greater than the number of records, but if you run it through PROC FREQ using my code, you get a percent that totals 100%. So PROC FREQ doesn't work? Or does it? 

 

@ematznick , you shouldn't be providing us the problem to be solved in bits and pieces, and only when we ask. Please take a few minutes and write a clear and complete description of the problem, so we don't have to keep asking questions and modifying code, so hopefully, we can solve it once.

--
Paige Miller
ematznick
Calcite | Level 5
The data running through looks like:
Subject mhterm2_display mhterm3_display mhterm4_display
cb01 'elbow' 'catheter' 'green, red'
cb02 'knee, elbow' 'pacemaker' 'blue'
...
cb354
PaigeMiller
Diamond | Level 26

You would probably have to create a new data set named INTERMEDIATE for each variable individually, and then run PROC FREQ for each variable individually. That part would require taking my code and turning it into a macro where the only thing changing is the name of the input variable, which I called STRING.

--
Paige Miller
PaigeMiller
Diamond | Level 26

Hmm, this is a great example of where macros are not needed. You can manipulate the text strings in a data set so that PROC FREQ will produce the results you want.

 

data have;
infile cards missover;
input string $64.;
cards;
red
green
blue
yellow
red,green
;
data intermediate;
    set have;
    length word $ 16;
    do i=1 to countw(string,',');
        word=scan(string,i,',');
        output;
    end;
    keep word;
run;
proc freq data=intermediate;
    tables word;
run;

 

--
Paige Miller
ematznick
Calcite | Level 5
Some of the variables have 16 combinations of 7 groups with a total of 354 observations. Would I do this for each variable that has multiples and would I list all 16 combinations?
PaigeMiller
Diamond | Level 26

It's time you gave specific examples.

--
Paige Miller

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 1975 views
  • 2 likes
  • 4 in conversation