BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
A_Swoosh
Quartz | Level 8

Hello,

 

I'm looking for a way to get the percent of total from the total volume with a long dataset.

My dataset example is as follows:

Age Group	Gender	Month	Item	Count	WANT
0-13	FEMALE	Feb_2019_new	TELEVISION	0	#DIV/0!
0-13	FEMALE	Feb_2019_total	TELEVISION	0	
0-13	FEMALE	Feb_2019_new	FOOD	5	38.5%
0-13	FEMALE	Feb_2019_total	FOOD	13	
0-13	MALE	Feb_2019_new	TELEVISION	1	50.0%
0-13	MALE	Feb_2019_total	TELEVISION	2	

I have a long dataset where one row has the new purchases and the second row has total purchases for any given month. I want to get the percent of total for each item by age group, gender, item, and monthly that are new;  the WANT column. Would this be accomplished using multiple proc sql or is there an easier way to do this using a data step?

 

Thanks,

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

If your data reliably has a single month value followed by a total value, then you can use a merge of HAVE with itself offset by one record:

data have;
length name $20;
	infile datalines dsd missover delimiter=",";
	input age_grp $ gender $ name $ item $ count;
datalines;
0-13,FEMALE,FEB_2019_NEW                ,FOOD,12
0-13,FEMALE,FEB_2019_TOTAL              ,FOOD,23
0-13,FEMALE,FEB_2019_NEW                ,GAS,45
0-13,FEMALE,FEB_2019_TOTAL              ,GAS,76
0-13,FEMALE,FEB_2019_NEW                ,TELE,4
0-13,FEMALE,FEB_2019_TOTAL              ,TELE,5
0-13,MALE,FEB_2019_NEW                  ,TELE,3
0-13,MALE,FEB_2019_TOTAL                ,TELE,8
0-13,FEMALE,MAR_2019_NEW                ,TELE,0
0-13,FEMALE,MAR_2019_TOTAL              ,TELE,0
;
run;

data want (drop=_:);
  merge have 
        have (firstobs=2 keep=count rename=(count=_total_count));
  if _total_count^=0 and scan(name,-1,'_')='NEW' then newvar=count/_total_count;
run;

 

 

Now if once in a while there is no total following the new, then the self-merge with offset still works, but you have check that the name variables correspond:

 

data want (drop=_:);
  merge have 
        have (firstobs=2 keep=name count rename=(name=_total_name count=_total_count));
  if scan(name,-1,'_')='NEW' and _total_name=transtrn(name,'_NEW','_TOTAL') and _total_count^=0 
    then newvar=count/_total_count;
run;

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

8 REPLIES 8
Reeza
Super User

I would recommend looking into PROC FREQ and the OUTPCT option on the TABLES statement.

 


@A_Swoosh wrote:

Hello,

 

I'm looking for a way to get the percent of total from the total volume with a long dataset.

My dataset example is as follows:

Age Group	Gender	Month	Metric	Item	Total	WANT
0-13	FEMALE	Feb-19	NEW PURCHASES	TELEVISION	0	#DIV/0!
0-13	FEMALE	Feb-19	TOTAL PURCHASES	TELEVISION	0	
0-13	FEMALE	Feb-19	NEW PURCHASES	FOOD	5	38.5%
0-13	FEMALE	Feb-19	TOTAL PURCHASES	FOOD	13	
0-13	MALE	Feb-19	NEW PURCHASES	TELEVISION	1	50.0%
0-13	MALE	Feb-19	TOTAL PURCHASES	TELEVISION	2	

I have a long dataset where one row has the new purchases and the second row has total purchases for any given month. I want to get the percent of total for each item by age group, gender, item, and month that are new;  the WANT column. Would this be accomplished using multiple proc sql or is there an easier way to do this using a data step?

 

Thanks,


 

A_Swoosh
Quartz | Level 8

Sorry, I modified the example case slightly. Wouldn't that not work since I have these under the same variable therefore if I did a frequency table, my output would just sum them together when I'd really only want New row/Total row?

Reeza
Super User
You're correct but your question (and data) are awkwardly stated - your tables isn't aligned either.

If you can guarantee that you will always have New then Total you can look at the DIF and LAG() functions.
MOD() can be used to tell you if a value is odd/even.
_N_ will give you the row number.
or BY group processing.

From what you've shown this would work but this may not scale:

data want;
set have;
by item;
want=lag(total)/total;
if first.item then call missing(want);
run;
A_Swoosh
Quartz | Level 8

My apologies. Let me see if I'm able to state my question a little better.

 

A subset of my long dataset looks like this:

data have;
length name $20;
	infile datalines dsd missover delimiter=",";
	input age_grp $ gender $ name $ item $ count;
datalines;
0-13,FEMALE,FEB_2019_NEW                ,FOOD,12
0-13,FEMALE,FEB_2019_TOTAL              ,FOOD,23
0-13,FEMALE,FEB_2019_NEW                ,GAS,45
0-13,FEMALE,FEB_2019_TOTAL              ,GAS,76
0-13,FEMALE,FEB_2019_NEW                ,TELE,4
0-13,FEMALE,FEB_2019_TOTAL              ,TELE,5
0-13,MALE,FEB_2019_NEW                  ,TELE,3
0-13,MALE,FEB_2019_TOTAL                ,TELE,8
0-13,FEMALE,MAR_2019_NEW                ,TELE,0
0-13,FEMALE,MAR_2019_TOTAL              ,TELE,0
;
run;

I want to turn the dataset into something like this:

data have;
length name $20;
	infile datalines dsd missover delimiter=",";
	input age_grp $ gender $ name $ item $ count want;
datalines;
0-13,FEMALE,FEB_2019_NEW                ,FOOD,12,.5219
0-13,FEMALE,FEB_2019_TOTAL              ,FOOD,23,
0-13,FEMALE,FEB_2019_NEW                ,GAS,45,.592
0-13,FEMALE,FEB_2019_TOTAL              ,GAS,76, 
0-13,FEMALE,FEB_2019_NEW                ,TELE,4,.80
0-13,FEMALE,FEB_2019_TOTAL              ,TELE,5,
0-13,MALE,FEB_2019_NEW                  ,TELE,3,.375
0-13,MALE,FEB_2019_TOTAL                ,TELE,8, 
0-13,FEMALE,MAR_2019_NEW                ,TELE,0,.
0-13,FEMALE,MAR_2019_TOTAL              ,TELE,0, 
;
run;

My BY variables include: Age_grp, Gender, Name, and Item.

 

So, for each month, I want to report the percent of NEW/TOTAL for each age group, gender, and item combination. So, for food purchases among female 0-13 in February 2019, it was 23 total purchases but the percent of those that were NEW were 52.19% (.5219). 

 

I think a lag function might work but wasn't entirely sure how to do the grouping using your example. As you mentioned, I can guarantee that each row will alternate between New then Total.

 

Then to take it one step further, I wanted to concatenate the values for TOTAL and NEW to produce a value for each age group, gender, and item combination to read out the 

TOTAL value (NEW pct). So, in the example above, it would be 23(52.2%).

 

Reeza
Super User
Sort your data so that total is before new and then use the method I included initially, reversing the calculation obviously. SAS doesn't do LEAD well, but LAG is relatively easy. Then resort for you display.
A_Swoosh
Quartz | Level 8

So, I did what you suggested: 

 

data want;
        set monthly_results;
        by  item age_new gender;
        want=total/lag(total);
        if first.item then call missing(want);
run;

I get the value for new/total in the NEW row; however, I get a value in the TOTAL row instead of a MISSING value.

 

Additionally, if I then wanted to concatenate the TOTAL (want) together, how would I go about this?

 

Would I just be better suited to do this using proc sql or transpose the data?

mkeintz
PROC Star

If your data reliably has a single month value followed by a total value, then you can use a merge of HAVE with itself offset by one record:

data have;
length name $20;
	infile datalines dsd missover delimiter=",";
	input age_grp $ gender $ name $ item $ count;
datalines;
0-13,FEMALE,FEB_2019_NEW                ,FOOD,12
0-13,FEMALE,FEB_2019_TOTAL              ,FOOD,23
0-13,FEMALE,FEB_2019_NEW                ,GAS,45
0-13,FEMALE,FEB_2019_TOTAL              ,GAS,76
0-13,FEMALE,FEB_2019_NEW                ,TELE,4
0-13,FEMALE,FEB_2019_TOTAL              ,TELE,5
0-13,MALE,FEB_2019_NEW                  ,TELE,3
0-13,MALE,FEB_2019_TOTAL                ,TELE,8
0-13,FEMALE,MAR_2019_NEW                ,TELE,0
0-13,FEMALE,MAR_2019_TOTAL              ,TELE,0
;
run;

data want (drop=_:);
  merge have 
        have (firstobs=2 keep=count rename=(count=_total_count));
  if _total_count^=0 and scan(name,-1,'_')='NEW' then newvar=count/_total_count;
run;

 

 

Now if once in a while there is no total following the new, then the self-merge with offset still works, but you have check that the name variables correspond:

 

data want (drop=_:);
  merge have 
        have (firstobs=2 keep=name count rename=(name=_total_name count=_total_count));
  if scan(name,-1,'_')='NEW' and _total_name=transtrn(name,'_NEW','_TOTAL') and _total_count^=0 
    then newvar=count/_total_count;
run;

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
A_Swoosh
Quartz | Level 8

This appears to work exactly how I described. 

 

Thank you and @Reeza 

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 2148 views
  • 2 likes
  • 3 in conversation