Hello,
I'm looking for a way to get the percent of total from the total volume with a long dataset.
My dataset example is as follows:
Age Group Gender Month Item Count WANT
0-13 FEMALE Feb_2019_new TELEVISION 0 #DIV/0!
0-13 FEMALE Feb_2019_total TELEVISION 0
0-13 FEMALE Feb_2019_new FOOD 5 38.5%
0-13 FEMALE Feb_2019_total FOOD 13
0-13 MALE Feb_2019_new TELEVISION 1 50.0%
0-13 MALE Feb_2019_total TELEVISION 2
I have a long dataset where one row has the new purchases and the second row has total purchases for any given month. I want to get the percent of total for each item by age group, gender, item, and monthly that are new; the WANT column. Would this be accomplished using multiple proc sql or is there an easier way to do this using a data step?
Thanks,
If your data reliably has a single month value followed by a total value, then you can use a merge of HAVE with itself offset by one record:
data have;
length name $20;
infile datalines dsd missover delimiter=",";
input age_grp $ gender $ name $ item $ count;
datalines;
0-13,FEMALE,FEB_2019_NEW ,FOOD,12
0-13,FEMALE,FEB_2019_TOTAL ,FOOD,23
0-13,FEMALE,FEB_2019_NEW ,GAS,45
0-13,FEMALE,FEB_2019_TOTAL ,GAS,76
0-13,FEMALE,FEB_2019_NEW ,TELE,4
0-13,FEMALE,FEB_2019_TOTAL ,TELE,5
0-13,MALE,FEB_2019_NEW ,TELE,3
0-13,MALE,FEB_2019_TOTAL ,TELE,8
0-13,FEMALE,MAR_2019_NEW ,TELE,0
0-13,FEMALE,MAR_2019_TOTAL ,TELE,0
;
run;
data want (drop=_:);
merge have
have (firstobs=2 keep=count rename=(count=_total_count));
if _total_count^=0 and scan(name,-1,'_')='NEW' then newvar=count/_total_count;
run;
Now if once in a while there is no total following the new, then the self-merge with offset still works, but you have check that the name variables correspond:
data want (drop=_:);
merge have
have (firstobs=2 keep=name count rename=(name=_total_name count=_total_count));
if scan(name,-1,'_')='NEW' and _total_name=transtrn(name,'_NEW','_TOTAL') and _total_count^=0
then newvar=count/_total_count;
run;
I would recommend looking into PROC FREQ and the OUTPCT option on the TABLES statement.
@A_Swoosh wrote:
Hello,
I'm looking for a way to get the percent of total from the total volume with a long dataset.
My dataset example is as follows:
Age Group Gender Month Metric Item Total WANT 0-13 FEMALE Feb-19 NEW PURCHASES TELEVISION 0 #DIV/0! 0-13 FEMALE Feb-19 TOTAL PURCHASES TELEVISION 0 0-13 FEMALE Feb-19 NEW PURCHASES FOOD 5 38.5% 0-13 FEMALE Feb-19 TOTAL PURCHASES FOOD 13 0-13 MALE Feb-19 NEW PURCHASES TELEVISION 1 50.0% 0-13 MALE Feb-19 TOTAL PURCHASES TELEVISION 2
I have a long dataset where one row has the new purchases and the second row has total purchases for any given month. I want to get the percent of total for each item by age group, gender, item, and month that are new; the WANT column. Would this be accomplished using multiple proc sql or is there an easier way to do this using a data step?
Thanks,
Sorry, I modified the example case slightly. Wouldn't that not work since I have these under the same variable therefore if I did a frequency table, my output would just sum them together when I'd really only want New row/Total row?
My apologies. Let me see if I'm able to state my question a little better.
A subset of my long dataset looks like this:
data have;
length name $20;
infile datalines dsd missover delimiter=",";
input age_grp $ gender $ name $ item $ count;
datalines;
0-13,FEMALE,FEB_2019_NEW ,FOOD,12
0-13,FEMALE,FEB_2019_TOTAL ,FOOD,23
0-13,FEMALE,FEB_2019_NEW ,GAS,45
0-13,FEMALE,FEB_2019_TOTAL ,GAS,76
0-13,FEMALE,FEB_2019_NEW ,TELE,4
0-13,FEMALE,FEB_2019_TOTAL ,TELE,5
0-13,MALE,FEB_2019_NEW ,TELE,3
0-13,MALE,FEB_2019_TOTAL ,TELE,8
0-13,FEMALE,MAR_2019_NEW ,TELE,0
0-13,FEMALE,MAR_2019_TOTAL ,TELE,0
;
run;
I want to turn the dataset into something like this:
data have;
length name $20;
infile datalines dsd missover delimiter=",";
input age_grp $ gender $ name $ item $ count want;
datalines;
0-13,FEMALE,FEB_2019_NEW ,FOOD,12,.5219
0-13,FEMALE,FEB_2019_TOTAL ,FOOD,23,
0-13,FEMALE,FEB_2019_NEW ,GAS,45,.592
0-13,FEMALE,FEB_2019_TOTAL ,GAS,76,
0-13,FEMALE,FEB_2019_NEW ,TELE,4,.80
0-13,FEMALE,FEB_2019_TOTAL ,TELE,5,
0-13,MALE,FEB_2019_NEW ,TELE,3,.375
0-13,MALE,FEB_2019_TOTAL ,TELE,8,
0-13,FEMALE,MAR_2019_NEW ,TELE,0,.
0-13,FEMALE,MAR_2019_TOTAL ,TELE,0,
;
run;
My BY variables include: Age_grp, Gender, Name, and Item.
So, for each month, I want to report the percent of NEW/TOTAL for each age group, gender, and item combination. So, for food purchases among female 0-13 in February 2019, it was 23 total purchases but the percent of those that were NEW were 52.19% (.5219).
I think a lag function might work but wasn't entirely sure how to do the grouping using your example. As you mentioned, I can guarantee that each row will alternate between New then Total.
Then to take it one step further, I wanted to concatenate the values for TOTAL and NEW to produce a value for each age group, gender, and item combination to read out the
TOTAL value (NEW pct). So, in the example above, it would be 23(52.2%).
So, I did what you suggested:
data want;
set monthly_results;
by item age_new gender;
want=total/lag(total);
if first.item then call missing(want);
run;
I get the value for new/total in the NEW row; however, I get a value in the TOTAL row instead of a MISSING value.
Additionally, if I then wanted to concatenate the TOTAL (want) together, how would I go about this?
Would I just be better suited to do this using proc sql or transpose the data?
If your data reliably has a single month value followed by a total value, then you can use a merge of HAVE with itself offset by one record:
data have;
length name $20;
infile datalines dsd missover delimiter=",";
input age_grp $ gender $ name $ item $ count;
datalines;
0-13,FEMALE,FEB_2019_NEW ,FOOD,12
0-13,FEMALE,FEB_2019_TOTAL ,FOOD,23
0-13,FEMALE,FEB_2019_NEW ,GAS,45
0-13,FEMALE,FEB_2019_TOTAL ,GAS,76
0-13,FEMALE,FEB_2019_NEW ,TELE,4
0-13,FEMALE,FEB_2019_TOTAL ,TELE,5
0-13,MALE,FEB_2019_NEW ,TELE,3
0-13,MALE,FEB_2019_TOTAL ,TELE,8
0-13,FEMALE,MAR_2019_NEW ,TELE,0
0-13,FEMALE,MAR_2019_TOTAL ,TELE,0
;
run;
data want (drop=_:);
merge have
have (firstobs=2 keep=count rename=(count=_total_count));
if _total_count^=0 and scan(name,-1,'_')='NEW' then newvar=count/_total_count;
run;
Now if once in a while there is no total following the new, then the self-merge with offset still works, but you have check that the name variables correspond:
data want (drop=_:);
merge have
have (firstobs=2 keep=name count rename=(name=_total_name count=_total_count));
if scan(name,-1,'_')='NEW' and _total_name=transtrn(name,'_NEW','_TOTAL') and _total_count^=0
then newvar=count/_total_count;
run;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.