Hello,
I'm looking for a way to get the percent of total from the total volume with a long dataset.
My dataset example is as follows:
Age Group Gender Month Item Count WANT
0-13 FEMALE Feb_2019_new TELEVISION 0 #DIV/0!
0-13 FEMALE Feb_2019_total TELEVISION 0
0-13 FEMALE Feb_2019_new FOOD 5 38.5%
0-13 FEMALE Feb_2019_total FOOD 13
0-13 MALE Feb_2019_new TELEVISION 1 50.0%
0-13 MALE Feb_2019_total TELEVISION 2
I have a long dataset where one row has the new purchases and the second row has total purchases for any given month. I want to get the percent of total for each item by age group, gender, item, and monthly that are new; the WANT column. Would this be accomplished using multiple proc sql or is there an easier way to do this using a data step?
Thanks,
If your data reliably has a single month value followed by a total value, then you can use a merge of HAVE with itself offset by one record:
data have;
length name $20;
infile datalines dsd missover delimiter=",";
input age_grp $ gender $ name $ item $ count;
datalines;
0-13,FEMALE,FEB_2019_NEW ,FOOD,12
0-13,FEMALE,FEB_2019_TOTAL ,FOOD,23
0-13,FEMALE,FEB_2019_NEW ,GAS,45
0-13,FEMALE,FEB_2019_TOTAL ,GAS,76
0-13,FEMALE,FEB_2019_NEW ,TELE,4
0-13,FEMALE,FEB_2019_TOTAL ,TELE,5
0-13,MALE,FEB_2019_NEW ,TELE,3
0-13,MALE,FEB_2019_TOTAL ,TELE,8
0-13,FEMALE,MAR_2019_NEW ,TELE,0
0-13,FEMALE,MAR_2019_TOTAL ,TELE,0
;
run;
data want (drop=_:);
merge have
have (firstobs=2 keep=count rename=(count=_total_count));
if _total_count^=0 and scan(name,-1,'_')='NEW' then newvar=count/_total_count;
run;
Now if once in a while there is no total following the new, then the self-merge with offset still works, but you have check that the name variables correspond:
data want (drop=_:);
merge have
have (firstobs=2 keep=name count rename=(name=_total_name count=_total_count));
if scan(name,-1,'_')='NEW' and _total_name=transtrn(name,'_NEW','_TOTAL') and _total_count^=0
then newvar=count/_total_count;
run;
I would recommend looking into PROC FREQ and the OUTPCT option on the TABLES statement.
@A_Swoosh wrote:
Hello,
I'm looking for a way to get the percent of total from the total volume with a long dataset.
My dataset example is as follows:
Age Group Gender Month Metric Item Total WANT 0-13 FEMALE Feb-19 NEW PURCHASES TELEVISION 0 #DIV/0! 0-13 FEMALE Feb-19 TOTAL PURCHASES TELEVISION 0 0-13 FEMALE Feb-19 NEW PURCHASES FOOD 5 38.5% 0-13 FEMALE Feb-19 TOTAL PURCHASES FOOD 13 0-13 MALE Feb-19 NEW PURCHASES TELEVISION 1 50.0% 0-13 MALE Feb-19 TOTAL PURCHASES TELEVISION 2
I have a long dataset where one row has the new purchases and the second row has total purchases for any given month. I want to get the percent of total for each item by age group, gender, item, and month that are new; the WANT column. Would this be accomplished using multiple proc sql or is there an easier way to do this using a data step?
Thanks,
Sorry, I modified the example case slightly. Wouldn't that not work since I have these under the same variable therefore if I did a frequency table, my output would just sum them together when I'd really only want New row/Total row?
My apologies. Let me see if I'm able to state my question a little better.
A subset of my long dataset looks like this:
data have;
length name $20;
infile datalines dsd missover delimiter=",";
input age_grp $ gender $ name $ item $ count;
datalines;
0-13,FEMALE,FEB_2019_NEW ,FOOD,12
0-13,FEMALE,FEB_2019_TOTAL ,FOOD,23
0-13,FEMALE,FEB_2019_NEW ,GAS,45
0-13,FEMALE,FEB_2019_TOTAL ,GAS,76
0-13,FEMALE,FEB_2019_NEW ,TELE,4
0-13,FEMALE,FEB_2019_TOTAL ,TELE,5
0-13,MALE,FEB_2019_NEW ,TELE,3
0-13,MALE,FEB_2019_TOTAL ,TELE,8
0-13,FEMALE,MAR_2019_NEW ,TELE,0
0-13,FEMALE,MAR_2019_TOTAL ,TELE,0
;
run;
I want to turn the dataset into something like this:
data have;
length name $20;
infile datalines dsd missover delimiter=",";
input age_grp $ gender $ name $ item $ count want;
datalines;
0-13,FEMALE,FEB_2019_NEW ,FOOD,12,.5219
0-13,FEMALE,FEB_2019_TOTAL ,FOOD,23,
0-13,FEMALE,FEB_2019_NEW ,GAS,45,.592
0-13,FEMALE,FEB_2019_TOTAL ,GAS,76,
0-13,FEMALE,FEB_2019_NEW ,TELE,4,.80
0-13,FEMALE,FEB_2019_TOTAL ,TELE,5,
0-13,MALE,FEB_2019_NEW ,TELE,3,.375
0-13,MALE,FEB_2019_TOTAL ,TELE,8,
0-13,FEMALE,MAR_2019_NEW ,TELE,0,.
0-13,FEMALE,MAR_2019_TOTAL ,TELE,0,
;
run;
My BY variables include: Age_grp, Gender, Name, and Item.
So, for each month, I want to report the percent of NEW/TOTAL for each age group, gender, and item combination. So, for food purchases among female 0-13 in February 2019, it was 23 total purchases but the percent of those that were NEW were 52.19% (.5219).
I think a lag function might work but wasn't entirely sure how to do the grouping using your example. As you mentioned, I can guarantee that each row will alternate between New then Total.
Then to take it one step further, I wanted to concatenate the values for TOTAL and NEW to produce a value for each age group, gender, and item combination to read out the
TOTAL value (NEW pct). So, in the example above, it would be 23(52.2%).
So, I did what you suggested:
data want;
set monthly_results;
by item age_new gender;
want=total/lag(total);
if first.item then call missing(want);
run;
I get the value for new/total in the NEW row; however, I get a value in the TOTAL row instead of a MISSING value.
Additionally, if I then wanted to concatenate the TOTAL (want) together, how would I go about this?
Would I just be better suited to do this using proc sql or transpose the data?
If your data reliably has a single month value followed by a total value, then you can use a merge of HAVE with itself offset by one record:
data have;
length name $20;
infile datalines dsd missover delimiter=",";
input age_grp $ gender $ name $ item $ count;
datalines;
0-13,FEMALE,FEB_2019_NEW ,FOOD,12
0-13,FEMALE,FEB_2019_TOTAL ,FOOD,23
0-13,FEMALE,FEB_2019_NEW ,GAS,45
0-13,FEMALE,FEB_2019_TOTAL ,GAS,76
0-13,FEMALE,FEB_2019_NEW ,TELE,4
0-13,FEMALE,FEB_2019_TOTAL ,TELE,5
0-13,MALE,FEB_2019_NEW ,TELE,3
0-13,MALE,FEB_2019_TOTAL ,TELE,8
0-13,FEMALE,MAR_2019_NEW ,TELE,0
0-13,FEMALE,MAR_2019_TOTAL ,TELE,0
;
run;
data want (drop=_:);
merge have
have (firstobs=2 keep=count rename=(count=_total_count));
if _total_count^=0 and scan(name,-1,'_')='NEW' then newvar=count/_total_count;
run;
Now if once in a while there is no total following the new, then the self-merge with offset still works, but you have check that the name variables correspond:
data want (drop=_:);
merge have
have (firstobs=2 keep=name count rename=(name=_total_name count=_total_count));
if scan(name,-1,'_')='NEW' and _total_name=transtrn(name,'_NEW','_TOTAL') and _total_count^=0
then newvar=count/_total_count;
run;
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.