Solved: Re: Transformation from counts to percentages

buechler66 · Posted 10-19-2016 01:17 PM

Hi. I have a dataset with a single record. It contains a count for each column. I need to output another dataset with one record, like this one, but that has Percentage for each day (and NULL) column instead of a count. Can anyone advise me on creating this?

TOTAL	    _08_01_2016	    _07_30_2016	     _07_29_2016     NULL
1917462	    61	            4813	     18477	     118420

Reeza · Posted 10-19-2016 04:19 PM

null is a bad variable name, as it has other context within databases and programming.

You can use the _: wildcard, but you will need to know the amount of variables to do the percentages.

Storing the dates in your variable names isn't a good idea and violates best practices.

Here's a good article on managing data, from an R perspective but it holds.

https://www.jstatsoft.org/article/view/v059i10

I would *seriously* consider tranpsosing your data to a long format, doing your calculations and then retransposing if required.

Otherwise, determine the number of days needed for percent by querying the sashelp.vcolumn table and storing that number in a macro variable. That can be used to index your arrays.

proc sql;
select count(*) into :count_vars
from sashelp.vcolumn where upper(libname)='WORK' and upper(memname) = 'HAVE' and name like '_%';
quit;

array cts(*) _: null;
array pcts(*) pct1-pct&count_vars pct_null;

The rest of the code is the same.

View solution in original post

Reeza · Posted 10-19-2016 01:37 PM

Wide data is always a bit annoying to deal with

1. Declare two arrays, one for counts and one for percentages

array cts(*) _date_vars list;

array pcts(*) day1-day3;

2. Loop over array to calculate percentages

do I= 1 to dim(cts);

pcts(I)=cts(I) / total ;

end;

buechler66 · Posted 10-19-2016 04:03 PM

Thanks for the suggestion. I should've mentioned the number of days (day1-day3) is not static. It may vary from run to run. Can I handle that in some way? Also, what about handling the NULL variable?

Reeza · Posted 10-19-2016 04:04 PM

Do you have a naming convention for the variables?

buechler66 · Posted 10-19-2016 04:07 PM

All the variables except NULL are named like a date with an underscore in front. Like _07_29_2016.

buechler66 · Posted 10-19-2016 04:12 PM

I create the above dataset using this code:

/* RULE: MATCHING + NO SCORE IMPACTING. Calculate the totals for this rule and then append this record to FinalData&QTR */
proc means data=QueryData&ZIP5 sum;
var total _: null;
output out=cal_sums&ZIP5 (drop=_type_ _freq_) sum=;
run;

As you can see it creates the _date looking variables (_08_01_2016) using the _: wildcard on the VAR statement.

var total _: null;

Is there any way to create a dataset of Percentages in a similar way?

Reeza · Posted 10-19-2016 04:19 PM

null is a bad variable name, as it has other context within databases and programming.

You can use the _: wildcard, but you will need to know the amount of variables to do the percentages.

Storing the dates in your variable names isn't a good idea and violates best practices.

Here's a good article on managing data, from an R perspective but it holds.

https://www.jstatsoft.org/article/view/v059i10

I would *seriously* consider tranpsosing your data to a long format, doing your calculations and then retransposing if required.

Otherwise, determine the number of days needed for percent by querying the sashelp.vcolumn table and storing that number in a macro variable. That can be used to index your arrays.

proc sql;
select count(*) into :count_vars
from sashelp.vcolumn where upper(libname)='WORK' and upper(memname) = 'HAVE' and name like '_%';
quit;

array cts(*) _: null;
array pcts(*) pct1-pct&count_vars pct_null;

The rest of the code is the same.

buechler66 · Posted 10-19-2016 04:36 PM

I will give this a shot. Thank you.

buechler66 · Posted 11-04-2016 05:16 PM

Thanks for your time and assistance!

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away