Hi,
I have a dataset that contains a variable for year (2010-2013), a variable for unitID (LT0100 to LT2400) and some other variables (a_1-a_8;b_1-b_8;etc...).
In a data step below, I want to create a new variable (f_5) based on (one variable divided by the sum of that variable), where the summary procedure is done given they all belong to the same year and same unit Id.
I think that I need to create some do-loops or macro variables, but I can´t figure out the specific coding. This is what I have tried:
proc sql;
select distinct unitID into :Unit separated by ', ' from Dataset1;
select distinct year into :year separated by ', ' from Dataset1;
quit;
data Dataset1;
set Dataset1;
if &unitID and &year then
a2_tot = sum(a_2);
f_5 = a_2/a2_tot;
drop a2_tot;
run;
Any suggestions on how to create the desired new variable?
Best regards,
Hank
Like Tom said, sql will be a lot easier in this case and, in fact, can do everything in just a couple of lines. Here is the sql code and yet another way to do it in a datastep:
proc sql;
create table want as
select *,a_2/sum(a_2) as f_5
from have
group by unitid,year
order by year,unitid,code
;
quit;
/*or*/
proc sort data=have;
by unitid year code;
run;
data want (drop=sum_f_5);
do until (last.year);
set have (where=(code ne 0));
by unitid year;
sum_f_5=ifn(first.year,a_2,sum(sum_f_5,a_2));
end;
do until (last.year);
set have;
by unitid year;
f_5=a_2/sum_f_5;
output;
end;
run;
proc sort data=want;
by year unitid code;
run;
Example have and want datasets are usually an easier way of showing the forum what you really want to achieve. From your description it appears like you only want to accomplish something like the following:
proc sql;
create table want as
select unitid,year,x,x/sum(x) as f_5
from have
group by unitid,year
;
quit;
Thanks for your reply! Here is a clarification:
In the dataset below, I have all the variables except f_5. I want to create f_5 as "a_2/SUM(a_2, where UnitID and year is the same and Code range between 1-6)".
In other words, I want to have the ratio of a specific code value, over the sum of all code values (except 0), for the variable a_2 for each specific UnitID and Year.
Preferably in a data step, since the creation of this new variable (f_5) is just one out of many more variables that needs to be created in ways similar to this one and I would prefer if all could be created in the same data step.
UnitId | Year | Code | a_1 | a_2 | a_3 | f_5 |
LT0100 | 2010 | 0 | ||||
LT0100 | 2010 | 1 | 2959139 | 320212 | 1600244 | 0,137518 |
LT0100 | 2010 | 2 | 4565010 | 142715 | 996557 | 0,06129 |
LT0100 | 2010 | 3 | 3301774 | 401540 | 2041513 | 0,172445 |
LT0100 | 2010 | 4 | 7384528 | 1163174 | 5855854 | 0,499536 |
LT0100 | 2010 | 5 | 1357620 | 33371 | 1125468 | 0,014332 |
LT0100 | 2010 | 6 | 1454282 | 267495 | 1110066 | 0,114878 |
LT0300 | 2010 | 0 | ||||
LT0300 | 2010 | 1 | 649706 | 171904 | 326910 | 0,15413 |
LT0300 | 2010 | 2 | 1058760 | 97291 | 167824 | 0,087232 |
LT0300 | 2010 | 3 | 1156493 | 348738 | 733397 | 0,31268 |
LT0300 | 2010 | 4 | 2047786 | 415344 | 1444201 | 0,372399 |
LT0300 | 2010 | 5 | 257839 | 76602 | 174777 | 0,068682 |
LT0300 | 2010 | 6 | 248166 | 5440 | 241026 | 0,004878 |
LT0400 | 2010 | 0 | ||||
LT0400 | 2010 | 1 | 478329 | 37400 | 224111 | 0,091693 |
LT0400 | 2010 | 2 | 745501 | 16167 | 65280 | 0,039637 |
LT0400 | 2010 | 3 | 705415 | 133909 | 542181 | 0,328304 |
LT0400 | 2010 | 4 | 901986 | 182750 | 712861 | 0,448047 |
LT0400 | 2010 | 5 | 175015 | 37655 | 112965 | 0,092319 |
LT0400 | 2010 | 6 | 185470 | 185470 | ||
LT0100 | 2011 | 0 | ||||
LT0100 | 2011 | 1 | 2959139 | 352233,2 | 1600244 | 0,137518 |
LT0100 | 2011 | 2 | 4565010 | 156986,5 | 996557 | 0,06129 |
LT0100 | 2011 | 3 | 3301774 | 441694 | 2041513 | 0,172445 |
LT0100 | 2011 | 4 | 7384528 | 1279491 | 5855854 | 0,499536 |
LT0100 | 2011 | 5 | 1357620 | 36708,1 | 1125468 | 0,014332 |
LT0100 | 2011 | 6 | 1454282 | 294244,5 | 1110066 | 0,114878 |
LT0300 | 2011 | 0 | ||||
LT0300 | 2011 | 1 | 649706 | 189094,4 | 326910 | 0,15413 |
LT0300 | 2011 | 2 | 1058760 | 107020,1 | 167824 | 0,087232 |
LT0300 | 2011 | 3 | 1156493 | 383611,8 | 733397 | 0,31268 |
LT0300 | 2011 | 4 | 2047786 | 456878,4 | 1444201 | 0,372399 |
LT0300 | 2011 | 5 | 257839 | 84262,2 | 174777 | 0,068682 |
LT0300 | 2011 | 6 | 248166 | 5984 | 241026 | 0,004878 |
LT0400 | 2011 | 0 | ||||
LT0400 | 2011 | 1 | 478329 | 41140 | 224111 | 0,091693 |
LT0400 | 2011 | 2 | 745501 | 17783,7 | 65280 | 0,039637 |
LT0400 | 2011 | 3 | 705415 | 147299,9 | 542181 | 0,328304 |
LT0400 | 2011 | 4 | 901986 | 201025 | 712861 | 0,448047 |
LT0400 | 2011 | 5 | 175015 | 41420,5 | 112965 | 0,092319 |
LT0400 | 2011 | 6 | 185470 | 185470 |
Best regards,
Hank
Hi, Hank
Thanks for the improved explanation.
While it is doable to accomplish everything you need in a single data step, it's fairly challenging. I would recommend Arthur's approach to get the summarized data first, in a step like this:
proc sql;
create table have_summ as
select UnitId, Year, sum(a_2) as Sum_a_2
from have
where Code > 0
group by UnitId, Year
order by UnitId, Year;
quit;
and then sort your original dataset:
proc sort data=have;
by UnitId Year;
run;
and then in your processing data step, you have access to the sum of a_2 as Sum_a_2, and can easily do your division, along with all of your other processing.
data want;
merge have have_summ;
by UnitId Year;
...
division, and other processing
...
run;
Like Tom said, sql will be a lot easier in this case and, in fact, can do everything in just a couple of lines. Here is the sql code and yet another way to do it in a datastep:
proc sql;
create table want as
select *,a_2/sum(a_2) as f_5
from have
group by unitid,year
order by year,unitid,code
;
quit;
/*or*/
proc sort data=have;
by unitid year code;
run;
data want (drop=sum_f_5);
do until (last.year);
set have (where=(code ne 0));
by unitid year;
sum_f_5=ifn(first.year,a_2,sum(sum_f_5,a_2));
end;
do until (last.year);
set have;
by unitid year;
f_5=a_2/sum_f_5;
output;
end;
run;
proc sort data=want;
by year unitid code;
run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.