I'm trying to write the equivalent SAS code for:
proc sql;
create table dataset2 as select var1, var2, var3, var4, var5, count(*)
from dataset1
group by var1, var2, var3, var4, var5;
proc sort data=dataset1; by var1 var2 var3 var4 var5; run;
data dataset2;
set dataset1;
by var1 var2 var3 var4 var5;
if first.var5 then count=0;
count+1;
if last.var5 then output;
run;
data dataset1;
input var1 var2 var3 var4 var5;
datalines;
1 3 65 63 77
1 2 65 63 66
;
proc sort data=dataset1;
by var1 var2 var3 var4 var5;
run;
data dataset2(keep= var1 var2 var3 var4 var5 count);
set dataset1;
by var1 var2 var3 var4 var5;
count+1;
run;
sassharp,
Your code will count across the entire file rather than across by group as the SQL will. You are basically creating a sequence number for dataset1.
Agreed Fried Egg.
As said friedEgg
data dataset1;
input var1 var2 var3 var4 var5;
datalines;
1 3 65 63 77
1 2 65 63 66
2 2 2 2 2
2 2 2 2 2
;
proc sort data=dataset1;
by var1 var2 var3 var4 var5;
run;
data dataset2(keep= var1 var2 var3 var4 var5 count);
set dataset1;
by var1 var2 var3 var4 var5;
if first.var5 then count=0;
count+1;
if last.var5 then output;
run;
DOW seems pretty handy for this case:
data dataset1;
input var1 var2 var3 var4 var5;
datalines;
1 3 65 63 77
1 2 65 63 66
2 2 2 2 2
2 2 2 2 2
;
proc sort data=dataset1;
by var1 var2 var3 var4 var5;
run;
data want;
do sum=1 by 1 until (last.var5);
set dataset1;
by var1-var5;
end;
run;
proc print;run;
Regards,
Haikuo
data dataset1;
input var1 var2 var3 var4 var5;
datalines;
1 3 65 63 77
1 2 65 63 66
2 2 2 2 2
2 2 2 2 2
;
proc sort data=dataset1;
by var1 var2 var3 var4 var5;
run;
data dataset2(keep= var1 var2 var3 var4 var5 count);
set dataset1;
by var1 var2 var3 var4 var5;
if first.var5 then count=0;
count+1;
if last.var5 then output;
run;
/*or*/
proc sql;
create table dataset2 as
select var1,var2,var3,var4,var5,count(*)
from dataset1
group by var1,var2,var3,var4,var5;
run;
Another approach:
proc freq data=dataset1;
tables var1 * var2 * var3 * var4 * var5 / noprint out=dataset2 (drop=percent);
run;
If there are a lot of combinations of the 5 variables, it is conceivable this would run out of memory. But the tradeoff is that it works on unsorted data.
Another alternative would be proc means, which is a multi-threading procedure which will make the operation faster for larger data:
proc means data=dataset1 n nway noprint;
class var1-var5;
output out=dataset2(drop=_type_ rename=(_freq_=count));
run;
Another approach, using Hash this time, I know it's silly, just try to show what Hash can do:
data have;
input v1-v6;
datalines;
2 2 2 2 2 4
2 2 2 2 2 77
1 3 65 63 77 2
1 2 65 63 66 3
2 2 2 2 2 76
;
data want (drop=_rc);
length sum 8;
if _n_ = 1 then do;
declare hash h(suminc: "sum", ordered: "a");
declare hiter hi("h");
h.defineKey('v1', 'v2','v3', 'v4', 'v5');
h.definedata('v1', 'v2','v3', 'v4', 'v5');
h.defineDone();
sum = 1;
end;
do while (not done);
set have end=done;
_rc = h.ref();
end;
_rc = hi.first();
do while(_rc = 0);
_rc = h.sum(sum: sum);
output;
_rc = hi.next();
end;
stop;
run;
proc print;run;
Regards,
Haikuo
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.