I'm trying to write the equivalent SAS code for:
proc sql;
create table dataset2 as select var1, var2, var3, var4, var5, count(*)
from dataset1
group by var1, var2, var3, var4, var5;
proc sort data=dataset1; by var1 var2 var3 var4 var5; run;
data dataset2;
set dataset1;
by var1 var2 var3 var4 var5;
if first.var5 then count=0;
count+1;
if last.var5 then output;
run;
data dataset1;
input var1 var2 var3 var4 var5;
datalines;
1 3 65 63 77
1 2 65 63 66
;
proc sort data=dataset1;
by var1 var2 var3 var4 var5;
run;
data dataset2(keep= var1 var2 var3 var4 var5 count);
set dataset1;
by var1 var2 var3 var4 var5;
count+1;
run;
sassharp,
Your code will count across the entire file rather than across by group as the SQL will. You are basically creating a sequence number for dataset1.
Agreed Fried Egg.
As said friedEgg
data dataset1;
input var1 var2 var3 var4 var5;
datalines;
1 3 65 63 77
1 2 65 63 66
2 2 2 2 2
2 2 2 2 2
;
proc sort data=dataset1;
by var1 var2 var3 var4 var5;
run;
data dataset2(keep= var1 var2 var3 var4 var5 count);
set dataset1;
by var1 var2 var3 var4 var5;
if first.var5 then count=0;
count+1;
if last.var5 then output;
run;
DOW seems pretty handy for this case:
data dataset1;
input var1 var2 var3 var4 var5;
datalines;
1 3 65 63 77
1 2 65 63 66
2 2 2 2 2
2 2 2 2 2
;
proc sort data=dataset1;
by var1 var2 var3 var4 var5;
run;
data want;
do sum=1 by 1 until (last.var5);
set dataset1;
by var1-var5;
end;
run;
proc print;run;
Regards,
Haikuo
data dataset1;
input var1 var2 var3 var4 var5;
datalines;
1 3 65 63 77
1 2 65 63 66
2 2 2 2 2
2 2 2 2 2
;
proc sort data=dataset1;
by var1 var2 var3 var4 var5;
run;
data dataset2(keep= var1 var2 var3 var4 var5 count);
set dataset1;
by var1 var2 var3 var4 var5;
if first.var5 then count=0;
count+1;
if last.var5 then output;
run;
/*or*/
proc sql;
create table dataset2 as
select var1,var2,var3,var4,var5,count(*)
from dataset1
group by var1,var2,var3,var4,var5;
run;
Another approach:
proc freq data=dataset1;
tables var1 * var2 * var3 * var4 * var5 / noprint out=dataset2 (drop=percent);
run;
If there are a lot of combinations of the 5 variables, it is conceivable this would run out of memory. But the tradeoff is that it works on unsorted data.
Another alternative would be proc means, which is a multi-threading procedure which will make the operation faster for larger data:
proc means data=dataset1 n nway noprint;
class var1-var5;
output out=dataset2(drop=_type_ rename=(_freq_=count));
run;
Another approach, using Hash this time, I know it's silly, just try to show what Hash can do:
data have;
input v1-v6;
datalines;
2 2 2 2 2 4
2 2 2 2 2 77
1 3 65 63 77 2
1 2 65 63 66 3
2 2 2 2 2 76
;
data want (drop=_rc);
length sum 8;
if _n_ = 1 then do;
declare hash h(suminc: "sum", ordered: "a");
declare hiter hi("h");
h.defineKey('v1', 'v2','v3', 'v4', 'v5');
h.definedata('v1', 'v2','v3', 'v4', 'v5');
h.defineDone();
sum = 1;
end;
do while (not done);
set have end=done;
_rc = h.ref();
end;
_rc = hi.first();
do while(_rc = 0);
_rc = h.sum(sum: sum);
output;
_rc = hi.next();
end;
stop;
run;
proc print;run;
Regards,
Haikuo
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.