Hello,
I have huge dataset, trying to get the repeated ids in prior quarter in Proc SQL. Months(Jan-Aug) - 2 quarters
Given Data:
Id Month
1 jan
1 feb
1 mar
1 apr
1 may
1 jun
1 jul
1 aug
2 jan
2 feb
2 mar
2 apr
2 may
2 aug
3 mar
3 apr
3 jun
3 aug
4 jun
5 jan
5 may
5 jun
5 aug
6 jan
Need Output:
Quarter1 Total no Ids No of ID's repeated in prior quarter
Jan-apr 6 0
may-aug 5 4
Thanks for checking
Below will need some modification in case your Month variable is not character but a SAS date/datetime value or if your source table is in a database.
You write your source table is "huge". What does that mean? Can you quantify? If it's really "huge" and performance is an issue then you'd need to tell us a bit more like: Where is the source table stored, is the data already sorted, type of variables (character or numeric, length of character variables if any, number of rows - may be best just post the result of a Proc Contents).
proc format;
invalue period
'jan', 'feb', 'mar', 'apr' = 1
'may', 'jun' , 'jul', 'aug' = 2
other=.
;
value period
1 = 'jan - apr'
2 = 'may - aug'
other= 'other'
;
run;
data have;
input id month $;
group=input(month, period.);
datalines;
1 jan
1 feb
1 mar
1 apr
1 may
1 jun
1 jul
1 aug
2 jan
2 feb
2 mar
2 apr
2 may
2 aug
3 mar
3 apr
3 jun
3 aug
4 jun
5 jan
5 may
5 jun
5 aug
6 jan
7 feb
;
proc sql;
select
o.group format=period.,
count(distinct o.id) as cnt_id,
count(distinct i.id) as cnt_id_prev_period
from
have o
left join
(
select distinct id, group
from have
/* where group=1*/
) i
on o.id=i.id and o.group=i.group+1
group by o.group
;
quit;
Is it a requirement to use PROC SQL?
Below will need some modification in case your Month variable is not character but a SAS date/datetime value or if your source table is in a database.
You write your source table is "huge". What does that mean? Can you quantify? If it's really "huge" and performance is an issue then you'd need to tell us a bit more like: Where is the source table stored, is the data already sorted, type of variables (character or numeric, length of character variables if any, number of rows - may be best just post the result of a Proc Contents).
proc format;
invalue period
'jan', 'feb', 'mar', 'apr' = 1
'may', 'jun' , 'jul', 'aug' = 2
other=.
;
value period
1 = 'jan - apr'
2 = 'may - aug'
other= 'other'
;
run;
data have;
input id month $;
group=input(month, period.);
datalines;
1 jan
1 feb
1 mar
1 apr
1 may
1 jun
1 jul
1 aug
2 jan
2 feb
2 mar
2 apr
2 may
2 aug
3 mar
3 apr
3 jun
3 aug
4 jun
5 jan
5 may
5 jun
5 aug
6 jan
7 feb
;
proc sql;
select
o.group format=period.,
count(distinct o.id) as cnt_id,
count(distinct i.id) as cnt_id_prev_period
from
have o
left join
(
select distinct id, group
from have
/* where group=1*/
) i
on o.id=i.id and o.group=i.group+1
group by o.group
;
quit;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Check out this tutorial series to learn how to build your own steps in SAS Studio.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.