Hello,
I have huge dataset, trying to get the repeated ids in prior quarter in Proc SQL. Months(Jan-Aug) - 2 quarters
Given Data:
Id Month
1 jan
1 feb
1 mar
1 apr
1 may
1 jun
1 jul
1 aug
2 jan
2 feb
2 mar
2 apr
2 may
2 aug
3 mar
3 apr
3 jun
3 aug
4 jun
5 jan
5 may
5 jun
5 aug
6 jan
Need Output:
Quarter1 Total no Ids No of ID's repeated in prior quarter
Jan-apr 6 0
may-aug 5 4
Thanks for checking
Below will need some modification in case your Month variable is not character but a SAS date/datetime value or if your source table is in a database.
You write your source table is "huge". What does that mean? Can you quantify? If it's really "huge" and performance is an issue then you'd need to tell us a bit more like: Where is the source table stored, is the data already sorted, type of variables (character or numeric, length of character variables if any, number of rows - may be best just post the result of a Proc Contents).
proc format;
invalue period
'jan', 'feb', 'mar', 'apr' = 1
'may', 'jun' , 'jul', 'aug' = 2
other=.
;
value period
1 = 'jan - apr'
2 = 'may - aug'
other= 'other'
;
run;
data have;
input id month $;
group=input(month, period.);
datalines;
1 jan
1 feb
1 mar
1 apr
1 may
1 jun
1 jul
1 aug
2 jan
2 feb
2 mar
2 apr
2 may
2 aug
3 mar
3 apr
3 jun
3 aug
4 jun
5 jan
5 may
5 jun
5 aug
6 jan
7 feb
;
proc sql;
select
o.group format=period.,
count(distinct o.id) as cnt_id,
count(distinct i.id) as cnt_id_prev_period
from
have o
left join
(
select distinct id, group
from have
/* where group=1*/
) i
on o.id=i.id and o.group=i.group+1
group by o.group
;
quit;
Is it a requirement to use PROC SQL?
Below will need some modification in case your Month variable is not character but a SAS date/datetime value or if your source table is in a database.
You write your source table is "huge". What does that mean? Can you quantify? If it's really "huge" and performance is an issue then you'd need to tell us a bit more like: Where is the source table stored, is the data already sorted, type of variables (character or numeric, length of character variables if any, number of rows - may be best just post the result of a Proc Contents).
proc format;
invalue period
'jan', 'feb', 'mar', 'apr' = 1
'may', 'jun' , 'jul', 'aug' = 2
other=.
;
value period
1 = 'jan - apr'
2 = 'may - aug'
other= 'other'
;
run;
data have;
input id month $;
group=input(month, period.);
datalines;
1 jan
1 feb
1 mar
1 apr
1 may
1 jun
1 jul
1 aug
2 jan
2 feb
2 mar
2 apr
2 may
2 aug
3 mar
3 apr
3 jun
3 aug
4 jun
5 jan
5 may
5 jun
5 aug
6 jan
7 feb
;
proc sql;
select
o.group format=period.,
count(distinct o.id) as cnt_id,
count(distinct i.id) as cnt_id_prev_period
from
have o
left join
(
select distinct id, group
from have
/* where group=1*/
) i
on o.id=i.id and o.group=i.group+1
group by o.group
;
quit;
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Check out this tutorial series to learn how to build your own steps in SAS Studio.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.