Hello everyone !
(sorry for my english)
The tittle of the topic is not so explicit because i don't know how to tell that..
I'm putting here an example and then I explain you what i'm expecting for :
Data have ;
format id1 id2 $5. periode 2. cpt2 8. sum_cum 8.;
input id1 id2 periode cpt2 sum_cum ;
cards;
10000 00001 0 20 20
10000 00001 1 13 33
10000 00001 2 17 50
10000 00001 3 50 100
10000 00001 4 10 110
10000 00001 5 15 125
10000 00001 6 81 81
20000 00002 0 5 5
20000 00002 1 3 8
20000 00002 2 10 18
20000 00002 3 1 19
20000 00002 4 11 30
20000 00002 5 30 60
;
run ;
This is the table :
- id1 is a head office
- id2 is an agency
- periode : 0 is the last month of the first trimester (01/03/2019 --> 31/03/2019)
periode : 1 is the month before (01/02/2019 --> 30/02/2019)
periode : 3 is the month before (01/01/2019 --> 31/01/2019)
...
- cpt2 is the number of survey respondents per periode
- sum_cum is the number cumulated of cpt2
I'm trying to select data depending on 2 conditions :
I want to know which period i have to take acording to 3 criteria :
- I need minimum 3 periods (so minimum periode 0, 1 and 2)
- I want minimum 20 respondents
- The maximum of periode is 36,
If i don't have 20 respondents on the three periods, then i want to look for the next period and see if I have 20 respondents, if it is less than 20 again, see the next period, ....
if i don't have 20 respondents in 36 periodes then delete, good bye, we don't care haha
So this is what i would like to have on the output :
Data want ;
format id1 id2 $5. periode 2. cpt2 8. sum_cum 8.;
input id1 id2 periode cpt2 sum_cum ;
cards;
10000 00001 0 20 20
10000 00001 1 13 33
10000 00001 2 17 50
20000 00002 0 5 5
20000 00002 1 3 8
20000 00002 2 10 18
20000 00002 3 1 19
20000 00002 4 11 30
;
run ;
I don't know how write it in SAS, I don't think that it is very difficul but if you can help me, i will be very grateful !!
Thank you !
Onizuka
First idea: one step to find the last "periode" required to get at least 20 respondents for each id1/id2 combination, then a second step to merge the result of the first step with "have" to create "want", but then i thought why two steps, when all can be done in one - still two passes required, but i don't think this can be avoided, at least not while having in mind that there could be less than 20 respondents.
data want;
keeped = 0;
lastPeriode = 36;
summed = 0;
do until(last.id2);
set work.have;
by id1 id2;
if keeped < 3 or summed < 20 then do;
keeped = keeped + 1;
lastPeriode = periode;
summed = sum_cum;
end;
end;
if summed >= 20 then do;
do until(last.id2);
set work.have;
by id1 id2;
if periode <= lastPeriode then output;
end;
end;
drop keeped lastPeriode summed;
run;
Hello Andreas,
Thank you for your response as fast, I have an additionnal constraint which I forgot to put on my topic because I just thought about it and which can complicate a little bit the problem.
I want the same thing but i want to keep these periods only if i have respondents in period 0 or period 1 or period 2 (during the first quarter : 01/01/2019 --> 31/03/2019).
Indeed, on my real datas, I have some ids which the first period is period 9 for example (9 month before so on jun 2018).
I don't know if my reply is clear .. =/
Thank you again for your time , really appreciate it ! 🙂
@Onizuka wrote:
Hello Andreas,
Thank you for your response as fast, I have an additionnal constraint which I forgot to put on my topic because I just thought about it and which can complicate a little bit the problem.
I want the same thing but i want to keep these periods only if i have respondents in period 0 or period 1 or period 2 (during the first quarter : 01/01/2019 --> 31/03/2019).
Indeed, on my real datas, I have some ids which the first period is period 9 for example (9 month before so on jun 2018).
I don't know if my reply is clear .. =/
Thank you again for your time , really appreciate it ! 🙂
This adds an interesting aspect to the problem 😉
The code i suggest has at least one major bug: i won't work as expected in some cases i made up. So remenbering KISS (=keep it simple stupid), i returned to the initially idea of two steps and a merge. I modified "have" to contain one case that should no be in the output, because there is no data for periode <= 3.
Data have;
format id1 id2 $5. periode 2. cpt2 8. sum_cum 8.;
input id1 id2 periode cpt2 sum_cum;
cards;
10000 00001 0 20 20
10000 00001 1 13 33
10000 00001 2 17 50
10000 00001 3 50 100
10000 00001 4 10 110
10000 00001 5 15 125
10000 00001 6 81 81
12000 00002 5 5 5
12000 00002 6 3 8
12000 00002 7 10 18
12000 00002 8 1 19
20000 00003 0 5 5
20000 00003 1 3 8
20000 00003 2 10 18
20000 00003 3 1 19
20000 00003 4 11 30
20000 00003 5 30 60
;
run;
data work.step1;
keeped = 0;
lastPeriode = 36;
summed = 0;
do until(last.id2);
set work.have;
by id1 id2;
if first.id2 and periode > 3 then do;
summed = .;
end;
if not missing(summed) and (keeped < 3 or summed < 20) then do;
keeped = keeped + 1;
lastPeriode = periode;
summed = sum_cum;
end;
end;
if summed >= 20;
keep id1 id2 lastPeriode;
run;
data work.want;
merge work.have work.step1(in= needed);
by id1 id2;
if needed and Periode <= lastPeriode;
drop lastPeriode;
run;
Hello @andreas_lds
Thank you for you answer 🙂
Your solution is working very well, I have to say that i'm not understanding everything on your code ..
More precisely, I do not understand this part :
if not missing(summed) and (keeped < 3 or summed < 20) then do;
keeped = keeped + 1;
lastPeriode = periode;
summed = sum_cum;
end;
I don't understand the process for keeped and summed, because to me, keeped is always ❤️ and summed <20 because it equals 0 ? (on the beggining of the code)
Another question, why if i'm putting the
do until(last.cniv1_cniv4);after the set, it don't work ?
Thank you very much and sorry for bad skills on SAS ... I though I was better in SAS that I really am haha
Onizuka
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.