BookmarkSubscribeRSS Feed
Onizuka
Pyrite | Level 9

Hello everyone !

 

(sorry for my english)

The tittle of the topic is not so explicit because i don't know how to tell that..

 

I'm putting here an example  and then I explain you what i'm expecting for :

 

 

Data have ;
format id1 id2 $5. periode 2. cpt2 8. sum_cum 8.;
input id1 id2 periode cpt2 sum_cum ;
cards;
10000 00001 0 20 20
10000 00001 1 13 33
10000 00001 2 17 50
10000 00001 3 50 100
10000 00001 4 10 110
10000 00001 5 15 125
10000 00001 6 81 81
20000 00002 0 5 5
20000 00002 1 3 8
20000 00002 2 10 18
20000 00002 3 1 19
20000 00002 4 11 30
20000 00002 5 30 60
;
run ;

 

 

This is the table :

 

Capture.PNG

 

- id1 is a head office

 

- id2 is an agency

 

- periode : 0 is the last month of the first trimester (01/03/2019 --> 31/03/2019)

  periode : 1 is the month before (01/02/2019 --> 30/02/2019)

  periode : 3 is the month before (01/01/2019 --> 31/01/2019)

...

 

- cpt2 is the number of survey respondents per periode

 

- sum_cum is the number cumulated of cpt2

 

I'm trying to select data depending on 2 conditions :

 

I want to know which period i have to take acording to 3 criteria :

 

- I need minimum 3 periods (so minimum periode 0, 1 and 2)

- I want minimum 20 respondents

- The maximum of periode is 36,

 

If i don't have 20 respondents on the three periods, then i want to look for the next period and see if I have 20 respondents, if it is less than 20 again, see the next period, ....

if i don't have 20 respondents in 36 periodes then delete, good bye, we don't care haha

 

So this is what i would like to have on the output :

 

Data want ;
format id1 id2 $5. periode 2. cpt2 8. sum_cum 8.;
input id1 id2 periode cpt2 sum_cum ;
cards;
10000 00001 0 20 20 
10000 00001 1 13 33
10000 00001 2 17 50 
20000 00002 0 5 5
20000 00002 1 3 8
20000 00002 2 10 18
20000 00002 3 1 19
20000 00002 4 11 30
;
run ;

 

I don't know how write it in SAS, I don't think that it is very difficul but if you can help me, i will be very grateful !!

 

Thank you !

 

Onizuka

5 REPLIES 5
andreas_lds
Jade | Level 19

First idea: one step to find the last "periode" required to get at least 20 respondents for each id1/id2 combination, then a second step to merge the result of the first step with "have" to create "want", but then i thought why two steps, when all can be done in one - still two passes required, but i don't think this can be avoided, at least not while having in mind that there could be less than 20 respondents.

 


data want;
   keeped = 0;
   lastPeriode = 36;
   summed = 0;

   do until(last.id2);
      set work.have;
      by id1 id2;
      
      if keeped < 3 or summed < 20 then do;
         keeped = keeped + 1;
         lastPeriode = periode;
         summed = sum_cum;
      end;
   end;

   if summed >= 20 then do;
      do until(last.id2);
         set work.have;
         by id1 id2;

         if periode <= lastPeriode then output; 
      end;
   end;

   drop keeped lastPeriode summed;
run;
Onizuka
Pyrite | Level 9

Hello Andreas,

 

Thank you for your response as fast, I have an additionnal constraint which I forgot to put on my topic because I just thought about it and which can complicate a little bit the problem.

 

I want the same thing but i want to keep these periods only if i have respondents in period 0 or period 1 or period 2 (during the first quarter : 01/01/2019 --> 31/03/2019).

 

Indeed,  on my real datas, I have some ids which the first period is period 9 for example (9 month before so on jun 2018).

 

I don't know if my reply is clear .. =/

 

Thank you again for your time , really appreciate it ! 🙂

 

 

andreas_lds
Jade | Level 19

@Onizuka wrote:

Hello Andreas,

 

Thank you for your response as fast, I have an additionnal constraint which I forgot to put on my topic because I just thought about it and which can complicate a little bit the problem.

 

I want the same thing but i want to keep these periods only if i have respondents in period 0 or period 1 or period 2 (during the first quarter : 01/01/2019 --> 31/03/2019).

 

Indeed,  on my real datas, I have some ids which the first period is period 9 for example (9 month before so on jun 2018).

 

I don't know if my reply is clear .. =/

 

Thank you again for your time , really appreciate it ! 🙂

 

 


This adds an interesting aspect to the problem 😉

The code i suggest has at least one major bug: i won't work as expected in some cases i made up. So remenbering KISS (=keep it simple stupid), i returned to the initially idea of two steps and a merge. I modified "have" to contain one case that should no be in the output, because there is no data for periode <= 3.

 

Data have;
   format id1 id2 $5. periode 2. cpt2 8. sum_cum 8.;
   input id1 id2 periode cpt2 sum_cum;
   cards;
10000 00001 0 20 20
10000 00001 1 13 33
10000 00001 2 17 50
10000 00001 3 50 100
10000 00001 4 10 110
10000 00001 5 15 125
10000 00001 6 81 81
12000 00002 5 5 5
12000 00002 6 3 8
12000 00002 7 10 18
12000 00002 8 1 19
20000 00003 0 5 5
20000 00003 1 3 8
20000 00003 2 10 18
20000 00003 3 1 19
20000 00003 4 11 30
20000 00003 5 30 60
;
run;

data work.step1;
   keeped = 0;
   lastPeriode = 36;
   summed = 0;

   do until(last.id2);
      set work.have;
      by id1 id2;
      
      if first.id2 and periode > 3 then do;
         summed = .;
      end;      

      if not missing(summed) and (keeped < 3 or summed < 20) then do;
         keeped = keeped + 1;
         lastPeriode = periode;
         summed = sum_cum;
      end;

   end;

   if summed >= 20;

   keep id1 id2 lastPeriode;
run;

data work.want;
   merge work.have work.step1(in= needed);
   by id1 id2;

   if needed and Periode <= lastPeriode;

   drop lastPeriode;
run;
Onizuka
Pyrite | Level 9

Hello @andreas_lds 

 

Thank you for you answer 🙂

 

Your solution is working very well, I have to say that i'm not understanding everything on your code ..

 

More precisely, I do not understand this part :

 

      if not missing(summed) and (keeped < 3 or summed < 20) then do;
         keeped = keeped + 1;
         lastPeriode = periode;
         summed = sum_cum;
      end;

 

I don't understand the process for keeped and summed, because to me, keeped is always ❤️ and summed <20 because it equals 0 ? (on the beggining of the code)

 

Another question, why if i'm putting the

do until(last.cniv1_cniv4);

after the set, it don't work ?

 

Thank you very much and sorry for bad skills on SAS ... I though I was better in SAS that I really am haha

 

Onizuka

tomrvincent
Rhodochrosite | Level 12
Here's how I'd get the 3 min periodes:
proc sql; select min(periode) into :min1 from want;
select min(periode) into :min2 from want where periode > &min1;
select min(periode) into :min3 from want where periode > &min2;
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1561 views
  • 0 likes
  • 3 in conversation