BookmarkSubscribeRSS Feed
Onizuka
Pyrite | Level 9

Hello everyone !

 

(sorry for my english)

The tittle of the topic is not so explicit because i don't know how to tell that..

 

I'm putting here an example  and then I explain you what i'm expecting for :

 

 

Data have ;
format id1 id2 $5. periode 2. cpt2 8. sum_cum 8.;
input id1 id2 periode cpt2 sum_cum ;
cards;
10000 00001 0 20 20
10000 00001 1 13 33
10000 00001 2 17 50
10000 00001 3 50 100
10000 00001 4 10 110
10000 00001 5 15 125
10000 00001 6 81 81
20000 00002 0 5 5
20000 00002 1 3 8
20000 00002 2 10 18
20000 00002 3 1 19
20000 00002 4 11 30
20000 00002 5 30 60
;
run ;

 

 

This is the table :

 

Capture.PNG

 

- id1 is a head office

 

- id2 is an agency

 

- periode : 0 is the last month of the first trimester (01/03/2019 --> 31/03/2019)

  periode : 1 is the month before (01/02/2019 --> 30/02/2019)

  periode : 3 is the month before (01/01/2019 --> 31/01/2019)

...

 

- cpt2 is the number of survey respondents per periode

 

- sum_cum is the number cumulated of cpt2

 

I'm trying to select data depending on 2 conditions :

 

I want to know which period i have to take acording to 3 criteria :

 

- I need minimum 3 periods (so minimum periode 0, 1 and 2)

- I want minimum 20 respondents

- The maximum of periode is 36,

 

If i don't have 20 respondents on the three periods, then i want to look for the next period and see if I have 20 respondents, if it is less than 20 again, see the next period, ....

if i don't have 20 respondents in 36 periodes then delete, good bye, we don't care haha

 

So this is what i would like to have on the output :

 

Data want ;
format id1 id2 $5. periode 2. cpt2 8. sum_cum 8.;
input id1 id2 periode cpt2 sum_cum ;
cards;
10000 00001 0 20 20 
10000 00001 1 13 33
10000 00001 2 17 50 
20000 00002 0 5 5
20000 00002 1 3 8
20000 00002 2 10 18
20000 00002 3 1 19
20000 00002 4 11 30
;
run ;

 

I don't know how write it in SAS, I don't think that it is very difficul but if you can help me, i will be very grateful !!

 

Thank you !

 

Onizuka

5 REPLIES 5
andreas_lds
Jade | Level 19

First idea: one step to find the last "periode" required to get at least 20 respondents for each id1/id2 combination, then a second step to merge the result of the first step with "have" to create "want", but then i thought why two steps, when all can be done in one - still two passes required, but i don't think this can be avoided, at least not while having in mind that there could be less than 20 respondents.

 


data want;
   keeped = 0;
   lastPeriode = 36;
   summed = 0;

   do until(last.id2);
      set work.have;
      by id1 id2;
      
      if keeped < 3 or summed < 20 then do;
         keeped = keeped + 1;
         lastPeriode = periode;
         summed = sum_cum;
      end;
   end;

   if summed >= 20 then do;
      do until(last.id2);
         set work.have;
         by id1 id2;

         if periode <= lastPeriode then output; 
      end;
   end;

   drop keeped lastPeriode summed;
run;
Onizuka
Pyrite | Level 9

Hello Andreas,

 

Thank you for your response as fast, I have an additionnal constraint which I forgot to put on my topic because I just thought about it and which can complicate a little bit the problem.

 

I want the same thing but i want to keep these periods only if i have respondents in period 0 or period 1 or period 2 (during the first quarter : 01/01/2019 --> 31/03/2019).

 

Indeed,  on my real datas, I have some ids which the first period is period 9 for example (9 month before so on jun 2018).

 

I don't know if my reply is clear .. =/

 

Thank you again for your time , really appreciate it ! 🙂

 

 

andreas_lds
Jade | Level 19

@Onizuka wrote:

Hello Andreas,

 

Thank you for your response as fast, I have an additionnal constraint which I forgot to put on my topic because I just thought about it and which can complicate a little bit the problem.

 

I want the same thing but i want to keep these periods only if i have respondents in period 0 or period 1 or period 2 (during the first quarter : 01/01/2019 --> 31/03/2019).

 

Indeed,  on my real datas, I have some ids which the first period is period 9 for example (9 month before so on jun 2018).

 

I don't know if my reply is clear .. =/

 

Thank you again for your time , really appreciate it ! 🙂

 

 


This adds an interesting aspect to the problem 😉

The code i suggest has at least one major bug: i won't work as expected in some cases i made up. So remenbering KISS (=keep it simple stupid), i returned to the initially idea of two steps and a merge. I modified "have" to contain one case that should no be in the output, because there is no data for periode <= 3.

 

Data have;
   format id1 id2 $5. periode 2. cpt2 8. sum_cum 8.;
   input id1 id2 periode cpt2 sum_cum;
   cards;
10000 00001 0 20 20
10000 00001 1 13 33
10000 00001 2 17 50
10000 00001 3 50 100
10000 00001 4 10 110
10000 00001 5 15 125
10000 00001 6 81 81
12000 00002 5 5 5
12000 00002 6 3 8
12000 00002 7 10 18
12000 00002 8 1 19
20000 00003 0 5 5
20000 00003 1 3 8
20000 00003 2 10 18
20000 00003 3 1 19
20000 00003 4 11 30
20000 00003 5 30 60
;
run;

data work.step1;
   keeped = 0;
   lastPeriode = 36;
   summed = 0;

   do until(last.id2);
      set work.have;
      by id1 id2;
      
      if first.id2 and periode > 3 then do;
         summed = .;
      end;      

      if not missing(summed) and (keeped < 3 or summed < 20) then do;
         keeped = keeped + 1;
         lastPeriode = periode;
         summed = sum_cum;
      end;

   end;

   if summed >= 20;

   keep id1 id2 lastPeriode;
run;

data work.want;
   merge work.have work.step1(in= needed);
   by id1 id2;

   if needed and Periode <= lastPeriode;

   drop lastPeriode;
run;
Onizuka
Pyrite | Level 9

Hello @andreas_lds 

 

Thank you for you answer 🙂

 

Your solution is working very well, I have to say that i'm not understanding everything on your code ..

 

More precisely, I do not understand this part :

 

      if not missing(summed) and (keeped < 3 or summed < 20) then do;
         keeped = keeped + 1;
         lastPeriode = periode;
         summed = sum_cum;
      end;

 

I don't understand the process for keeped and summed, because to me, keeped is always ❤️ and summed <20 because it equals 0 ? (on the beggining of the code)

 

Another question, why if i'm putting the

do until(last.cniv1_cniv4);

after the set, it don't work ?

 

Thank you very much and sorry for bad skills on SAS ... I though I was better in SAS that I really am haha

 

Onizuka

tomrvincent
Rhodochrosite | Level 12
Here's how I'd get the 3 min periodes:
proc sql; select min(periode) into :min1 from want;
select min(periode) into :min2 from want where periode > &min1;
select min(periode) into :min3 from want where periode > &min2;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 559 views
  • 0 likes
  • 3 in conversation