BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
YellowBanana
Calcite | Level 5

Hello,

I have a dataset that looks like this:

APP_IDQUARTER
APP12019Q1
APP22019Q1
APP32019Q1
APP42019Q1
APP52019Q1
APP62019Q1
APP72019Q1
APP82019Q1
APP92019Q1
APP102019Q1
APP112019Q2
APP122019Q2
APP132019Q2
APP142019Q2
APP152019Q2
APP162019Q2
APP172019Q2
APP182019Q2
APP192019Q2
APP202019Q2
APP212019Q2
APP222019Q2
APP232019Q2
APP242019Q2
APP252019Q2
APP262019Q2
APP272019Q2
APP282019Q3
APP292019Q3
APP302019Q3
APP312019Q4
APP322019Q4
APP33 
APP34 
APP35 
APP36 
APP37 
APP38 
APP39 
APP40 
APP41 
APP42 
APP43 
APP44 
APP45 
APP46 
APP47 
APP48 

 

And the distribution of frequencies:

QUARTERCOUNTFREQ
2019Q11020,8%
2019Q21735,4%
2019Q336,3%
2019Q424,2%
 1633,3%

 

I need to fill missing values with other quarters in such way that distribution of frequencies would be almost uniform (all values occur with almost the same frequency as much as possible).

The number of application can change. As time goes in year, as the number can increase. Now there may appear applications with quarters 2019Q2 or 2019Q3 or missing values, but for example in december they can be application with 2019Q3 or 2019Q4 or missing values.

Missing values can only be filled with quarters from the next year.

The non-missing values cannot be changed.

 

At this moment it could look like this:

APP_IDQUARTER
APP12019Q1
APP22019Q1
APP32019Q1
APP42019Q1
APP52019Q1
APP62019Q1
APP72019Q1
APP82019Q1
APP92019Q1
APP102019Q1
APP112019Q2
APP122019Q2
APP132019Q2
APP142019Q2
APP152019Q2
APP162019Q2
APP172019Q2
APP182019Q2
APP192019Q2
APP202019Q2
APP212019Q2
APP222019Q2
APP232019Q2
APP242019Q2
APP252019Q2
APP262019Q2
APP272019Q2
APP282019Q3
APP292019Q3
APP302019Q3
APP312019Q4
APP322019Q4
APP332019Q3
APP342019Q3
APP352019Q3
APP362019Q3
APP372019Q3
APP382019Q3
APP392019Q3
APP402019Q3
APP412019Q4
APP422019Q4
APP432019Q4
APP442019Q4
APP452019Q4
APP462019Q4
APP472019Q4
APP482019Q4

 

And then distribution of frequencies may be:

QUARTERCOUNTFREQ
2019Q11020,8%
2019Q21735,4%
2019Q31122,9%
2019Q41020,8%
 00,0%

 

I will be thankful for any suggestions.

P.S. Sorry for my english.

1 ACCEPTED SOLUTION

Accepted Solutions
JohnHoughton
Quartz | Level 8

Hi. There was a flaw in the code of my previous post, but here is a fix. Clumsy in places but it works.

data have;
	input app_id  $8. quarter $6.;
	datalines;
APP1    2019Q1
APP2    2019Q1
APP3    2019Q1
APP4    2019Q1
APP5    2019Q1
APP6    2019Q1
APP7    2019Q1
APP8    2019Q1
APP9    2019Q1
APP10   2019Q1
APP11   2019Q2
APP12   2019Q2
APP13   2019Q2
APP14   2019Q2
APP15   2019Q2
APP16   2019Q2
APP17   2019Q2
APP18   2019Q2
APP19   2019Q2
APP20   2019Q2
APP21   2019Q2
APP22   2019Q2
APP23   2019Q2
APP24   2019Q2
APP25   2019Q2
APP26   2019Q2
APP27   2019Q2
APP28   2019Q3
APP29   2019Q3
APP30   2019Q3
APP31   2019Q4
APP32   2019Q4
APP33   
APP34   
APP35    
APP36    
APP37    
APP38    
APP39    
APP40    
APP41    
APP42    
APP43    
APP44    
APP45    
APP46    
APP47    
APP48    
;
run;

proc sql noprint;
	select count(distinct quarter), count(*) /count(distinct quarter) into :Nqtrs,:target from have;
quit;

proc sql;
	create table temp_a as 
	select quarter, count(*) as N
	from have 
	group by quarter
	having count(*)<=&target or quarter='';
	
	create table temp_b as 
	select quarter, 
		(select sum(N) from temp_a)/(select count (distinct quarter) from temp_a where quarter^='')-N as need , 
		(select N from temp_A where quarter='') as available 
		from temp_a 
		where quarter ^='';
quit;


data temp_c;set temp_b;
retain cumneed;
if _N_=1 then do; 
	cumneed=need; 
	need2=floor(need);
	end;
else do;
	cumneed=need+cumneed;
end;
cumneed2=floor(cumneed);
 need2=cumneed2-lag(cumneed2);
 if need2=. then need2=floor(need);
run;



data temp_d (keep=quarter);
	set temp_c end=eof;
	retain left ;
	if _N_=1 then left=available;
	do i=1 to need2;
		left=left-1;
		output;
	end;
run;

data temp_e;
	set have (where=(quarter=''));
	set temp_d;
run;

data want;
	set have (where=(quarter^='')) temp_e;
run;

*As a check , compare count from the have and want datasets;
proc sql;
	select a.*, b.want from (select h.quarter, count(*) as have from have h group 
		by h.quarter) a left join 
(select w.quarter, count(*) as want from want w group by w.quarter) b on 
		a.quarter=b.quarter;
quit; 

 

 

View solution in original post

4 REPLIES 4
Ksharp
Super User

You should post it at Operation Research Forum.and calling @RobPratt

JohnHoughton
Quartz | Level 8

Hi. I've tested this with a few scenarios and it seems to give what you want.

 

I've edited to remove this code. Updated version is in another post

 

 

(Edit - this had errors so I removed the code)

JohnHoughton
Quartz | Level 8

Hi. There was a flaw in the code of my previous post, but here is a fix. Clumsy in places but it works.

data have;
	input app_id  $8. quarter $6.;
	datalines;
APP1    2019Q1
APP2    2019Q1
APP3    2019Q1
APP4    2019Q1
APP5    2019Q1
APP6    2019Q1
APP7    2019Q1
APP8    2019Q1
APP9    2019Q1
APP10   2019Q1
APP11   2019Q2
APP12   2019Q2
APP13   2019Q2
APP14   2019Q2
APP15   2019Q2
APP16   2019Q2
APP17   2019Q2
APP18   2019Q2
APP19   2019Q2
APP20   2019Q2
APP21   2019Q2
APP22   2019Q2
APP23   2019Q2
APP24   2019Q2
APP25   2019Q2
APP26   2019Q2
APP27   2019Q2
APP28   2019Q3
APP29   2019Q3
APP30   2019Q3
APP31   2019Q4
APP32   2019Q4
APP33   
APP34   
APP35    
APP36    
APP37    
APP38    
APP39    
APP40    
APP41    
APP42    
APP43    
APP44    
APP45    
APP46    
APP47    
APP48    
;
run;

proc sql noprint;
	select count(distinct quarter), count(*) /count(distinct quarter) into :Nqtrs,:target from have;
quit;

proc sql;
	create table temp_a as 
	select quarter, count(*) as N
	from have 
	group by quarter
	having count(*)<=&target or quarter='';
	
	create table temp_b as 
	select quarter, 
		(select sum(N) from temp_a)/(select count (distinct quarter) from temp_a where quarter^='')-N as need , 
		(select N from temp_A where quarter='') as available 
		from temp_a 
		where quarter ^='';
quit;


data temp_c;set temp_b;
retain cumneed;
if _N_=1 then do; 
	cumneed=need; 
	need2=floor(need);
	end;
else do;
	cumneed=need+cumneed;
end;
cumneed2=floor(cumneed);
 need2=cumneed2-lag(cumneed2);
 if need2=. then need2=floor(need);
run;



data temp_d (keep=quarter);
	set temp_c end=eof;
	retain left ;
	if _N_=1 then left=available;
	do i=1 to need2;
		left=left-1;
		output;
	end;
run;

data temp_e;
	set have (where=(quarter=''));
	set temp_d;
run;

data want;
	set have (where=(quarter^='')) temp_e;
run;

*As a check , compare count from the have and want datasets;
proc sql;
	select a.*, b.want from (select h.quarter, count(*) as have from have h group 
		by h.quarter) a left join 
(select w.quarter, count(*) as want from want w group by w.quarter) b on 
		a.quarter=b.quarter;
quit; 

 

 

YellowBanana
Calcite | Level 5
It works. Thanks a lot for help! 🙂

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 697 views
  • 1 like
  • 3 in conversation