BookmarkSubscribeRSS Feed
lixuan
Obsidian | Level 7

Hi , I have a new question here. I have a database like this. For example,  I want  the variable yymm which is date format to be 4 quarters (one year) before  the variable yymm_e, but you can see some data missed, so I wanna  complete my lost data. 

data have;
informat company $15. roe 4.3 yymm yymm_e yymmn6.;
format yymm yymm_e yymmn6.;
input company$  roe yymm  yymm_e ;
cards;
a 0.05 198503 198603
a 0.03 198512 198603
a 0.3  198606 198703
b 0.4  198812 198902
;
run;

The table structure I want should be as following . Thanks&Regards.

擷取.PNG

17 REPLIES 17
Reeza
Super User

What is the logic here?

 


@lixuan wrote:

Hi , I have a new question here. I have a database like this. For example,  I want  the variable yymm which is date format to be 3 months before  the variable yymm_e, but you can see some data missed, so I wanna  complete my lost data. 

data have;
informat company $15. roe 4.3 yymm yymm_e yymmn6.;
format yymm yymm_e yymmn6.;
input company$  roe yymm  yymm_e ;
cards;
a 0.05 198601 198603
a 0.03 198512 198603
a 0.3  198611 198703
b 0.4  198812 198902
;
run;

The table structure I want should be as following . Thanks&Regards.

擷取.PNG


 

PeterClemmensen
Tourmaline | Level 20

What do you mean by complete lost data? 

 

For Winsorization, check out this article

 

http://blogs.sas.com/content/iml/2017/02/08/winsorization-good-bad-and-ugly.html

lixuan
Obsidian | Level 7

 Maybe my expression is not clear. I have revised my question. The data I got wasn't complete, some data were missing, so i want to make it up. Before I do this job, I need to know which data is missing. I am sorry because english is not my mother language. If any questions or suggestions Pls let me know. Thanks a lot.

novinosrin
Tourmaline | Level 20

Hi, I concur with Reeza's suggestion as SAS ETS procs are pretty much built for such applications. However, if you want a datastep solution, I can give it a shot for you. Before that, can you please clarify the logic from 5th row in your wanted output:

 

I apologise if this is lame, but if you don't mind I would appreciate that so much.

 

 

 

 

Reeza
Super User

PROC TIMESERIES can be used to fill in missing time periods, as long as you can clearly define them.

I'm not quite sure what your starting/ending interval are but it should work for you.

 

See a fully worked example here:

https://gist.github.com/statgeek/07a3708dee1225ceb9d4aa75daab2c52

Shmuel
Garnet | Level 18

Create a table of all combinations having the company code, 4 previous year quarters and current year-month.

You need a list of the companies, can be created by sql selectint distinct company then do:

proc sql;
    create table companies as
    select distint company
    from have;
quit;

data help;
   set companies;
         length year $4  yymm $6;
         retain year='1988';  /* previous year */
         array qtr $ '03' '06' '09' '12';
         do i=1 to 4;
              yymm = year || qtr(i);
          output;
         keep company yymm;
run;

data want;
      merge help have;
     by company yymm;
run;

taht code assumed yymm is a charcter variable.

in case it is a date variable with a desired format, adapt the code to your needs.

lixuan
Obsidian | Level 7

Yes, I do want to  create a table with full comany name and quarter date, then  I can merge with original table to get the result. But you can see the dates  are different. As for company a, the first  two obervations of  yymm_e are both 198603, so there are two yymm missed which are 198506 and 198509. The third observation a whose yymm_e  is 198703, and only one observation, so there are three yymm missed which are 198603,198609 and 198612. That's what I want.

In summary, my final goal is to complete  the missing data of roe instead of deleting them. Thanks a lot.

Reeza
Super User

@lixuan wrote:

Yes, I do want to  create a table with full comany name and quarter date, then  I can merge with original table to get the result. But you can see the dates  are different. As for company a, the first  two obervations of  yymm_e are both 198603, so there are two yymm missed which are 198506 and 198509. The third observation a whose yymm_e  is 198703, and only one observation, so there are three yymm missed which are 198603,198609 and 198612. That's what I want.

In summary, my final goal is to complete  the missing data of roe instead of deleting them. Thanks a lot.


How is the end date identified or known?

lixuan
Obsidian | Level 7

The end date is four quarters before yymm_e.

 

Reeza
Super User

I tried and then realized I still don't understand your logic...so here's a start. I found the smallest and largest end and then used INTCK to determine the number of periods needed. Then looped through to create a quarter record for each of the starting points using INTNX. 

 

Your last record, b, starts from a year before but that doesn't apply to the first record so I don't know why it's inconsistent. Anyways, build the skeleton table with all your periods and then merge it into your original dataset was the approach I was trying to take. 

 

Documentation for INTCK/INTNX

http://support.sas.com/documentation/cdl/en/lrcon/69852/HTML/default/viewer.htm#p0g056g35ez8son1sfav...

 

data have;
	informat company $15. roe 4.3 yymm yymm_e yymmn6.;
	format yymm yymm_e yymmn6.;
	input company$  roe yymm yymm_e;
	cards;
a 0.05 198503 198603
a 0.03 198512 198603
a 0.3  198606 198703
b 0.4  198812 198902
;
run;

proc sql ;
	create table skeleton as select company, min(yymm) as start, max(yymm_e) as 
		end from have group by company;
quit;

data skeleton_all;
	set skeleton;
	num_periods=intck('qtr', start, end);

	do i=0 to num_periods;
		date=intnx('qtr', start, i, 'e');
		format date yymmn6.;
		output;
	end;
	format start end yymmn6.;
run;

 

lixuan
Obsidian | Level 7

Thanks, sorry, I didnt expalin my logic very clearly. The blank table I want like this . yymm is in the four quarters before the yymm_e, so the loop end should be 4. I tried but failed. 

擷取.PNG

data want;
set have;
do i=1 to 4;
date=intnx('qtr',yymm_e,-i,'e');
format date yymmn6.;
end;
run;

 

Reeza
Super User

The logic is still unclear. I think this is one of those times you really need to sit down and draw out the full logic. Or hopefully someone else will understand. 

lixuan
Obsidian | Level 7

OK, thank you very much.

lixuan
Obsidian | Level 7

Maybe it will use array to finish it , and I am not familar with the usage of array.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 17 replies
  • 1790 views
  • 5 likes
  • 5 in conversation