How to winsorize and complete the lost data

Reply
Frequent Contributor
Posts: 97

How to winsorize and complete the lost data

[ Edited ]

Hi , I have a new question here. I have a database like this. For example,  I want  the variable yymm which is date format to be 4 quarters (one year) before  the variable yymm_e, but you can see some data missed, so I wanna  complete my lost data. 

data have;
informat company $15. roe 4.3 yymm yymm_e yymmn6.;
format yymm yymm_e yymmn6.;
input company$  roe yymm  yymm_e ;
cards;
a 0.05 198503 198603
a 0.03 198512 198603
a 0.3  198606 198703
b 0.4  198812 198902
;
run;

The table structure I want should be as following . Thanks&Regards.

擷取.PNG

Super User
Posts: 17,963

Re: How to winsorize and complete the lost data

What is the logic here?

 


lixuan wrote:

Hi , I have a new question here. I have a database like this. For example,  I want  the variable yymm which is date format to be 3 months before  the variable yymm_e, but you can see some data missed, so I wanna  complete my lost data. 

data have;
informat company $15. roe 4.3 yymm yymm_e yymmn6.;
format yymm yymm_e yymmn6.;
input company$  roe yymm  yymm_e ;
cards;
a 0.05 198601 198603
a 0.03 198512 198603
a 0.3  198611 198703
b 0.4  198812 198902
;
run;

The table structure I want should be as following . Thanks&Regards.

擷取.PNG


 

PROC Star
Posts: 554

Re: How to winsorize and complete the lost data

What do you mean by complete lost data? 

 

For Winsorization, check out this article

 

http://blogs.sas.com/content/iml/2017/02/08/winsorization-good-bad-and-ugly.html

Frequent Contributor
Posts: 97

Re: How to winsorize and complete the lost data

 Maybe my expression is not clear. I have revised my question. The data I got wasn't complete, some data were missing, so i want to make it up. Before I do this job, I need to know which data is missing. I am sorry because english is not my mother language. If any questions or suggestions Pls let me know. Thanks a lot.

PROC Star
Posts: 173

Re: How to winsorize and complete the lost data

Hi, I concur with Reeza's suggestion as SAS ETS procs are pretty much built for such applications. However, if you want a datastep solution, I can give it a shot for you. Before that, can you please clarify the logic from 5th row in your wanted output:

 

I apologise if this is lame, but if you don't mind I would appreciate that so much.

 

 

 

 

Super User
Posts: 17,963

Re: How to winsorize and complete the lost data

PROC TIMESERIES can be used to fill in missing time periods, as long as you can clearly define them.

I'm not quite sure what your starting/ending interval are but it should work for you.

 

See a fully worked example here:

https://gist.github.com/statgeek/07a3708dee1225ceb9d4aa75daab2c52

Trusted Advisor
Posts: 1,405

Re: How to winsorize and complete the lost data

Create a table of all combinations having the company code, 4 previous year quarters and current year-month.

You need a list of the companies, can be created by sql selectint distinct company then do:

proc sql;
    create table companies as
    select distint company
    from have;
quit;

data help;
   set companies;
         length year $4  yymm $6;
         retain year='1988';  /* previous year */
         array qtr $ '03' '06' '09' '12';
         do i=1 to 4;
              yymm = year || qtr(i);
          output;
         keep company yymm;
run;

data want;
      merge help have;
     by company yymm;
run;

taht code assumed yymm is a charcter variable.

in case it is a date variable with a desired format, adapt the code to your needs.

Frequent Contributor
Posts: 97

Re: How to winsorize and complete the lost data

[ Edited ]

Yes, I do want to  create a table with full comany name and quarter date, then  I can merge with original table to get the result. But you can see the dates  are different. As for company a, the first  two obervations of  yymm_e are both 198603, so there are two yymm missed which are 198506 and 198509. The third observation a whose yymm_e  is 198703, and only one observation, so there are three yymm missed which are 198603,198609 and 198612. That's what I want.

In summary, my final goal is to complete  the missing data of roe instead of deleting them. Thanks a lot.

Super User
Posts: 17,963

Re: How to winsorize and complete the lost data


lixuan wrote:

Yes, I do want to  create a table with full comany name and quarter date, then  I can merge with original table to get the result. But you can see the dates  are different. As for company a, the first  two obervations of  yymm_e are both 198603, so there are two yymm missed which are 198506 and 198509. The third observation a whose yymm_e  is 198703, and only one observation, so there are three yymm missed which are 198603,198609 and 198612. That's what I want.

In summary, my final goal is to complete  the missing data of roe instead of deleting them. Thanks a lot.


How is the end date identified or known?

Frequent Contributor
Posts: 97

Re: How to winsorize and complete the lost data

The end date is four quarters before yymm_e.

 

Super User
Posts: 17,963

Re: How to winsorize and complete the lost data

I tried and then realized I still don't understand your logic...so here's a start. I found the smallest and largest end and then used INTCK to determine the number of periods needed. Then looped through to create a quarter record for each of the starting points using INTNX. 

 

Your last record, b, starts from a year before but that doesn't apply to the first record so I don't know why it's inconsistent. Anyways, build the skeleton table with all your periods and then merge it into your original dataset was the approach I was trying to take. 

 

Documentation for INTCK/INTNX

http://support.sas.com/documentation/cdl/en/lrcon/69852/HTML/default/viewer.htm#p0g056g35ez8son1sfav...

 

data have;
	informat company $15. roe 4.3 yymm yymm_e yymmn6.;
	format yymm yymm_e yymmn6.;
	input company$  roe yymm yymm_e;
	cards;
a 0.05 198503 198603
a 0.03 198512 198603
a 0.3  198606 198703
b 0.4  198812 198902
;
run;

proc sql ;
	create table skeleton as select company, min(yymm) as start, max(yymm_e) as 
		end from have group by company;
quit;

data skeleton_all;
	set skeleton;
	num_periods=intck('qtr', start, end);

	do i=0 to num_periods;
		date=intnx('qtr', start, i, 'e');
		format date yymmn6.;
		output;
	end;
	format start end yymmn6.;
run;

 

Frequent Contributor
Posts: 97

Re: How to winsorize and complete the lost data

Thanks, sorry, I didnt expalin my logic very clearly. The blank table I want like this . yymm is in the four quarters before the yymm_e, so the loop end should be 4. I tried but failed. 

擷取.PNG

data want;
set have;
do i=1 to 4;
date=intnx('qtr',yymm_e,-i,'e');
format date yymmn6.;
end;
run;

 

Super User
Posts: 17,963

Re: How to winsorize and complete the lost data

The logic is still unclear. I think this is one of those times you really need to sit down and draw out the full logic. Or hopefully someone else will understand. 

Frequent Contributor
Posts: 97

Re: How to winsorize and complete the lost data

OK, thank you very much.

Frequent Contributor
Posts: 97

Re: How to winsorize and complete the lost data

Maybe it will use array to finish it , and I am not familar with the usage of array.

Ask a Question
Discussion stats
  • 17 replies
  • 294 views
  • 5 likes
  • 5 in conversation