Hi , I have a new question here. I have a database like this. For example, I want the variable yymm which is date format to be 4 quarters (one year) before the variable yymm_e, but you can see some data missed, so I wanna complete my lost data.
data have;
informat company $15. roe 4.3 yymm yymm_e yymmn6.;
format yymm yymm_e yymmn6.;
input company$ roe yymm yymm_e ;
cards;
a 0.05 198503 198603
a 0.03 198512 198603
a 0.3 198606 198703
b 0.4 198812 198902
;
run;
The table structure I want should be as following . Thanks&Regards.
What is the logic here?
@lixuan wrote:
Hi , I have a new question here. I have a database like this. For example, I want the variable yymm which is date format to be 3 months before the variable yymm_e, but you can see some data missed, so I wanna complete my lost data.
data have; informat company $15. roe 4.3 yymm yymm_e yymmn6.; format yymm yymm_e yymmn6.; input company$ roe yymm yymm_e ; cards; a 0.05 198601 198603 a 0.03 198512 198603 a 0.3 198611 198703 b 0.4 198812 198902 ; run;
The table structure I want should be as following . Thanks&Regards.
What do you mean by complete lost data?
For Winsorization, check out this article
http://blogs.sas.com/content/iml/2017/02/08/winsorization-good-bad-and-ugly.html
Maybe my expression is not clear. I have revised my question. The data I got wasn't complete, some data were missing, so i want to make it up. Before I do this job, I need to know which data is missing. I am sorry because english is not my mother language. If any questions or suggestions Pls let me know. Thanks a lot.
Hi, I concur with Reeza's suggestion as SAS ETS procs are pretty much built for such applications. However, if you want a datastep solution, I can give it a shot for you. Before that, can you please clarify the logic from 5th row in your wanted output:
I apologise if this is lame, but if you don't mind I would appreciate that so much.
PROC TIMESERIES can be used to fill in missing time periods, as long as you can clearly define them.
I'm not quite sure what your starting/ending interval are but it should work for you.
See a fully worked example here:
https://gist.github.com/statgeek/07a3708dee1225ceb9d4aa75daab2c52
Create a table of all combinations having the company code, 4 previous year quarters and current year-month.
You need a list of the companies, can be created by sql selectint distinct company then do:
proc sql;
create table companies as
select distint company
from have;
quit;
data help;
set companies;
length year $4 yymm $6;
retain year='1988'; /* previous year */
array qtr $ '03' '06' '09' '12';
do i=1 to 4;
yymm = year || qtr(i);
output;
keep company yymm;
run;
data want;
merge help have;
by company yymm;
run;
taht code assumed yymm is a charcter variable.
in case it is a date variable with a desired format, adapt the code to your needs.
Yes, I do want to create a table with full comany name and quarter date, then I can merge with original table to get the result. But you can see the dates are different. As for company a, the first two obervations of yymm_e are both 198603, so there are two yymm missed which are 198506 and 198509. The third observation a whose yymm_e is 198703, and only one observation, so there are three yymm missed which are 198603,198609 and 198612. That's what I want.
In summary, my final goal is to complete the missing data of roe instead of deleting them. Thanks a lot.
@lixuan wrote:
Yes, I do want to create a table with full comany name and quarter date, then I can merge with original table to get the result. But you can see the dates are different. As for company a, the first two obervations of yymm_e are both 198603, so there are two yymm missed which are 198506 and 198509. The third observation a whose yymm_e is 198703, and only one observation, so there are three yymm missed which are 198603,198609 and 198612. That's what I want.
In summary, my final goal is to complete the missing data of roe instead of deleting them. Thanks a lot.
How is the end date identified or known?
The end date is four quarters before yymm_e.
I tried and then realized I still don't understand your logic...so here's a start. I found the smallest and largest end and then used INTCK to determine the number of periods needed. Then looped through to create a quarter record for each of the starting points using INTNX.
Your last record, b, starts from a year before but that doesn't apply to the first record so I don't know why it's inconsistent. Anyways, build the skeleton table with all your periods and then merge it into your original dataset was the approach I was trying to take.
Documentation for INTCK/INTNX
data have;
informat company $15. roe 4.3 yymm yymm_e yymmn6.;
format yymm yymm_e yymmn6.;
input company$ roe yymm yymm_e;
cards;
a 0.05 198503 198603
a 0.03 198512 198603
a 0.3 198606 198703
b 0.4 198812 198902
;
run;
proc sql ;
create table skeleton as select company, min(yymm) as start, max(yymm_e) as
end from have group by company;
quit;
data skeleton_all;
set skeleton;
num_periods=intck('qtr', start, end);
do i=0 to num_periods;
date=intnx('qtr', start, i, 'e');
format date yymmn6.;
output;
end;
format start end yymmn6.;
run;
Thanks, sorry, I didnt expalin my logic very clearly. The blank table I want like this . yymm is in the four quarters before the yymm_e, so the loop end should be 4. I tried but failed.
data want;
set have;
do i=1 to 4;
date=intnx('qtr',yymm_e,-i,'e');
format date yymmn6.;
end;
run;
The logic is still unclear. I think this is one of those times you really need to sit down and draw out the full logic. Or hopefully someone else will understand.
OK, thank you very much.
Maybe it will use array to finish it , and I am not familar with the usage of array.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.