DATA Step, Macro, Functions and more

The subject of date format

Reply
Frequent Contributor
Posts: 97

The subject of date format

Hi, I have problem with date format. Here is my dataset and my code. I dont think I made any mistake, and I cant find the cause. Pls help me , thanks.

data WORK.ONE;
  infile datalines dsd truncover;
  input CUSIP:$8. g:32. yymm_a:YYMMN6.;
  format yymm_a YYMMN6.;
  label CUSIP="CUSIP/SEDOL" g="g";
datalines4;
00036020,22.25,199801
00036020,17,199801
00036020,12,199801
00036020,15,199801
00036020,17,199801
00036020,15,199801
00036020,17,199801
00036020,12,199801
00036020,15,199801
00036110,19.6,199705
00036110,19.6,199810
00036110,11,199705
00036110,11,199705
00036110,16.75,199810
00036110,11,199705
00036110,16.8,199705
00036110,16.8,199810
00036110,16.6,200009
00036110,16.6,199810
00036110,15,200009
;;;;
data WORK.TWO;
  infile datalines dsd truncover;
  input CUSIP:$8. g:32. yymm_e:YYMMN6.;
  format yymm_e YYMMN6.;
  label CUSIP="CUSIP/SEDOL" g="g";
datalines4;
00036020,25,199201
00036020,25,199201
00036020,25,199201
00036020,25,199201
00036110,16.75,199706
00036110,16.6,199706
00036110,15,199706
00036110,15,199810
00036110,12.5,201004
00036110,12.5,200803
00036110,17.5,200803
00036110,17.5,200701
00036110,15.7,201004
00036110,12,201004
00036110,13.4,200009
00036110,13.4,199810
00036110,14.38,200803
00036110,14.38,200701
00036110,15,200803
00036110,15,200701
;;;;

proc sql;
create table three as select a.*,b.g as g1,b.yymm_e
from one a,
	 two b
where a.cusip=b.cusip and
	 b.yymm_e<= intnx('month',yymm_a,12,'same');
quit;
Super User
Posts: 19,789

Re: The subject of date format

What's the issue?

Frequent Contributor
Posts: 97

Re: The subject of date format

Hi Reeza, my issue is about date format. I have changed my date to date format, but I can't get the result I want 

Super User
Posts: 19,789

Re: The subject of date format

Step back for a second. 

What does this information (below) tell me? The code works, so this is a logic issue. But you've only shown the code logic, which works so we can't know what you want. 

 

Hi, I have problem with date format. Here is my dataset and my code. I dont think I made any mistake, and I cant find the cause. Pls help me , thanks.

 

Similarly with the following:

 

The results are strange

 

WIth only the information you've posted there, what response are you looking for?

Frequent Contributor
Posts: 97

Re: The subject of date format

Reeza, I want to merge the two databases. but the only same variable is cusip, and I am sure that yymm_e is no more than 12m after yymm_a, so I do this.

Super User
Posts: 19,789

Re: The subject of date format


lixuan wrote:

 I am sure that yymm_e is no more than 12m after yymm_a


But you've only stated that it needs to be less than a date value. 

 

 b.yymm_e<= intnx('month',yymm_a,12,'same');

You'll join to any record that's less then the 12 months of the variable, which will end up with a large data set. What do you expect as output?

 

Unfortunately I need to log out shortly for a few days (vacation!!!) so you should spend some time building an example that shows what you have and what you need. Using my 'mind reading machine', since you still haven't explained your problem, are you trying to merge the data with the closest date in the data set or with multiple records that are within a specified range? 

Frequent Contributor
Posts: 97

Re: The subject of date format

Hi Reeza, I will check my logic again, thank you and have a good vacation.

Frequent Contributor
Posts: 97

Re: The subject of date format

The result is very strang, and the same data I used before in the intnx function. The result was  right.

SAS Output



Obs CUSIP g yymm_a g1 yymm_e 1 2 3 4 5 6 7 8 9 10 11 12 13
0003602022.2519980125.00199201
0003602022.2519980125.00199201
0003602022.2519980125.00199201
0003602022.2519980125.00199201
0003602017.0019980125.00199201
0003602017.0019980125.00199201
0003602017.0019980125.00199201
0003602017.0019980125.00199201
0003602012.0019980125.00199201
0003602012.0019980125.00199201
0003602012.0019980125.00199201
0003602012.0019980125.00199201
0003602015.0019980125.00199201



 
Super User
Posts: 11,343

Re: The subject of date format

Since your data set two has the first four records identical then anything that matches the first will match all 4. Similarly all the other duplicates is two will generate additional duplicate records.

 

instead of Intnx perhaps you want to look at using the the two dates in an INTCK function call and see if the result is between 0 and 12?

without an example of the desired output I'm not sure of something more specific

Trusted Advisor
Posts: 1,558

Re: The subject of date format

@lixuan, try and change your last step to:

 

proc sql;
create table three as select a.*,b.g as g1,b.yymm_e
from one a,
	 two b
where a.cusip=b.cusip and
	 abs(intck('month',yymm_a,yymm_e))  <= 12;
quit;
Frequent Contributor
Posts: 97

Re: The subject of date format

Sure, I understand that my problem is not the matter of date format, and I use the code to get the result. But there are too many duplicate observations producted by the rule of proc sql.  I only have variables cusip and yymm_  which have some connections . I will be very appreciated if any suggestion. Thanks

a.yymm_a<= b.yymm_e<= intnx('month',a.yymm_a,12,'same');
Trusted Advisor
Posts: 1,558

Re: The subject of date format

[ Edited ]

You did not marked is my previous posted code give the wanted results, and if not - what is wrong with it ?!

 

You ave not posted desired or expected results in order to check suggested code. Please do it - 

otherwise we are just guessing.

 

One more guess - try next code:

data want;
  merge one two;
  by;  /* concatenates datasets observations horizontally one by one */
if yymm_a <= yymm_e <= intnx('month',yymm_a,12,'same');
run;

Super User
Posts: 7,771

Re: The subject of date format

There are 9 observations from one and 4 observations from two that mutually fulfill the condition of being within 12 months, so you get 9*4 = 36 observations in the output. That's what an outer join in SQL does.

 

So you need to refine your rule. Start by formulating it in plain language, then exercise your rule logically, and once it works, convert it to code.

Or you show the intended result you expected (from the tow datasets as posted in your OP), so we can work on that.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Frequent Contributor
Posts: 97

Re: The subject of date format

Posted in reply to KurtBremser

Hi, Thank you all for your patience. Now the problem has become the matter of merge which had also puzzled me for a long time. The following is my example an the result I  want. My logic is very simple: I just wanna the company and date are same in both  datasets, and roe, roa are as they were. I always find it's hard to control the result in 'many-many' merge. thanks& regards.

data one;
input company$ roe date;
cards;
A 0.5 1995
A 0.2 1994
B 0.6 1998
;
data two;
input company$ roa date;
cards;
A 0.2 1995
A 0.3 1996
A 0.8 1995
B 0.7 1998
B 0.5 1998
C 0.2 1995
;
*result; data want; input company$ roe roa date; cards; A 0.5 0.2 1995 A . 0.8 1995 B 0.6 0.7 1998 B . 0.5 1998 ;
Super User
Posts: 7,771

Re: The subject of date format


lixuan wrote:

Hi, Thank you all for your patience. Now the problem has become the matter of merge which had also puzzled me for a long time. The following is my example an the result I  want. My logic is very simple: I just wanna the company and date are same in both  datasets, and roe, roa are as they were. I always find it's hard to control the result in 'many-many' merge. thanks& regards.

data one;
input company$ roe date;
cards;
A 0.5 1995
A 0.2 1994
B 0.6 1998
;
data two;
input company$ roa date;
cards;
A 0.2 1995
A 0.3 1996
A 0.8 1995
B 0.7 1998
B 0.5 1998
C 0.2 1995
;
*result; data want; input company$ roe roa date; cards; A 0.5 0.2 1995 A . 0.8 1995 B 0.6 0.7 1998 B . 0.5 1998 ;

This can be done with a little trickery in a data step:

proc sort data=one;
by company date;
run;

proc sort data=two;
by company date;
run;

data want;
format company roe roa date; * strictly for order of variables;
roe = .;
roa = .; * these two statements override the automatic retain of variables in datasets;
merge
  one (in=a)
  two (in=b)
;
by company date;
if a and b;
run;

proc print data=want noobs;
run;

Result:

company    roe    roa    date

   A       0.5    0.2    1995
   A        .     0.8    1995
   B       0.6    0.7    1998
   B        .     0.5    1998
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Ask a Question
Discussion stats
  • 21 replies
  • 298 views
  • 2 likes
  • 5 in conversation