Hi, I have problem with date format. Here is my dataset and my code. I dont think I made any mistake, and I cant find the cause. Pls help me , thanks.
data WORK.ONE;
infile datalines dsd truncover;
input CUSIP:$8. g:32. yymm_a:YYMMN6.;
format yymm_a YYMMN6.;
label CUSIP="CUSIP/SEDOL" g="g";
datalines4;
00036020,22.25,199801
00036020,17,199801
00036020,12,199801
00036020,15,199801
00036020,17,199801
00036020,15,199801
00036020,17,199801
00036020,12,199801
00036020,15,199801
00036110,19.6,199705
00036110,19.6,199810
00036110,11,199705
00036110,11,199705
00036110,16.75,199810
00036110,11,199705
00036110,16.8,199705
00036110,16.8,199810
00036110,16.6,200009
00036110,16.6,199810
00036110,15,200009
;;;;
data WORK.TWO;
infile datalines dsd truncover;
input CUSIP:$8. g:32. yymm_e:YYMMN6.;
format yymm_e YYMMN6.;
label CUSIP="CUSIP/SEDOL" g="g";
datalines4;
00036020,25,199201
00036020,25,199201
00036020,25,199201
00036020,25,199201
00036110,16.75,199706
00036110,16.6,199706
00036110,15,199706
00036110,15,199810
00036110,12.5,201004
00036110,12.5,200803
00036110,17.5,200803
00036110,17.5,200701
00036110,15.7,201004
00036110,12,201004
00036110,13.4,200009
00036110,13.4,199810
00036110,14.38,200803
00036110,14.38,200701
00036110,15,200803
00036110,15,200701
;;;;
proc sql;
create table three as select a.*,b.g as g1,b.yymm_e
from one a,
two b
where a.cusip=b.cusip and
b.yymm_e<= intnx('month',yymm_a,12,'same');
quit;
What's the issue?
Hi Reeza, my issue is about date format. I have changed my date to date format, but I can't get the result I want
Step back for a second.
What does this information (below) tell me? The code works, so this is a logic issue. But you've only shown the code logic, which works so we can't know what you want.
Hi, I have problem with date format. Here is my dataset and my code. I dont think I made any mistake, and I cant find the cause. Pls help me , thanks.
Similarly with the following:
The results are strange
WIth only the information you've posted there, what response are you looking for?
Reeza, I want to merge the two databases. but the only same variable is cusip, and I am sure that yymm_e is no more than 12m after yymm_a, so I do this.
@lixuan wrote:
I am sure that yymm_e is no more than 12m after yymm_a
But you've only stated that it needs to be less than a date value.
b.yymm_e<= intnx('month',yymm_a,12,'same');
You'll join to any record that's less then the 12 months of the variable, which will end up with a large data set. What do you expect as output?
Unfortunately I need to log out shortly for a few days (vacation!!!) so you should spend some time building an example that shows what you have and what you need. Using my 'mind reading machine', since you still haven't explained your problem, are you trying to merge the data with the closest date in the data set or with multiple records that are within a specified range?
Hi Reeza, I will check my logic again, thank you and have a good vacation.
The result is very strang, and the same data I used before in the intnx function. The result was right.
SAS Output
00036020 | 22.25 | 199801 | 25.00 | 199201 |
00036020 | 22.25 | 199801 | 25.00 | 199201 |
00036020 | 22.25 | 199801 | 25.00 | 199201 |
00036020 | 22.25 | 199801 | 25.00 | 199201 |
00036020 | 17.00 | 199801 | 25.00 | 199201 |
00036020 | 17.00 | 199801 | 25.00 | 199201 |
00036020 | 17.00 | 199801 | 25.00 | 199201 |
00036020 | 17.00 | 199801 | 25.00 | 199201 |
00036020 | 12.00 | 199801 | 25.00 | 199201 |
00036020 | 12.00 | 199801 | 25.00 | 199201 |
00036020 | 12.00 | 199801 | 25.00 | 199201 |
00036020 | 12.00 | 199801 | 25.00 | 199201 |
00036020 | 15.00 | 199801 | 25.00 | 199201 |
Since your data set two has the first four records identical then anything that matches the first will match all 4. Similarly all the other duplicates is two will generate additional duplicate records.
instead of Intnx perhaps you want to look at using the the two dates in an INTCK function call and see if the result is between 0 and 12?
without an example of the desired output I'm not sure of something more specific
@lixuan, try and change your last step to:
proc sql;
create table three as select a.*,b.g as g1,b.yymm_e
from one a,
two b
where a.cusip=b.cusip and
abs(intck('month',yymm_a,yymm_e)) <= 12;
quit;
Sure, I understand that my problem is not the matter of date format, and I use the code to get the result. But there are too many duplicate observations producted by the rule of proc sql. I only have variables cusip and yymm_ which have some connections . I will be very appreciated if any suggestion. Thanks
a.yymm_a<= b.yymm_e<= intnx('month',a.yymm_a,12,'same');
You did not marked is my previous posted code give the wanted results, and if not - what is wrong with it ?!
You ave not posted desired or expected results in order to check suggested code. Please do it -
otherwise we are just guessing.
One more guess - try next code:
data want; merge one two; by; /* concatenates datasets observations horizontally one by one */
if yymm_a <= yymm_e <= intnx('month',yymm_a,12,'same');
run;
h
There are 9 observations from one and 4 observations from two that mutually fulfill the condition of being within 12 months, so you get 9*4 = 36 observations in the output. That's what an outer join in SQL does.
So you need to refine your rule. Start by formulating it in plain language, then exercise your rule logically, and once it works, convert it to code.
Or you show the intended result you expected (from the tow datasets as posted in your OP), so we can work on that.
Hi, Thank you all for your patience. Now the problem has become the matter of merge which had also puzzled me for a long time. The following is my example an the result I want. My logic is very simple: I just wanna the company and date are same in both datasets, and roe, roa are as they were. I always find it's hard to control the result in 'many-many' merge. thanks& regards.
data one;
input company$ roe date;
cards;
A 0.5 1995
A 0.2 1994
B 0.6 1998
;
data two;
input company$ roa date;
cards;
A 0.2 1995
A 0.3 1996
A 0.8 1995
B 0.7 1998
B 0.5 1998
C 0.2 1995
;
*result;
data want;
input company$ roe roa date;
cards;
A 0.5 0.2 1995
A . 0.8 1995
B 0.6 0.7 1998
B . 0.5 1998
;
@lixuan wrote:
Hi, Thank you all for your patience. Now the problem has become the matter of merge which had also puzzled me for a long time. The following is my example an the result I want. My logic is very simple: I just wanna the company and date are same in both datasets, and roe, roa are as they were. I always find it's hard to control the result in 'many-many' merge. thanks& regards.
data one; input company$ roe date; cards; A 0.5 1995 A 0.2 1994 B 0.6 1998 ; data two; input company$ roa date; cards; A 0.2 1995 A 0.3 1996 A 0.8 1995 B 0.7 1998 B 0.5 1998 C 0.2 1995 ;
*result; data want; input company$ roe roa date; cards; A 0.5 0.2 1995 A . 0.8 1995 B 0.6 0.7 1998 B . 0.5 1998 ;
This can be done with a little trickery in a data step:
proc sort data=one;
by company date;
run;
proc sort data=two;
by company date;
run;
data want;
format company roe roa date; * strictly for order of variables;
roe = .;
roa = .; * these two statements override the automatic retain of variables in datasets;
merge
one (in=a)
two (in=b)
;
by company date;
if a and b;
run;
proc print data=want noobs;
run;
Result:
company roe roa date A 0.5 0.2 1995 A . 0.8 1995 B 0.6 0.7 1998 B . 0.5 1998
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.