I would like to check the independence of directors based on the work experience data of all directors.
Since this process involves comparison and merging of data during the working period of two data sets,
I would like to ask the experts how to solve this problem?
data experience;
input PERSON $ COM_ID START END IND $;
cards;
A 001 20050101 20121231 YES
A 003 20070601 20241031 NO
B 009 20140503 20220429 YES
B 010 20180820 20221225 YES
;
proc print;
run;
data have;
input PERSON $ COM_ID ANN_DATE ;
cards;
A 003 20071231
B 009 20151231
;
proc print;
run;
The form of data I want to present is listed below.
Thank you for your help.
data want;
input PERSON $ COM_ID ANN_DATE IND $;
cards;
A 003 20071231 NO
B 009 20151231 YES
;
proc print;
run;
For reading a date string into a numerical SAS variable holding a SAS Date value you need to instruct SAS via a informat how to read the string (so it becomes the count of days since 1/1/1960). Only then will a date format display the correct date.
Below returns what you've asked for.
data experience;
input PERSON $ COMPANY_ID (START_DATE END_DATE) (:yymmdd8.) ind $;
format START_DATE END_DATE yymmddn8. ;
cards;
A 101 20050101 20121231 YES
A 103 20070601 20241031 NO
B 109 20140503 20220429 YES
B 110 20180820 20221225 YES
;
run;
data have;
input PERSON $ COMPANY_ID ANNOUNCE_DATE :yymmdd8.;
format ANNOUNCE_DATE yymmddn8. ; /*The company publicly announces that the director is serving the company*/
cards;
A 103 20071231
B 109 20151231
;
run;
data want;
input PERSON $ COMPANY_ID ANNOUNCE_DATE IND $;
format ANNOUNCE_DATE yymmddn8.;
cards;
A 103 20071231 NO
B 109 20151231 YES
;
proc sql;
/* create table want2 as*/
select l.*, r.ind
from have l
left join
experience r
on l.person=r.person
and l.company_id=r.company_id
and l.announce_date between r.start_date and r.end_date
;
quit;
Please explain what your variables represent. And explain the rules you used to decide that is the correct output for that input. Once you know the rules you can begin trying to write a program to implement them.
In general if you want to compare dates it works much better when you have actual dates. Are your date values actual dates? Or just the strange integers in YYY,YM,MDD style that your data steps will create?
Hi expert,
I would like to use existing experience data and identify whether the director is working in the company and whether he is independent.
Those numbers are dates, so I changed the format of the template.
data experience;
input PERSON $ COMPANY_ID START_DATE END_DATE IND $;
format START_DATE yymmddn8. ; /*The date on which the director started working for the company*/
format END_DATE yymmddn8. ; /*The date on which the director ended working for the company*/
cards;
A 101 20050101 20121231 YES
A 103 20070601 20241031 NO
B 109 20140503 20220429 YES
B 110 20180820 20221225 YES
;
run;
data have;
input PERSON $ COMPANY_ID ANNOUNCE_DATE ;
format ANNOUNCE_DATE yymmddn8. ; /*The company publicly announces that the director is serving the company*/
cards;
A 103 20071231
B 109 20151231
;
run;
data want;
input PERSON $ COMPANY_ID ANNOUNCE_DATE IND $;
format ANNOUNCE_DATE yymmddn8. ;
cards;
A 103 20071231 NO
B 109 20151231 YES
;
run;
/*If the date announced by the company is between the start and end dates of work experience,
the director’s independence can be identified*/
For reading a date string into a numerical SAS variable holding a SAS Date value you need to instruct SAS via a informat how to read the string (so it becomes the count of days since 1/1/1960). Only then will a date format display the correct date.
Below returns what you've asked for.
data experience;
input PERSON $ COMPANY_ID (START_DATE END_DATE) (:yymmdd8.) ind $;
format START_DATE END_DATE yymmddn8. ;
cards;
A 101 20050101 20121231 YES
A 103 20070601 20241031 NO
B 109 20140503 20220429 YES
B 110 20180820 20221225 YES
;
run;
data have;
input PERSON $ COMPANY_ID ANNOUNCE_DATE :yymmdd8.;
format ANNOUNCE_DATE yymmddn8. ; /*The company publicly announces that the director is serving the company*/
cards;
A 103 20071231
B 109 20151231
;
run;
data want;
input PERSON $ COMPANY_ID ANNOUNCE_DATE IND $;
format ANNOUNCE_DATE yymmddn8.;
cards;
A 103 20071231 NO
B 109 20151231 YES
;
proc sql;
/* create table want2 as*/
select l.*, r.ind
from have l
left join
experience r
on l.person=r.person
and l.company_id=r.company_id
and l.announce_date between r.start_date and r.end_date
;
quit;
Sorry to bother you again.
In the data I downloaded, if the directors have been in the current company up to now, the date column shows C representing "current".
data experience;
input PERSON $ COMPANY_ID (START_DATE END_DATE) (:yymmdd8.) ind $;
format START_DATE END_DATE yymmddn8. ;
cards;
A 101 20050101 20121231 YES
A 103 20070601 20241031 NO
B 109 20140503 C YES
B 110 20180820 20221225 YES
;
run;
This causes the following syntax to fail to execute.
Do you know of any solution?
l.announce_date between r.start_date and r.end_date
Just change the condition to account for this new criteria.
l.announce_date ge r.start_date
and (missing(l.announce_date) or l.announce_date le r.end_date)
Just amend the selection criteria. Because we read this column with informat yymmdd10. a source character C will result in a missing value in target variable end_date.
proc sql;
/* create table want2 as*/
select l.*, r.ind
from have l
left join
experience r
on l.person=r.person
and l.company_id=r.company_id
/* and l.announce_date between r.start_date and r.end_date*/
and l.announce_date >= r.start_date and ( l.announce_date <=r.end_date or missing(r.end_date) )
;
quit;
An alternative approach would be to not change the SQL selection but how you create your source table. If the source for end_date is C then set the date to something far in the future.
data experience;
input PERSON $ COMPANY_ID START_DATE :yymmdd8. c_END_DATE :$upcase8. ind $;
if c_END_DATE='C' then end_date='01jan5999'd;
else end_date=input(c_END_DATE,yymmdd8.);
format START_DATE END_DATE yymmddn8. ;
drop c_END_DATE;
cards;
A 101 20050101 20121231 YES
A 103 20070601 20241031 NO
B 109 20140503 C YES
B 110 20180820 20221225 YES
;
run;
...or here another way how to read such data.
proc format;
invalue end_date(upcase)
'C' = '01jan5999'd
other=[yymmdd8.]
;
quit;
data experience;
input PERSON $ COMPANY_ID START_DATE :yymmdd8. END_DATE :end_date. ind $;
format START_DATE END_DATE yymmddn8. ;
cards;
A 101 20050101 20121231 YES
A 103 20070601 20241031 NO
B 109 20140503 C YES
B 110 20180820 20221225 YES
;
run;
@Patrick wrote:
Just amend the selection criteria. Because we read this column with informat yymmdd10. a source character C will result in a missing value in target variable end_date.
That depend on whether you included the letter C in your MISSING statement or not. If you have then the string 'C' will be converted to the special missing value .C instead of the regular missing value.
1 missing ; 2 data test1; 3 input end_date :yymmdd. ; 4 format end_date yymmdd10.; 5 put end_date= ; 6 cards; end_date=2024-04-12 NOTE: Invalid data for end_date in line 8 1-1. end_date=. RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0 8 A end_date=. _ERROR_=1 _N_=2 NOTE: Invalid data for end_date in line 9 1-1. end_date=. 9 B end_date=. _ERROR_=1 _N_=3 NOTE: Invalid data for end_date in line 10 1-1. end_date=. 10 C end_date=. _ERROR_=1 _N_=4 NOTE: The data set WORK.TEST1 has 4 observations and 1 variables. NOTE: DATA statement used (Total process time): real time 0.00 seconds cpu time 0.00 seconds 11 ; 12 13 missing ABC; 14 data test2; 15 input end_date :yymmdd. ; 16 format end_date yymmdd10.; 17 put end_date= ; 18 cards; end_date=2024-04-12 end_date=A end_date=B end_date=C NOTE: The data set WORK.TEST2 has 4 observations and 1 variables. NOTE: DATA statement used (Total process time): real time 0.01 seconds cpu time 0.01 seconds 23 ;
First, I would like to thank the two experts (@Patrick and @Tom) for your assistance.
I have a question that I would like to ask you further.
The director's start and end information I obtained from the database shows:
C = Current: C means that the director is still in office until now (which is informative). Although it is displayed in text, the format is still a date (as shown in the picture).
N = n/a: N means missing values (which have no informational meaning).
I think a more appropriate way is to turn C into a date today or a long time in the future, and treat N as a missing processing method.
Do you know how to adjust the program? Thanks again.
Based on your screenshots your variables dateStartRole and dateEndRole are of type numeric meaning that they can only store numbers. The format attached to it then determines how these numbers get displayed.
Format yymmddD10. is SAS supplied and only returns dates but not N or C. These N or C are displaying Special Missings that likely got created when you read the external data from a text file into a SAS table (that's what @Tom already mentioned in his last post).
I believe easiest would be if you deal with these C and N in your text source already when you read it into SAS.
Below one way how to do this.
- C in source gets set to a SAS date value for 01 January 5999
- Any source string that's in the format yyyymmdd and though a valid input for informat yymmdd8. will become a SAS date value (count of days since 1/1/1960)
- Any other source string will become a missing (a normal one and not a special one).
proc format;
invalue read_date(upcase)
'C' = '01jan5999'd
low-high=[yymmdd8.]
other =.
;
quit;
data experience;
input PERSON $ COMPANY_ID START_DATE :read_date8. END_DATE :read_date8. ind $;
format START_DATE END_DATE yymmddd10. ;
cards;
A 101 20050101 20121231 YES
A 103 20070601 20241031 NO
B 109 20140503 C YES
B 110 20180820 20221225 YES
;
run;
data have;
input PERSON $ COMPANY_ID ANNOUNCE_DATE :yymmdd8.;
format ANNOUNCE_DATE yymmddn8. ; /*The company publicly announces that the director is serving the company*/
cards;
A 103 20071231
B 109 20151231
;
run;
proc sql;
/* create table want2 as*/
select l.*, r.ind
from have l
left join
experience r
on l.person=r.person
and l.company_id=r.company_id
and not missing(r.start_date)
and l.announce_date between r.start_date and r.end_date
;
quit;
Your screenshots also indicate that the start date could be missing (source string N). Condition and not missing(r.start_date)
will exclude such rows.
Big thanks to the two experts @Tom and @Patrick .
I may not be completely clear about the format of SAS,
but I generally understand that what the experts mean is to create a new format, and then apply this format when the data is newly created,
so that C becomes a future date.
However, my dataset is downloaded from the database and imported into SAS in the sas7bdat file format.
I have just tried for a long time and it seems that there is no way to convert C into a future date.
Maybe I'll just use missing to handle it.
I feel like two experts did their best, thank you very much anyway.
Hello @shawnchen0321,
@shawnchen0321 wrote:
You are lucky that you start already with a SAS dataset (i.e., a .sas7bdat file). Then you don't need the READ_DATE. informat to create variable DATEENDROLE2:
data employ_c; set employ; if DATEENDROLE=.C then DATEENDROLE2='01jan5999'd; else DATEENDROLE2=DATEENDROLE; format DATEENDROLE2 yymmdd10.; keep DATESTARTROLE DATEENDROLE:; run;
As has been pointed out earlier, DATEENDROLE is a numeric variable and those Cs and Ns are special missing values, which must be referred to as .C and .N (or .c and .n), respectively, in SAS code such as your IF-THEN statement to distinguish them from character strings like "C" or 'N' and variable names like C or N. I have also added a FORMAT statement to make the newly created values human readable as dates. (Note that even an unconditional assignment DATEENDROLE2=DATEENDROLE would not transfer the format of DATEENDROLE to DATEENDROLE2.)
Hello @FreelanceReinh,
It worked, thank you.
I learned a lot from you,
thank you experts @Tom @Patrick @FreelanceReinh !
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.