BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
shawnchen0321
Obsidian | Level 7

I would like to check the independence of directors based on the work experience data of all directors.

 

Since this process involves comparison and merging of data during the working period of two data sets,

I would like to ask the experts how to solve this problem?

 

data experience;
input PERSON $ 	COM_ID  START END IND $;
cards;
A	001 20050101	20121231	 YES
A	003 20070601	20241031	 NO
B	009 20140503	20220429	 YES
B	010 20180820	20221225	 YES
;
proc print;
run;

data have;
input PERSON $ 	COM_ID  ANN_DATE ;
cards;
A 003 20071231
B 009 20151231
;
proc print;
run;

 

The form of data I want to present is listed below.

Thank you for your help.

data want;
input PERSON $ 	COM_ID  ANN_DATE IND $;
cards;
A 003 20071231 NO
B 009 20151231 YES
;
proc print;
run;

 

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

For reading a date string into a numerical SAS variable holding a SAS Date value you need to instruct SAS via a informat how to read the string (so it becomes the count of days since 1/1/1960). Only then will a date format display the correct date.

 

Below returns what you've asked for.

data experience;
  input PERSON $ COMPANY_ID  (START_DATE  END_DATE) (:yymmdd8.) ind $;
  format START_DATE END_DATE yymmddn8. ;
  cards;
A 101 20050101 20121231 YES
A 103 20070601 20241031 NO
B 109 20140503 20220429 YES
B 110 20180820 20221225 YES
;
run;

data have;
  input PERSON $ COMPANY_ID  ANNOUNCE_DATE :yymmdd8.;
  format ANNOUNCE_DATE yymmddn8. ; /*The company publicly announces that the director is serving the company*/
  cards;
A 103 20071231
B 109 20151231
;
run;

data want;
  input PERSON $ COMPANY_ID  ANNOUNCE_DATE IND $;
  format ANNOUNCE_DATE yymmddn8.;
  cards;
A 103 20071231 NO
B 109 20151231 YES
;

proc sql;
/*  create table want2 as*/
  select l.*, r.ind
  from have l
  left join
  experience r
  on l.person=r.person 
    and l.company_id=r.company_id 
    and l.announce_date between r.start_date and r.end_date
  ;
quit;

 

View solution in original post

19 REPLIES 19
Tom
Super User Tom
Super User

Please explain what your variables represent. And explain the rules you used to decide that is the correct output for that input.  Once you know the rules you can begin trying to write a program to implement them.

 

In general if you want to compare dates it works much better when you have actual dates. Are your date values actual dates? Or just the strange integers in YYY,YM,MDD style that your data steps will create? 

shawnchen0321
Obsidian | Level 7
Understood, I will add and adjust it immediately.
shawnchen0321
Obsidian | Level 7

Hi expert,

I would like to use existing experience data and identify whether the director is working in the company and whether he is independent.

Those numbers are dates, so I changed the format of the template.

 

data experience;
input PERSON $ COMPANY_ID  START_DATE  END_DATE IND $;
format START_DATE yymmddn8. ;  /*The date on which the director started working for the company*/
format END_DATE yymmddn8. ; /*The date on which the director ended working for the company*/
cards;
A	101 20050101  20121231  YES
A	103 20070601  20241031  NO
B	109 20140503  20220429  YES
B 110 20180820  20221225  YES
;
run;

data have;
input PERSON $ COMPANY_ID  ANNOUNCE_DATE ;
format ANNOUNCE_DATE yymmddn8. ; /*The company publicly announces that the director is serving the company*/
cards;
A 103 20071231
B 109 20151231
;
run;

data want; input PERSON $ COMPANY_ID ANNOUNCE_DATE IND $; format ANNOUNCE_DATE yymmddn8. ; cards; A 103 20071231 NO B 109 20151231 YES ; run;
/*If the date announced by the company is between the start and end dates of work experience,
the director’s independence can be identified*/

 

Patrick
Opal | Level 21

For reading a date string into a numerical SAS variable holding a SAS Date value you need to instruct SAS via a informat how to read the string (so it becomes the count of days since 1/1/1960). Only then will a date format display the correct date.

 

Below returns what you've asked for.

data experience;
  input PERSON $ COMPANY_ID  (START_DATE  END_DATE) (:yymmdd8.) ind $;
  format START_DATE END_DATE yymmddn8. ;
  cards;
A 101 20050101 20121231 YES
A 103 20070601 20241031 NO
B 109 20140503 20220429 YES
B 110 20180820 20221225 YES
;
run;

data have;
  input PERSON $ COMPANY_ID  ANNOUNCE_DATE :yymmdd8.;
  format ANNOUNCE_DATE yymmddn8. ; /*The company publicly announces that the director is serving the company*/
  cards;
A 103 20071231
B 109 20151231
;
run;

data want;
  input PERSON $ COMPANY_ID  ANNOUNCE_DATE IND $;
  format ANNOUNCE_DATE yymmddn8.;
  cards;
A 103 20071231 NO
B 109 20151231 YES
;

proc sql;
/*  create table want2 as*/
  select l.*, r.ind
  from have l
  left join
  experience r
  on l.person=r.person 
    and l.company_id=r.company_id 
    and l.announce_date between r.start_date and r.end_date
  ;
quit;

 

shawnchen0321
Obsidian | Level 7

Sorry to bother you again.

 

In the data I downloaded, if the directors have been in the current company up to now, the date column shows C representing "current".

 

data experience;
  input PERSON $ COMPANY_ID  (START_DATE  END_DATE) (:yymmdd8.) ind $;
  format START_DATE END_DATE yymmddn8. ;
  cards;
A 101 20050101 20121231 YES
A 103 20070601 20241031 NO
B 109 20140503 C YES
B 110 20180820 20221225 YES
;
run;

 

Screenshot 2024-04-13 160029.png

 

This causes the following syntax to fail to execute.

Do you know of any solution?

 

l.announce_date between r.start_date and r.end_date

 

 

Tom
Super User Tom
Super User

Just change the condition to account for this new criteria.

l.announce_date ge r.start_date
 and (missing(l.announce_date) or l.announce_date le r.end_date)
Patrick
Opal | Level 21

Just amend the selection criteria. Because we read this column with informat yymmdd10. a source character C will result in a missing value in target variable end_date.

proc sql;
/*  create table want2 as*/
  select l.*, r.ind
  from have l
  left join
  experience r
  on l.person=r.person 
    and l.company_id=r.company_id 
/*    and l.announce_date between r.start_date and r.end_date*/
    and l.announce_date >= r.start_date and ( l.announce_date <=r.end_date or missing(r.end_date) )
  ;
quit;

An alternative approach would be to not change the SQL selection but how you create your source table. If the source for end_date is C then set the date to something far in the future.

data experience;
  input PERSON $ COMPANY_ID  START_DATE :yymmdd8.  c_END_DATE :$upcase8. ind $;
  if c_END_DATE='C' then end_date='01jan5999'd;
  else end_date=input(c_END_DATE,yymmdd8.);
  format START_DATE END_DATE yymmddn8. ;
  drop c_END_DATE;
  cards;
A 101 20050101 20121231 YES
A 103 20070601 20241031 NO
B 109 20140503 C YES
B 110 20180820 20221225 YES
;
run;

...or here another way how to read such data.

proc format;
  invalue end_date(upcase)
    'C' = '01jan5999'd
    other=[yymmdd8.]
    ;
quit;

data experience;
  input PERSON $ COMPANY_ID START_DATE  :yymmdd8. END_DATE  :end_date. ind $;
  format START_DATE END_DATE yymmddn8. ;
  cards;
A 101 20050101 20121231 YES
A 103 20070601 20241031 NO
B 109 20140503 C YES
B 110 20180820 20221225 YES
;
run;

 

Tom
Super User Tom
Super User

@Patrick wrote:

Just amend the selection criteria. Because we read this column with informat yymmdd10. a source character C will result in a missing value in target variable end_date.

 


That depend on whether you included the letter C in your MISSING statement or not.  If you have then the string 'C' will be converted to the special missing value .C instead of the regular missing value.

1    missing ;
2    data test1;
3      input end_date :yymmdd. ;
4      format end_date yymmdd10.;
5      put end_date= ;
6    cards;

end_date=2024-04-12
NOTE: Invalid data for end_date in line 8 1-1.
end_date=.
RULE:      ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
8          A
end_date=. _ERROR_=1 _N_=2
NOTE: Invalid data for end_date in line 9 1-1.
end_date=.
9          B
end_date=. _ERROR_=1 _N_=3
NOTE: Invalid data for end_date in line 10 1-1.
end_date=.
10         C
end_date=. _ERROR_=1 _N_=4
NOTE: The data set WORK.TEST1 has 4 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


11   ;
12
13   missing ABC;
14   data test2;
15     input end_date :yymmdd. ;
16     format end_date yymmdd10.;
17     put end_date= ;
18   cards;

end_date=2024-04-12
end_date=A
end_date=B
end_date=C
NOTE: The data set WORK.TEST2 has 4 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds


23   ;
shawnchen0321
Obsidian | Level 7

First, I would like to thank the two experts (@Patrick and @Tom) for your assistance.

 

I have a question that I would like to ask you further.

The director's start and end information I obtained from the database shows: 

 

C = Current: C means that the director is still in office until now (which is informative). Although it is displayed in text, the format is still a date (as shown in the picture).

N = n/a: N means missing values ​​(which have no informational meaning).

 

001.png002.png

 

I think a more appropriate way is to turn C into a date today or a long time in the future, and treat N as a missing processing method.

Do you know how to adjust the program? Thanks again.

Patrick
Opal | Level 21

Based on your screenshots your variables dateStartRole and dateEndRole are of type numeric meaning that they can only store numbers. The format attached to it then determines how these numbers get displayed.

Format yymmddD10. is SAS supplied and only returns dates but not N or C. These N or C are displaying Special Missings that likely got created when you read the external data from a text file into a SAS table (that's what @Tom already mentioned in his last post). 

 

I believe easiest would be if you deal with these C and N in your text source already when you read it into SAS.

Below one way how to do this.

- C in source gets set to a SAS date value for 01 January 5999

- Any source string that's in the format yyyymmdd and though a valid input for informat yymmdd8. will become a SAS date value (count of days since 1/1/1960)

- Any other source string will become a missing (a normal one and not a special one).

proc format;
  invalue read_date(upcase)
    'C'     = '01jan5999'd
    low-high=[yymmdd8.]
    other   =.
    ;
quit;

data experience;
  input PERSON $ COMPANY_ID START_DATE  :read_date8. END_DATE  :read_date8. ind $;
  format START_DATE END_DATE yymmddd10. ;
  cards;
A 101 20050101 20121231 YES
A 103 20070601 20241031 NO
B 109 20140503 C YES
B 110 20180820 20221225 YES
;
run;

data have;
  input PERSON $ COMPANY_ID  ANNOUNCE_DATE :yymmdd8.;
  format ANNOUNCE_DATE yymmddn8. ; /*The company publicly announces that the director is serving the company*/
  cards;
A 103 20071231
B 109 20151231
;
run;

proc sql;
/*  create table want2 as*/
  select l.*, r.ind
  from have l
  left join
  experience r
  on l.person=r.person 
    and l.company_id=r.company_id 
    and not missing(r.start_date)
    and l.announce_date between r.start_date and r.end_date
  ;
quit;

Your screenshots also indicate that the start date could be missing (source string N). Condition and not missing(r.start_date) will exclude such rows. 

 

 

 

shawnchen0321
Obsidian | Level 7

Big thanks to the two experts @Tom and @Patrick .

 

I may not be completely clear about the format of SAS,

but I generally understand that what the experts mean is to create a new format, and then apply this format when the data is newly created,

so that C becomes a future date.

 

However, my dataset is downloaded from the database and imported into SAS in the sas7bdat file format.

I have just tried for a long time and it seems that there is no way to convert C into a future date.

Maybe I'll just use missing to handle it.

 

003.png004.png

 

 

I feel like two experts did their best, thank you very much anyway.

FreelanceReinh
Jade | Level 19

Hello @shawnchen0321,

 


@shawnchen0321 wrote:

003.png004.png


You are lucky that you start already with a SAS dataset (i.e., a .sas7bdat file). Then you don't need the READ_DATE. informat to create variable DATEENDROLE2:

 

data employ_c;
set employ;
if DATEENDROLE=.C then DATEENDROLE2='01jan5999'd;
else DATEENDROLE2=DATEENDROLE;
format DATEENDROLE2 yymmdd10.;
keep DATESTARTROLE DATEENDROLE:;
run;

As has been pointed out earlier, DATEENDROLE is a numeric variable and those Cs and Ns are special missing values, which must be referred to as .C and .N (or .c and .n), respectively, in SAS code such as your IF-THEN statement to distinguish them from character strings like "C" or 'N' and variable names like C or N. I have also added a FORMAT statement to make the newly created values human readable as dates. (Note that even an unconditional assignment DATEENDROLE2=DATEENDROLE would not transfer the format of DATEENDROLE to DATEENDROLE2.)

 

shawnchen0321
Obsidian | Level 7

Hello @FreelanceReinh

It worked, thank you.

 

I learned a lot from you,

thank you experts @Tom @Patrick @FreelanceReinh !

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 19 replies
  • 983 views
  • 4 likes
  • 4 in conversation