Hi everyone,
I am working with personnel records for more than 3 million people over a 30 year period of time. The records are formatted as 1 event per person per row, so some people have as few as 3 or 4 rows, while other people have as many as 120+ rows. There are 108 variables.
SSN is used for BY-group processing, along with EVENT_DATE.
EVENT_CODE is used to differentiate between the different possible events.
I want to create a flag variable for every accession event (EVENT_CODE = 101, 102, 103, 104, and 105). Some people have multiple accession events and they may even have multiple accessions into the same organization.
sample data:
SSN EVENT_DATE EVENT_CODE
123 06/13/2006 105
123 10/06/2013 101
123 01/10/2014 101
456 01/29/2005 102
456 02/02/2006 105
Using the following syntax, records for person "456" are properly coded - because the person has only a single event per EVENT_CODE values '102' and '105'.
But with person "123", only the 1st and 3rd events are properly coded - the 2nd event (or the 1st time EVENT_CODE = 101) gets coded as missing.
data accession_loop;
*pass one through by-group: determine relative location of each accession record;
do obs=1 by 1 until (last.ssn);
set personnel;
by ssn event_date;
*does member have multiple accession events?;
if event_code eq 101 then org1obs = obs;
if event_code eq 102 then org2obs = obs;
if event_code eq 103 then org3obs = obs;
if event_code eq 104 then org4obs = obs;
if event_code eq 105 then org5bs = obs;
end;
do obs=1 by 1 until (last.ssn);
set personnel;
by ssn event_date;
*now create a single flag variable 'ACCESSION' to indicate any accession event;
if obs = org1obs or obs = org2obs or obs = org3obs or obs = org4obs or obs = org5obs then do;
accession = 1;
end;
else accession = '';
output;
end;
run;
I think that the problem is that the loop counter (OBS) is only identifying 1 record per possible EVENT_CODE value per person.
Is there a way to have the loop counter (OBS) identify every occurrence of EVENT_CODE that meets the relevant criteria?
Thanks
Please show what the desired output for the given input should be. I am not sure I am interpretting your statements as you intend.
Sure. Here's the output I currently get when I run the syntax.
"Accession_Loop" table created (with new variable ACCESSION)
SSN EVENT_DATE EVENT_CODE ACCESSION
123 06/13/2006 105 1
123 10/06/2013 101 .
123 01/10/2014 101 1
456 01/29/2005 102 1
456 02/02/2006 105 1
Here's the output I want to get.
SSN EVENT_DATE EVENT_CODE ACCESSION
123 06/13/2006 105 1
123 10/06/2013 101 1*****
123 01/10/2014 101 1
456 01/29/2005 102 1
456 02/02/2006 105 1
Right now, the loop counter only remembers the record number for 1 EVENT_CODE = 101 for person 123, so in my current output, the accession event on 10/06/2013 is coded as missing. I want every record with EVENT_CODE = 101 for person 123 to be remembered so that every event can be coded '1' in ACCESSION.
Thanks!
Your objective isn't 100% clear, since we have no examples of codes that are not 101 through 105. Why would this program produce the wrong result?
data want;
set have;
if event_code in (101, 102, 103, 104, 105) then accession=1;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.