Retrieve First and Last Subject with no grouping variable

SR2019 · Posted 10-26-2021 01:21 PM

Hello All,

I have raw data as shown below. I do not know of how many times rescreening will be done. so, first set has 2 entries and second set has 1 entry and third set has 4 entries.

subj prevsubj
/*** First set - 2 times ****/
1001
1002 1001
1003 1002

/***Second set - 1 time ****/
1011
1012 1011

/***Third set - 4 times ****/
2010
2011 2010
2012 2011
2013 2012
2014 2013

The final dataset should look like:

subj prevsubj newsubj
1001 1003
1002 1001 1003
1003 1002 1003

1011 1012
1012 1011 1012

2010 2014
2011 2010 2014
2012 2011 2014
2013 2012 2014
2014 2013 2014

Can you please let me know of how can this be achieved?

Thank You.

AMSAS · Posted 10-26-2021 01:41 PM

What logic is to be used to define newsubj
What code have you tried? Always better to try then ask for help

The first part, combining the datasets, is easy using a SET statement

SR2019 · Posted 10-26-2021 02:20 PM

1. Logic is to get the newsubj (last screened subject).

In the example data given:

first set, 1001->1002->1003. And so newsubj should be 1003.

second set, 1011->1012. And so newsubj should be 1012.

third set, 2010->2011->2012->2013->2014. And so newsubj should be 2014.

2. I tried merging datasets by splitting subj and prevsubj but this only works for one level.

I am not trying to combine datasets, trying to see if there is a way to loop around each subject and see if we can get the expected solution.

PeterClemmensen · Posted 10-26-2021 01:52 PM

Do you compute prevsubj yourself? And why does the first data set not have 3 obs (1001, 1002 and 1003) ?

The DATA to DATA Step Macro
Blog: SASnrd

SR2019 · Posted 10-26-2021 02:23 PM

No, prevsubj is entered in the way i have shown in the post. I created this test data for testing.

first set has 3 records....Not sure i understand your question. Can you please clarify?

Tom · Posted 10-26-2021 04:47 PM

Can you explain more what is the process that is generating this data?

Looks like the pattern is that when you have a missing value of PREVSUBJ you want to find the largest value of SUBJ in that block. Might be easiest to first generate the group numbers.

data have;
  input subj prevsubj;
cards;
1001    .
1002 1001
1003 1002
1011    .
1012 1011
2010    .
2011 2010
2012 2011
2013 2012
2014 2013
;

data step1;
  set have ;
  group + missing(prevsubj);
run;

proc sql;
create table want as 
  select group, subj, prevsubj, max(subj) as newsubj
  from step1
  group by group
  order by group, subj
;
quit;

Result

Obs    group    subj    prevsubj    newsubj

  1      1      1001         .        1003
  2      1      1002      1001        1003
  3      1      1003      1002        1003
  4      2      1011         .        1012
  5      2      1012      1011        1012
  6      3      2010         .        2014
  7      3      2011      2010        2014
  8      3      2012      2011        2014
  9      3      2013      2012        2014
 10      3      2014      2013        2014

Retrieve First and Last Subject with no grouping variable

Re: Retrieve First and Last Subject with no grouping variable

Re: Retrieve First and Last Subject with no grouping variable

Re: Retrieve First and Last Subject with no grouping variable

Re: Retrieve First and Last Subject with no grouping variable

Re: Retrieve First and Last Subject with no grouping variable

Retrieve First and Last Subject with no grouping variable

Re: Retrieve First and Last Subject with no grouping variable

Re: Retrieve First and Last Subject with no grouping variable

Re: Retrieve First and Last Subject with no grouping variable

Re: Retrieve First and Last Subject with no grouping variable

Re: Retrieve First and Last Subject with no grouping variable

Ready to join fellow brilliant minds for the SAS Hackathon?

Click image to register for webinar

Classroom Training Available!