BookmarkSubscribeRSS Feed
SR2019
Calcite | Level 5

Hello All,

 

I have raw data as shown below. I do not know of how many times rescreening will be done. so, first set has 2 entries and second set has 1 entry and third set has 4 entries.

 

subj       prevsubj
/*** First set - 2 times ****/
1001
1002     1001
1003     1002

/***Second set - 1 time ****/
1011
1012     1011

/***Third set - 4 times ****/
2010
2011     2010
2012     2011
2013     2012
2014     2013

 

The final dataset should look like:

 

subj      prevsubj      newsubj
1001                        1003        
1002    1001            1003
1003    1002            1003

1011                        1012
1012    1011            1012

2010                        2014
2011     2010           2014
2012     2011           2014
2013     2012           2014
2014     2013           2014

 

Can you please let me know of how can this be achieved?

 

Thank You.

 

 

5 REPLIES 5
AMSAS
SAS Super FREQ
  1. What logic is to be used to define newsubj
  2. What code have you tried? Always better to try then ask for help 

The first part, combining the datasets, is easy using a SET statement 

SR2019
Calcite | Level 5

1. Logic is to get the newsubj (last screened subject).

In the example data given:

first set, 1001->1002->1003. And so newsubj should be 1003.

second set, 1011->1012. And so newsubj should be 1012.

third set, 2010->2011->2012->2013->2014. And so newsubj should be 2014.

 

2. I tried merging datasets by splitting subj and prevsubj but this only works for one level.

        

I am not trying to combine datasets, trying to see if there is a way to loop around each subject and see if we can get the expected solution.

PeterClemmensen
Tourmaline | Level 20

Do you compute prevsubj yourself? And why does the first data set not have 3 obs (1001, 1002 and 1003) ?

SR2019
Calcite | Level 5

No, prevsubj is entered in the way i have shown in the post. I created this test data for testing.

first set has 3 records....Not sure i understand your question. Can you please clarify?

Tom
Super User Tom
Super User

Can you explain more what is the process that is generating this data?

 

Looks like the pattern is that when you have a missing value of PREVSUBJ you want to find the largest value of SUBJ in that block.  Might be easiest to first generate the group numbers.

data have;
  input subj prevsubj;
cards;
1001    .
1002 1001
1003 1002
1011    .
1012 1011
2010    .
2011 2010
2012 2011
2013 2012
2014 2013
;

data step1;
  set have ;
  group + missing(prevsubj);
run;

proc sql;
create table want as 
  select group, subj, prevsubj, max(subj) as newsubj
  from step1
  group by group
  order by group, subj
;
quit;

Result

Obs    group    subj    prevsubj    newsubj

  1      1      1001         .        1003
  2      1      1002      1001        1003
  3      1      1003      1002        1003
  4      2      1011         .        1012
  5      2      1012      1011        1012
  6      3      2010         .        2014
  7      3      2011      2010        2014
  8      3      2012      2011        2014
  9      3      2013      2012        2014
 10      3      2014      2013        2014

 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 536 views
  • 0 likes
  • 4 in conversation