SeaMoon_168
Obsidian | Level 7

I have a dataset that has four variables: ID, Day, Status, and Var1.

ID Day Status Var1
1 1 0 1
2 1284 0 2
3 28 0 2
4 432 1 1
5 1018 0 2
6 85 0 1
7 1007 0 2
8 824 0 2
9 907 0 2
10 191 1 2
10 392 1 2
10 433 1 2
11 819 1 2
11 1004 1 2

 

However, IDs 10 and 11 have multiple events, so I want to recode "Day" as two variables, "Start" and "End", as shown below. Many thanks!

ID Start End Status Var1
1 0 1 0 1
2 0 1284 0 2
3 0 28 0 2
4 0 432 1 1
5 0 1018 0 2
6 0 85 0 1
7 0 1007 0 2
8 0 824 0 2
9 0 907 0 2
10 0 191 1 2
10 191 392 1 2
10 392 433 1 2
11 0 819 1 2
11 819 1004 1 2
Accepted Solution
ballardw
Super User

Is your data actually sorted as shown?

 

Please see:

data have;
  input ID Day Status Var;
datalines;
1 1 0 1
2 1284 0 2
3 28 0 2
4 432 1 1
5 1018 0 2
6 85 0 1
7 1007 0 2
8 824 0 2
9 907 0 2
10 191 1 2
10 392 1 2
10 433 1 2
11 819 1 2
11 1004 1 2
;


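data want;
   set have;
   by id;                /* requires data sorted by ID; creates first.id and last.id */
   l_day=lag(day);       /* call LAG on every observation, outside any IF block */
   if first.id then do;  /* first record of an ID: interval starts at 0 */
      start=0;
      end=day;
   end;
   else  do;             /* later records: interval starts at the previous Day */
      start=l_day;
      end=day;
   end;
   drop day l_day;       /* remove helper variables no longer needed */
run;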

Note the first data step creates a usable data set as shown. This is the preferred manner of sharing data on this forum as then we know the variable names, types and any properties set such as formats or labels.

 

The second data step assumes the values are sorted by ID and Day, which allows a simple BY ID statement. When you use a BY statement in a data step, SAS creates automatic variables named FIRST.variable and LAST.variable, with values of 1 or 0 (true or false), indicating whether the current observation is the first or the last of its BY group.
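If the data were not already in that order, a sort step along these lines could come first. This is only a sketch: the data set names have_sorted and flags and the variables first_id and last_id are made up for illustration and do not appear elsewhere in this thread. The second step copies the automatic flags into ordinary variables, since FIRST. and LAST. variables are not written to the output data set on their own.

```sas
/* Sort by ID, then by Day within each ID, so BY-group
   processing sees each subject's events in time order */
proc sort data=have out=have_sorted;
  by id day;
run;

/* Copy the automatic flags into regular variables to inspect them */
data flags;
  set have_sorted;
  by id;
  first_id = first.id;   /* 1 on the first observation of each ID, else 0 */
  last_id  = last.id;    /* 1 on the last observation of each ID, else 0 */
run;

proc print data=flags noobs;
run;
```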

Since you need a value from the previous observation, we use the LAG function to get it. Warning: LAG is a queuing function. It returns the value stored the last time that LAG call executed, not necessarily the value from the previous observation, so it seldom returns the wanted result when used conditionally (inside an IF <condition> then do; <statements>; end; block).
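To make the warning concrete, here is a small sketch run against the HAVE data set above. The names lag_demo, start_wrong, prev_day and start_right are hypothetical, chosen only for this illustration.

```sas
data lag_demo;
  set have;
  by id;

  /* Pitfall: this LAG call executes only on non-first rows, so its
     queue holds the Day from the previous non-first row, not from
     the previous observation */
  if not first.id then start_wrong = lag(day);

  /* Safer: call LAG on every observation, then decide whether to
     use the value it returned */
  prev_day = lag(day);
  if not first.id then start_right = prev_day;
run;
```

With the sample data, start_wrong is missing on the ID 10 row with Day 392 (the wanted value is 191) and is 433 on the ID 11 row with Day 1004 (the wanted value is 819), while start_right holds 191 and 819 on those rows.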

Then knowing whether the observation is the first or not we know which value to use for Start and End. Then drop the no longer needed variables.

 

SeaMoon_168
Obsidian | Level 7

Many thanks for your help. It is easily understood and works well.

mkeintz
PROC Star

Assuming the data are sorted by ID and Day, then:

 

data have;
  input ID Day Status Var;
datalines;
1 1 0 1
2 1284 0 2
3 28 0 2
4 432 1 1
5 1018 0 2
6 85 0 1
7 1007 0 2
8 824 0 2
9 907 0 2
10 191 1 2
10 392 1 2
10 433 1 2
11 819 1 2
11 1004 1 2
;

data want;
  set have;
  by id;
  end=day;
  /* IFN evaluates all of its arguments, so lag(end) runs on every observation */
  start=ifn(first.id,0,lag(end));
run;
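
For comparison, the same Start/End recoding can also be sketched without LAG by carrying the previous End forward with a RETAIN statement. This is a different technique from either reply above; want_retain and prev_end are hypothetical names, and the step still assumes the data are sorted by ID and Day.

```sas
data want_retain;
  set have;
  by id;
  retain prev_end;            /* keep prev_end across observations */
  if first.id then start = 0; /* first record of an ID starts at 0 */
  else start = prev_end;      /* later records start at the previous End */
  end = day;
  prev_end = end;             /* remember this End for the next observation */
  drop day prev_end;
run;
```

Because prev_end is updated on every observation, there is no queue to fall out of step, which may be easier to reason about if more conditions are added later.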
SeaMoon_168
Obsidian | Level 7

Thank you for your help.
