BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
ANKH1
Pyrite | Level 9

With the following dataset, how do you code a macro that will create variables per ID that reports the values for each variable (bp, weight, bc) that corresponds to the last form_number? And keep the original variable in the dataset? The number of forms per ID may vary.

data test,
input ID$ bp weight bc form_number;
datalines;
1 120 78 178 1
1 134 80 177 2
1 143 . 176 3
2 111 58 . 1
3 154 55 160 1
3 178 56 144 2
;
run;

what we need is the following new dataset:

IDbpweightbcform_numberlate_bplate_weightlate_bc
112078178114380176
1134801772...
1143.1763...
211158.111158.
315455160117856144
3178561442...

 

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

This is how I would do it, less reading of datasets. 

 

data test;
input ID $ bp weight bc form_number;
datalines;
1 120 78 178 1
1 134 80 177 2
1 143 . 176 3
2 111 58 . 1
3 154 55 160 1
3 178 56 144 2
;
run;

%macro locf(dsin=, dsout=);

*get the latest value;
data latest_values;
update &dsin(obs=0) &dsin;
by id;
run;

*build a rename list for variable names;
proc sql noprint;
select catx("=", name, catt('LAST_', upper(name))) 
into :rename_list
separated by " "
from sashelp.vcolumn
where libname='WORK'
and memname='LATEST_VALUES'
and upper(trim(name)) ne 'ID';
quit;

*merge in final results;
data &dsout;
merge &dsin latest_values (rename = (&rename_list));
by id;
run;

*remove latest values dataset for clean process;
proc sql;
drop table latest_values;
quit;

%mend;

%locf(dsin=test, dsout=want);

View solution in original post

34 REPLIES 34
Quentin
Super User

This is doable, but it's an unusual structure for data.  Is there  a particular reason you want this format? What's the next step?  If you want to calculate, for example, difference between form 1 value and the last value, you could do that without creating these new variables.

BASUG is hosting free webinars ! Check out recordings of our past webinars: https://www.basug.org/videos. Save the date for our in person SAS Blowout on Oct 18 in Cambridge, MA. Registration opens in September.
ANKH1
Pyrite | Level 9
Need to create a dataset that reports the last non-missing value per variable per ID. Thanks!
Seadrago
Obsidian | Level 7

I think what you are looking for is a macro that gets the last value for each variable per ID. So something like the below:

 

%macro get_last(var=);

proc sort data=test; by id; run;

 

data last_&var.;

  by id;

  set test;

  if &var. ne .;

  if last.id;

  form_number=1;

  last_&var.=&var.;

  keep id form_number last_&var.;

run;

%mend get_last;

%get_last(var=bp)

%get_last(var=weight)

%get_last(var=bc)

 

Afterwards you can merge these datasets back into the original TEST by ID FORM_NUMBER. Note: I'm not sure why there is more than 1 FORM_NUMBER. There can only a single last value. Are FORM_NUMBER 2,3,etc meant to be blank in the final dataset?

 

ANKH1
Pyrite | Level 9
Sorry, each late_var should report the last non-missing value from the all form_numbers per ID.
ANKH1
Pyrite | Level 9
and for the late_var columns there should be one value. I am not sure if it is ok to fill all rows corresponding to the same ID filled out with the same value? Example: for ID 1 the late_bp=143 will be repeated 3 times since this ID has three rows.
ballardw
Super User

@ANKH1 wrote:

With the following dataset, how do you code a macro that will create variables per ID that reports the values for each variable (bp, weight, bc) that corresponds to the last form_number? And keep the original variable in the dataset? The number of forms per ID may vary.

data test,
input ID$ bp weight bc form_number;
datalines;
1 120 78 178 1
1 134 80 177 2
1 143 . 176 3
2 111 58 . 1
3 154 55 160 1
3 178 56 144 2
;
run;

what we need is the following new dataset:

ID bp weight bc form_number late_bp late_weight late_bc
1 120 78 178 1 143 80 176
1 134 80 177 2 . . .
1 143 . 176 3 . . .
2 111 58 . 1 111 58 .
3 154 55 160 1 178 56 144
3 178 56 144 2 . . .

 


You example does not report the values of the last form number as stated. You are showing the value of which ever largest form that has a non-missing value, if any. The 80 for Late_weight for the Id=1 is not from the largest form number 3, but from 2.

 

So please clarify the statement of your need OR make the example match the stated objective.

ANKH1
Pyrite | Level 9
You are right, it's the last non-missing value per variable. Apologies.
Reeza
Super User

Do you really want macro's here? Why? 

Arrays are a better idea, and you could automate the naming of the variables if that's the issue.

 

data last;
set test;
by ID;

array vars(4) bp weight bc form_number;
array last_vars(4) last_bp last_weight last_bc last_form_number;
retain last_bp last_weight last_bc last_form_number;

if first.id then call missing(of last_vars(*));

do i=1 to dim(vars);
if not missing(vars(i)) then last_vars(i) = vars(i);
end;

if last.id then output;

keep id last_:;
run;

data want;
merge test last;
by id;
if not first.id then call missing(of last:);
run;
ANKH1
Pyrite | Level 9

Thanks! The reason macros are preferred it's because there are many datasets that we want to run.

Reeza
Super User
Well then you definitely need to specify more details. The macro shown here won't really help with many datasets as its designed for multiple variables not multiple data sets. Do the data sets all have the same names? This is kind of a weird structure, would you not want the last on all rows, that's way more common. Or just create a data set with just the last values for other calculations? I suspect there's something missing in the requirements here.
ANKH1
Pyrite | Level 9
not all datasets have the same variables. the only common variables are ID, form_number. The number of datasets are more than 50. I don't understand why this is considered a weird dataset structure. Datasets collect different info from each ID and the number of forms per ID is not constant across datasets either. Since each dataset asks about different topics.
Reeza
Super User
It's an uncommon data structure in that you add the latest to the first row of each ID only.

Whenever I've seen this added it's been added to all rows for that ID or kept in a separate table.
ANKH1
Pyrite | Level 9
Ok, I misunderstood. The adding of the latest to the first row of each ID only is not a requirement. We can add the value to all rows. Thanks.
Reeza
Super User

Are there other variables in the data set that you would not get the last value for besides ID?

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 34 replies
  • 1825 views
  • 12 likes
  • 7 in conversation