With the following dataset, how do you code a macro that will create variables per ID that reports the values for each variable (bp, weight, bc) that corresponds to the last form_number? And keep the original variable in the dataset? The number of forms per ID may vary.
data test,
input ID$ bp weight bc form_number;
datalines;
1 120 78 178 1
1 134 80 177 2
1 143 . 176 3
2 111 58 . 1
3 154 55 160 1
3 178 56 144 2
;
run;
what we need is the following new dataset:
ID | bp | weight | bc | form_number | late_bp | late_weight | late_bc |
1 | 120 | 78 | 178 | 1 | 143 | 80 | 176 |
1 | 134 | 80 | 177 | 2 | . | . | . |
1 | 143 | . | 176 | 3 | . | . | . |
2 | 111 | 58 | . | 1 | 111 | 58 | . |
3 | 154 | 55 | 160 | 1 | 178 | 56 | 144 |
3 | 178 | 56 | 144 | 2 | . | . | . |
This is how I would do it, less reading of datasets.
data test;
input ID $ bp weight bc form_number;
datalines;
1 120 78 178 1
1 134 80 177 2
1 143 . 176 3
2 111 58 . 1
3 154 55 160 1
3 178 56 144 2
;
run;
%macro locf(dsin=, dsout=);
*get the latest value;
data latest_values;
update &dsin(obs=0) &dsin;
by id;
run;
*build a rename list for variable names;
proc sql noprint;
select catx("=", name, catt('LAST_', upper(name)))
into :rename_list
separated by " "
from sashelp.vcolumn
where libname='WORK'
and memname='LATEST_VALUES'
and upper(trim(name)) ne 'ID';
quit;
*merge in final results;
data &dsout;
merge &dsin latest_values (rename = (&rename_list));
by id;
run;
*remove latest values dataset for clean process;
proc sql;
drop table latest_values;
quit;
%mend;
%locf(dsin=test, dsout=want);
This is doable, but it's an unusual structure for data. Is there a particular reason you want this format? What's the next step? If you want to calculate, for example, difference between form 1 value and the last value, you could do that without creating these new variables.
I think what you are looking for is a macro that gets the last value for each variable per ID. So something like the below:
%macro get_last(var=);
proc sort data=test; by id; run;
data last_&var.;
by id;
set test;
if &var. ne .;
if last.id;
form_number=1;
last_&var.=&var.;
keep id form_number last_&var.;
run;
%mend get_last;
%get_last(var=bp)
%get_last(var=weight)
%get_last(var=bc)
Afterwards you can merge these datasets back into the original TEST by ID FORM_NUMBER. Note: I'm not sure why there is more than 1 FORM_NUMBER. There can only a single last value. Are FORM_NUMBER 2,3,etc meant to be blank in the final dataset?
@SanKH1 wrote:
With the following dataset, how do you code a macro that will create variables per ID that reports the values for each variable (bp, weight, bc) that corresponds to the last form_number? And keep the original variable in the dataset? The number of forms per ID may vary.
data test,
input ID$ bp weight bc form_number;
datalines;
1 120 78 178 1
1 134 80 177 2
1 143 . 176 3
2 111 58 . 1
3 154 55 160 1
3 178 56 144 2
;
run;what we need is the following new dataset:
ID bp weight bc form_number late_bp late_weight late_bc 1 120 78 178 1 143 80 176 1 134 80 177 2 . . . 1 143 . 176 3 . . . 2 111 58 . 1 111 58 . 3 154 55 160 1 178 56 144 3 178 56 144 2 . . .
You example does not report the values of the last form number as stated. You are showing the value of which ever largest form that has a non-missing value, if any. The 80 for Late_weight for the Id=1 is not from the largest form number 3, but from 2.
So please clarify the statement of your need OR make the example match the stated objective.
Do you really want macro's here? Why?
Arrays are a better idea, and you could automate the naming of the variables if that's the issue.
data last;
set test;
by ID;
array vars(4) bp weight bc form_number;
array last_vars(4) last_bp last_weight last_bc last_form_number;
retain last_bp last_weight last_bc last_form_number;
if first.id then call missing(of last_vars(*));
do i=1 to dim(vars);
if not missing(vars(i)) then last_vars(i) = vars(i);
end;
if last.id then output;
keep id last_:;
run;
data want;
merge test last;
by id;
if not first.id then call missing(of last:);
run;
Thanks! The reason macros are preferred it's because there are many datasets that we want to run.
Are there other variables in the data set that you would not get the last value for besides ID?
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.