BookmarkSubscribeRSS Feed
nstdt
Quartz | Level 8

Hi all,

I have two data sets:

 1) have1  with variables names occurring in series: a1,a2,a3, time1,time2,time3. The "a"s are all empty. I can't go back and make any changes in the code used to create this dataset.

and 

2) Old  which has the values for the "a" variables but in "Long" format

I need to create a new dataset have1 in such a way that I can get the "a" variable from Old . How can I do this?

Here is some sample data:

##  dataset with ID variable and time variables  filled and "a" variables empty  

data have1;
 input ID a1 a2 a3 time1 time2 time3;
 datalines;
 99 . . . 1 2 3
 99 . . . 10 11 12
 999 . . . 1 2 3
 999 . . . 10 11 12
 ;
 run;

## dataset with ID, Time and "a" variable in rows
data Old;
input ID  time a;
datalines;
999 10 3
999 11  .
999 12 4
99  10 5
99   1 .
;
run;
## I want to pull in the "a" variable from Old based on the "time" variable
## This will change the Long dataset Old to the Wide dataset have1 data have1; do i=1 to 3 until (last.ID); set Old; by ID; if time[i]=time then a[i] = a; end; run;

Thank you!

 

 

4 REPLIES 4
Tom
Super User Tom
Super User

If you want to process the data by ID then make sure it is sorted by ID.

Your example HAVE1 dataset has the repeated measures order by TIME also so I would include that in the sort order.

You have to DEFINE an array before you can use.  

It saves a lot of code when defining a array of variable with numeric suffixes if the array name is the same as the basename of the variable being referenced by the array.  So let's rename your input variables to get them out of the way so we can use the basename as the name of the array.

data Old;
  input ID time a;
datalines;
999 10 3
999 11  .
999 12 4
99  10 5
99   1 .
;

proc sort data=old;
  by id time;
run;

data want;
  do _index=1 to 3 until(last.id);
    set old (rename=(time=_time a=_a));
    by id ;
    array a[3];
    array time[3];
    a[_index]=_a;
    time[_index]=_time;
  end;
  drop _: ;
run;

proc print;
run;
Obs     ID    a1    a2    a3    time1    time2    time3

 1      99     .     5     .       1       10        .
 2     999     3     .     4      10       11       12

So your OLD dataset does NOT have the same information as your posted HAVE1 dataset.

Did you expect that they would?  If so then you need provide more rules for how to construct the new dataset.

 

If you want to recreate the HAVE1 data with the A values from OLD inserted then it seems easier to convert HAVE1 to TALL structure and merge.  You can then convert back to the WIDE structure if you want.

data have1;
  input ID a1-a3 time1-time3;
datalines;
 99 . . . 1 2 3
 99 . . . 10 11 12
999 . . . 1 2 3
999 . . . 10 11 12
;

data tall ;
  set have1;
  array time[3];
  do index=1 to dim(time);
    _time = time[index];
    output;
  end;
  keep id _time;
run;

data fixed ;
  merge tall old(rename=(time=_time a=_a));
  by id _time;
run;

data want;
  do _index=1 to 3 until(last.id);
    set fixed ;
    by id ;
    array a[3];
    array time[3];
    a[_index]=_a;
    time[_index]=_time;
  end;
  drop _: ;
run;

Result:

Obs     ID    a1    a2    a3    time1    time2    time3

 1      99     .     .     .       1        2        3
 2      99     5     .     .      10       11       12
 3     999     .     .     .       1        2        3
 4     999     3     .     4      10       11       12
nstdt
Quartz | Level 8

Hi,

Thanks for your reply. It is not necessary for Old to have the same information as Have1 - just that Old has the A values a well as ID but Have1 does not. I can't make both datasets Tall - that's because the specific purpose of this code is to create a Wide dataset to manage further data processing.

 Is it possible to automate using macros the steps where you create arrays in Want- where you declare 'array a' and 'array time'? This is what I tried but couldn't do - I actually have about a 1000 variables, not just A. Let us say I have the variable names, all 1000, stored in a macro-variable. How could I call a macro to create arrays within the data step when you create Data Want?

 

Thanks.

data want;
  do _index=1 to 3 until(last.id);
    set old (rename=(time=_time a=_a));
    by id ;
    array a[3];
    array time[3];
    a[_index]=_a;
    time[_index]=_time;
  end;
  drop _: ;
run;
nstdt
Quartz | Level 8

I am thinking of something like below macro code:

 

options mprint mlogic;
%macro getArrayData(old,numvars);
data have1;
%do _index=1 %to 3 until(last.id);
	set &old;
	by id;
	/** index i starts at 2 because the first variable is the ID **/
	%do i=2 %to 1000;
	/** add suffix _var to each variable and make it an array of length 3 **/
       array %scan(&var_list,&i.)_var {3};     ;
	%end;
	%if time1 = time %then %do;
	/** Loop through 1000 vriable names stored in macrovariable var_list **/
		%do j=2 %to 1000;
		/** add suffix _var to current variable and take the first element **/
			%scan(&var_list,&j.)_var {1}=%scan(&var_list,&j.);
		%end;
	%end;
	%else %if time2 = time %then %do;
		%do k=2 %to 1000;
		/** add suffix _var to current variable and take the second element **/
			%scan(&var_list,&k.)_var {2}=%scan(&var_list,&k.);
		%end;
	%end;
	%else %if time3 = time %then %do;
		%do n=1 %to 1000;
		/** add suffix _var to current variable and take the third element **/
			%scan(&var_list,&n.)_var {3}=%scan(&var_list,&n.);
		%end;

	%end;
%mend getArrayData;

One simplification I can make is that in Dataset Old and Have1, time can only ever be 10,11 or 12.

 

data Old;
  input ID time a;
datalines;
999 10 3
999 11  .
999 12 4
99  10 5
99  11 .
99  12 .
;
run;
data have1;
  input ID a1-a3 time1-time3;
datalines;
 99 . . . 1 2 3
 99 . . . 10 11 12
999 . . . 1 2 3
999 . . . 10 11 12
;
Kurt_Bremser
Super User

Maxim 19: Long Beats Wide.

Transpose your have1 dataset to long, then it turns into a simple join on id and time. From then on, stay with the long dataset layout.

data long;
set have1;
array _a {*} a1-a3;
array _t{*} time1-time3;
do i = 1 to 3;
  time = _t{i};
  a = _a{i};
  output;
end;
keep id time a;
run;

proc sql;
create table want as
  select
    long.id,
    long.time,
    coalesce(long.a,old.a) as a
  from long left join old
  on long.id = old.id and long.time = old.time
;
quit;

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 794 views
  • 0 likes
  • 3 in conversation