I have 315 dates per observation where 0 to 50 of them per observation are non-missing, but where those non-missing dates are changes for each observation. Each date has a corresponding value. Below is a simpler example of the data structure with 10 dates and 10 corresponding values.
date1 | date2 | date3 | date4 | date5 | date6 | date7 | date8 | date9 | date10 | value1 | value2 | value3 | value4 | value5 | value6 | value7 | value8 | value9 | value10 | |
obs1 | 5/24/2019 | 11/18/2013 | 7/13/2012 | 10 | 1 | 8 | ||||||||||||||
obs2 | 8/30/2015 | 3/16/2019 | 1 | 2 | ||||||||||||||||
obs3 | 12/14/2011 | 1 | ||||||||||||||||||
obs4 | 3/8/2023 | 11/4/2019 | 7/3/2017 | 8 | 2 | 5 | ||||||||||||||
obs5 | 3/27/2021 | 9/13/2011 | 11/19/2021 | 2/4/2018 | 9 | 4 | 2 | 5 |
I am trying essentially remove all blanks per row and shift the dates and values to be "left justified" for lack of a better term. In other words, I want the resulting dataset to be
newdate1 | newdate2 | newdate3 | newdate4 | newvalue1 | newvalue2 | newvalue3 | newvalue4 | |
obs1 | 5/24/2019 | 11/18/2013 | 7/13/2012 | 10 | 1 | 8 | ||
obs2 | 8/30/2015 | 3/16/2019 | 1 | 2 | ||||
obs3 | 12/14/2011 | 1 | ||||||
obs4 | 3/8/2023 | 11/4/2019 | 7/3/2017 | 8 | 2 | 5 | ||
obs5 | 3/27/2021 | 9/13/2011 | 11/19/2021 | 2/4/2018 | 9 | 4 | 2 | 5 |
To do this, I have the following code.
DATA DATA2; SET DATA; NVALUES=N(OF DATE1-DATE315); ARRAY DATENEW{50} DATENEW1-DATENEW50; ARRAY DATE{315} DATE1-DATE315; ARRAY VALUENEW{50} VALUENEW1-VALUENEW50; ARRAY VALUE{315} VALUE1-VALUE315; DO I=1 TO 315; IF DATE{I} NE . THEN DO J=1 TO NVALUES; DATENEW{J}=DATE{I}; VALUENEW{J}=VALUE{I}; END; END; FORMAT DATENEW1-DATENEW50 MMDDYY10.; RUN;
However, the date and output I am getting is only the last nonzero date and value, as below:
newdate1 | newdate2 | newdate3 | newdate4 | value 1 | value 2 | value 3 | value 4 | |
obs1 | 7/13/2012 | 7/13/2012 | 7/13/2012 | 8 | 8 | 8 | ||
obs2 | 3/16/2019 | 3/16/2019 | 2 | 2 | ||||
obs3 | 12/14/2011 | 1 | ||||||
obs4 | 7/3/2017 | 7/3/2017 | 7/3/2017 | 5 | 5 | 5 | ||
obs5 | 2/4/2018 | 2/4/2018 | 2/4/2018 | 2/4/2018 | 5 | 5 | 5 | 5 |
Any help to correct this would be much appreciated.
Or within a data step
data long;
set data;
ARRAY _DATE{315} DATE1-DATE315;
ARRAY _VALUE{50} VALUE1-VALUE315;
do i=1 to dim(_date);
if not missing(_date(i)) then do;
Record = _n_;
Date = _date(i);
Value = _value(i);
output;
end;
end;
run;
proc sort data=long;
by record date value;
run;
proc transpose data=long out=date prefix=date;
by record;
var date;
run;
proc transpose data=long out=values prefix=values;
by record;
var Value;
run;
data want;
merge date values;
by record;
keep date1-date50 values1-value50 record;
run;
run;
@JonKetchup wrote:
I have 315 dates per observation where 0 to 50 of them per observation are non-missing, but where those non-missing dates are changes for each observation. Each date has a corresponding value. Below is a simpler example of the data structure with 10 dates and 10 corresponding values.
date1 date2 date3 date4 date5 date6 date7 date8 date9 date10 value1 value2 value3 value4 value5 value6 value7 value8 value9 value10 obs1 5/24/2019 11/18/2013 7/13/2012 10 1 8 obs2 8/30/2015 3/16/2019 1 2 obs3 12/14/2011 1 obs4 3/8/2023 11/4/2019 7/3/2017 8 2 5 obs5 3/27/2021 9/13/2011 11/19/2021 2/4/2018 9 4 2 5
I am trying essentially remove all blanks per row and shift the dates and values to be "left justified" for lack of a better term. In other words, I want the resulting dataset to be
newdate1 newdate2 newdate3 newdate4 newvalue1 newvalue2 newvalue3 newvalue4 obs1 5/24/2019 11/18/2013 7/13/2012 10 1 8 obs2 8/30/2015 3/16/2019 1 2 obs3 12/14/2011 1 obs4 3/8/2023 11/4/2019 7/3/2017 8 2 5 obs5 3/27/2021 9/13/2011 11/19/2021 2/4/2018 9 4 2 5
To do this, I have the following code.
DATA DATA2; SET DATA; NVALUES=N(OF DATE1-DATE315); ARRAY DATENEW{50} DATENEW1-DATENEW50; ARRAY DATE{315} DATE1-DATE315; ARRAY VALUENEW{50} VALUENEW1-VALUENEW50; ARRAY VALUE{315} VALUE1-VALUE315; DO I=1 TO 315; IF DATE{I} NE . THEN DO J=1 TO NVALUES; DATENEW{J}=DATE{I}; VALUENEW{J}=VALUE{I}; END; END; FORMAT DATENEW1-DATENEW50 MMDDYY10.; RUN;
However, the date and output I am getting is only the last nonzero date and value, as below:
newdate1 newdate2 newdate3 newdate4 value 1 value 2 value 3 value 4 obs1 7/13/2012 7/13/2012 7/13/2012 8 8 8 obs2 3/16/2019 3/16/2019 2 2 obs3 12/14/2011 1 obs4 7/3/2017 7/3/2017 7/3/2017 5 5 5 obs5 2/4/2018 2/4/2018 2/4/2018 2/4/2018 5 5 5 5
Any help to correct this would be much appreciated.
Or within a data step
data long;
set data;
ARRAY _DATE{315} DATE1-DATE315;
ARRAY _VALUE{50} VALUE1-VALUE315;
do i=1 to dim(_date);
if not missing(_date(i)) then do;
Record = _n_;
Date = _date(i);
Value = _value(i);
output;
end;
end;
run;
proc sort data=long;
by record date value;
run;
proc transpose data=long out=date prefix=date;
by record;
var date;
run;
proc transpose data=long out=values prefix=values;
by record;
var Value;
run;
data want;
merge date values;
by record;
keep date1-date50 values1-value50 record;
run;
run;
@JonKetchup wrote:
I have 315 dates per observation where 0 to 50 of them per observation are non-missing, but where those non-missing dates are changes for each observation. Each date has a corresponding value. Below is a simpler example of the data structure with 10 dates and 10 corresponding values.
date1 date2 date3 date4 date5 date6 date7 date8 date9 date10 value1 value2 value3 value4 value5 value6 value7 value8 value9 value10 obs1 5/24/2019 11/18/2013 7/13/2012 10 1 8 obs2 8/30/2015 3/16/2019 1 2 obs3 12/14/2011 1 obs4 3/8/2023 11/4/2019 7/3/2017 8 2 5 obs5 3/27/2021 9/13/2011 11/19/2021 2/4/2018 9 4 2 5
I am trying essentially remove all blanks per row and shift the dates and values to be "left justified" for lack of a better term. In other words, I want the resulting dataset to be
newdate1 newdate2 newdate3 newdate4 newvalue1 newvalue2 newvalue3 newvalue4 obs1 5/24/2019 11/18/2013 7/13/2012 10 1 8 obs2 8/30/2015 3/16/2019 1 2 obs3 12/14/2011 1 obs4 3/8/2023 11/4/2019 7/3/2017 8 2 5 obs5 3/27/2021 9/13/2011 11/19/2021 2/4/2018 9 4 2 5
To do this, I have the following code.
DATA DATA2; SET DATA; NVALUES=N(OF DATE1-DATE315); ARRAY DATENEW{50} DATENEW1-DATENEW50; ARRAY DATE{315} DATE1-DATE315; ARRAY VALUENEW{50} VALUENEW1-VALUENEW50; ARRAY VALUE{315} VALUE1-VALUE315; DO I=1 TO 315; IF DATE{I} NE . THEN DO J=1 TO NVALUES; DATENEW{J}=DATE{I}; VALUENEW{J}=VALUE{I}; END; END; FORMAT DATENEW1-DATENEW50 MMDDYY10.; RUN;
However, the date and output I am getting is only the last nonzero date and value, as below:
newdate1 newdate2 newdate3 newdate4 value 1 value 2 value 3 value 4 obs1 7/13/2012 7/13/2012 7/13/2012 8 8 8 obs2 3/16/2019 3/16/2019 2 2 obs3 12/14/2011 1 obs4 7/3/2017 7/3/2017 7/3/2017 5 5 5 obs5 2/4/2018 2/4/2018 2/4/2018 2/4/2018 5 5 5 5
Any help to correct this would be much appreciated.
Thanks for the help. The keep statement is not working, but that is not a big deal. However, this seems to be removing all observations that have no dates/values, and I need to preserve those. There are also other variables in the datasets I need to be preserved, like ID. How would I make those changes?
@JonKetchup wrote:
Thanks for the help. The keep statement is not working, but that is not a big deal. However, this seems to be removing all observations that have no dates/values, and I need to preserve those.
Add a condition that if N values is 0 then just output at least one.
@JonKetchup wrote:
There are also other variables in the datasets I need to be preserved, like ID. How would I make those changes?
I used record as the identifier which I made. You can add variables to your KEEP statement in the transpose step and then add them to the BY statement if they help to uniquely identify a row.
data have;
input ID Sex $1. (date1-date4) (:yymmdd.) value1-value4 ;
format date1-date4 yymmdd10.;
cards;
1 F 2019-05-24 2019-05-24 2012-07-13 . 10 10 8 .
2 F 2015-08-30 2019-03-16 . . 1 2 . .
3 M . 2011-12-14 . . . 1 . .
4 F 2023-03-08 . 2017-07-03 . 8 2 5 .
5 M 2021-03-27 2011-09-13 . 2018-02-04 9 4 . 5
. . . . . . . .
;
data long;
set have;
ARRAY _DATE{4} DATE1-DATE4;
ARRAY _VALUE{4} VALUE1-VALUE4;
do i=1 to dim(_date);
if not missing(_date(i)) then do;
Date = _date(i);
Value = _value(i);
output;
end;
end;
keep id sex date value;
run;
proc sort data=long;
by ID sex date value;
run;
proc transpose data=long out=date prefix=date;
by ID SEX;
var date;
run;
proc transpose data=long out=values prefix=values;
by ID SEX;
var Value;
run;
data want;
merge date values;
by ID SEX;
run;
%let &nrows = ..; *number of rows in you dataset;
%macro new_rows; *makes new rows;
%do i = 1 %to &nrows;
1) little row data set;
data part1;
set have;
keep id date:;
if _N_=&i;
run;
2) delete empty cells;
ods select nlevels;
proc freq data=work.my_data nlevels;
ods output nlevels=work.nlevels_my_data;
run;
proc sql;
select TableVar into :empty_columns separated by ","
from part1
where nnonmisslevels = 0;
quit;
proc sql;
alter table part1
drop &emptycolumns;
quit;
3) rename variables;
data part1;
set part1;
https://communities.sas.com/t5/SAS-Programming/Rename-all-variables/td-p/500828?nobounce;
4) saving one row to be set;
data row&i;
set part1;
run;
%mend t;
5) set all rows;
data whole;
%macro setter;
%do i=1 %to &nrows;
row&i
%end;
run;
%mend setter;
Transpose to a long layout, and keep only non-missing values. Extract the number from _name_ and use it to join the dates and values into a single dataset.
Just transpose it and then transpose it back.
First let's make up some data. For example lets us your example result and introduce some "gaps". Why not also introduce some inconsistency where sometimes there is a value but no date?
data have;
input (date1-date4) (:yymmdd.) value1-value4 ;
format date1-date4 yymmdd10.;
cards;
2019-05-24 2013-11-18 2012-07-13 . 10 1 8 .
2015-08-30 2019-03-16 . . 1 2 . .
. 2011-12-14 . . . 1 . .
2023-03-08 . 2017-07-03 . 8 2 5 .
2021-03-27 2011-09-13 . 2018-02-04 9 4 . 5
;
Now let's transpose. Probably easier to use a data step to do the first transpose since you have multiple variables. Since the index into the array has no meaning (we are going to throw the original position away) lets use the older DO OVER syntax to make typing the code easier. Let's make sure to introduce a unique row identifier if one does not exist.
data tall;
row+1;
set have;
array _d date1-date4;
array _v value1-value4;
do over _d;
date=_d;
value=_v;
if n(date,value) then output;
end;
format date yymmdd10.;
drop date1-date4 value1-value4;
run;
Now to go from TALL back to wide we can use PROC SUMMARY. But we will need to enter an upperbound for the maximum number of date/value pair variables we want to create. We can hard code it to the original number, or calculate it from the TALL structure.
proc summary data=tall nway;
by row ;
output out=want(drop=_type_) idgroup (out[4] (date value)=);
run;
Original dateset.
Transposed TALL dataset.
Resulting new dataset
If you are worried about some rows having no values then add these lines to the end of the TALL data step.
call missing(date,value);
output;
If there are other variables you want carried into the output dataset add them to the BY statement in the PROC SUMMARY step. For example if you had ID and COMPANY variables your BY statement would look like.
by row id company ;
data have;
input (date1-date4) (:yymmdd.) value1-value4 ;
format date1-date4 yymmdd10.;
cards;
2019-05-24 2013-11-18 2012-07-13 . 10 1 8 .
2015-08-30 2019-03-16 . . 1 2 . .
. 2011-12-14 . . . 1 . .
2023-03-08 . 2017-07-03 . 8 2 5 .
2021-03-27 2011-09-13 . 2018-02-04 9 4 . 5
;
data want;
set have;
array newdate{4};
array newvalue{4};
array d{*} date1-date4;
array v{*} value1-value4;
j=0;
do i=1 to dim(d);
if not missing(d{i}) then do;j+1;newdate{j}=d{i};end;
end;
j=0;
do i=1 to dim(d);
if not missing(v{i}) then do;j+1;newvalue{j}=v{i};end;
end;
drop date1-date4 value1-value4 i j;
format newdate: yymmdd10.;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.