BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
JonKetchup
Obsidian | Level 7

I have 315 dates per observation where 0 to 50 of them per observation are non-missing, but where those non-missing dates are changes for each observation. Each date has a corresponding value. Below is a simpler example of the data structure with 10 dates and 10 corresponding values.

 date1date2date3date4date5date6date7date8date9date10value1value2value3value4value5value6value7value8value9value10
obs1 5/24/201911/18/20137/13/2012       1018      
obs2    8/30/20153/16/2019        12    
obs3        12/14/2011         1 
obs4     3/8/202311/4/20197/3/2017       825  
obs53/27/20219/13/201111/19/20212/4/2018      9425      

 

I am trying essentially remove all blanks per row and shift the dates and values to be "left justified" for lack of a better term. In other words, I want the resulting dataset to be

 newdate1newdate2newdate3newdate4newvalue1newvalue2newvalue3newvalue4
obs15/24/201911/18/20137/13/2012 1018 
obs28/30/20153/16/2019  12  
obs312/14/2011   1   
obs43/8/202311/4/20197/3/2017 825 
obs53/27/20219/13/201111/19/20212/4/20189425

 

To do this, I have the following code. 

DATA DATA2; SET DATA;
NVALUES=N(OF DATE1-DATE315);
ARRAY DATENEW{50} DATENEW1-DATENEW50;
ARRAY DATE{315} DATE1-DATE315;
ARRAY VALUENEW{50} VALUENEW1-VALUENEW50;
ARRAY VALUE{315} VALUE1-VALUE315;
DO I=1 TO 315;
	IF DATE{I} NE . THEN DO J=1 TO NVALUES;
		DATENEW{J}=DATE{I}; VALUENEW{J}=VALUE{I};
	END;
END;
FORMAT DATENEW1-DATENEW50 MMDDYY10.;
RUN;

 

However, the date and output I am getting is only the last nonzero date and value, as below:

 newdate1newdate2newdate3newdate4value 1value 2value 3value 4
obs17/13/20127/13/20127/13/2012 888 
obs23/16/20193/16/2019  22  
obs312/14/2011   1   
obs47/3/20177/3/20177/3/2017 555 
obs52/4/20182/4/20182/4/20182/4/20185555

 

 

Any help to correct this would be much appreciated.

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User
  • Transpose to a long format
  • Delete missing
  • Sort
  • Transpose back

Or within a data step

  • Find number of non-missing entries -> N()
  • Loop through using LARGEST/SMALLEST to allocate to new locations
  • Not sure how this would handle ties 
data long;
set data;
ARRAY _DATE{315} DATE1-DATE315;
ARRAY _VALUE{50} VALUE1-VALUE315;

do i=1 to dim(_date);
if not missing(_date(i)) then do;
Record = _n_;
Date = _date(i);
Value = _value(i);
output;
end;
end;
run;

proc sort data=long;
by record date value;
run;

proc transpose data=long out=date prefix=date;
by record;
var date;
run;

proc transpose data=long out=values prefix=values;
by record;
var Value;
run;

data want;
merge date values;
by record;
keep date1-date50 values1-value50 record;
run;
run;

@JonKetchup wrote:

I have 315 dates per observation where 0 to 50 of them per observation are non-missing, but where those non-missing dates are changes for each observation. Each date has a corresponding value. Below is a simpler example of the data structure with 10 dates and 10 corresponding values.

  date1 date2 date3 date4 date5 date6 date7 date8 date9 date10 value1 value2 value3 value4 value5 value6 value7 value8 value9 value10
obs1   5/24/2019 11/18/2013 7/13/2012               10 1 8            
obs2         8/30/2015 3/16/2019                 1 2        
obs3                 12/14/2011                   1  
obs4           3/8/2023 11/4/2019 7/3/2017               8 2 5    
obs5 3/27/2021 9/13/2011 11/19/2021 2/4/2018             9 4 2 5            

 

I am trying essentially remove all blanks per row and shift the dates and values to be "left justified" for lack of a better term. In other words, I want the resulting dataset to be

  newdate1 newdate2 newdate3 newdate4 newvalue1 newvalue2 newvalue3 newvalue4
obs1 5/24/2019 11/18/2013 7/13/2012   10 1 8  
obs2 8/30/2015 3/16/2019     1 2    
obs3 12/14/2011       1      
obs4 3/8/2023 11/4/2019 7/3/2017   8 2 5  
obs5 3/27/2021 9/13/2011 11/19/2021 2/4/2018 9 4 2 5

 

To do this, I have the following code. 

DATA DATA2; SET DATA;
NVALUES=N(OF DATE1-DATE315);
ARRAY DATENEW{50} DATENEW1-DATENEW50;
ARRAY DATE{315} DATE1-DATE315;
ARRAY VALUENEW{50} VALUENEW1-VALUENEW50;
ARRAY VALUE{315} VALUE1-VALUE315;
DO I=1 TO 315;
	IF DATE{I} NE . THEN DO J=1 TO NVALUES;
		DATENEW{J}=DATE{I}; VALUENEW{J}=VALUE{I};
	END;
END;
FORMAT DATENEW1-DATENEW50 MMDDYY10.;
RUN;

 

However, the date and output I am getting is only the last nonzero date and value, as below:

  newdate1 newdate2 newdate3 newdate4 value 1 value 2 value 3 value 4
obs1 7/13/2012 7/13/2012 7/13/2012   8 8 8  
obs2 3/16/2019 3/16/2019     2 2    
obs3 12/14/2011       1      
obs4 7/3/2017 7/3/2017 7/3/2017   5 5 5  
obs5 2/4/2018 2/4/2018 2/4/2018 2/4/2018 5 5 5 5

 

 

Any help to correct this would be much appreciated.


 

View solution in original post

10 REPLIES 10
Reeza
Super User
  • Transpose to a long format
  • Delete missing
  • Sort
  • Transpose back

Or within a data step

  • Find number of non-missing entries -> N()
  • Loop through using LARGEST/SMALLEST to allocate to new locations
  • Not sure how this would handle ties 
data long;
set data;
ARRAY _DATE{315} DATE1-DATE315;
ARRAY _VALUE{50} VALUE1-VALUE315;

do i=1 to dim(_date);
if not missing(_date(i)) then do;
Record = _n_;
Date = _date(i);
Value = _value(i);
output;
end;
end;
run;

proc sort data=long;
by record date value;
run;

proc transpose data=long out=date prefix=date;
by record;
var date;
run;

proc transpose data=long out=values prefix=values;
by record;
var Value;
run;

data want;
merge date values;
by record;
keep date1-date50 values1-value50 record;
run;
run;

@JonKetchup wrote:

I have 315 dates per observation where 0 to 50 of them per observation are non-missing, but where those non-missing dates are changes for each observation. Each date has a corresponding value. Below is a simpler example of the data structure with 10 dates and 10 corresponding values.

  date1 date2 date3 date4 date5 date6 date7 date8 date9 date10 value1 value2 value3 value4 value5 value6 value7 value8 value9 value10
obs1   5/24/2019 11/18/2013 7/13/2012               10 1 8            
obs2         8/30/2015 3/16/2019                 1 2        
obs3                 12/14/2011                   1  
obs4           3/8/2023 11/4/2019 7/3/2017               8 2 5    
obs5 3/27/2021 9/13/2011 11/19/2021 2/4/2018             9 4 2 5            

 

I am trying essentially remove all blanks per row and shift the dates and values to be "left justified" for lack of a better term. In other words, I want the resulting dataset to be

  newdate1 newdate2 newdate3 newdate4 newvalue1 newvalue2 newvalue3 newvalue4
obs1 5/24/2019 11/18/2013 7/13/2012   10 1 8  
obs2 8/30/2015 3/16/2019     1 2    
obs3 12/14/2011       1      
obs4 3/8/2023 11/4/2019 7/3/2017   8 2 5  
obs5 3/27/2021 9/13/2011 11/19/2021 2/4/2018 9 4 2 5

 

To do this, I have the following code. 

DATA DATA2; SET DATA;
NVALUES=N(OF DATE1-DATE315);
ARRAY DATENEW{50} DATENEW1-DATENEW50;
ARRAY DATE{315} DATE1-DATE315;
ARRAY VALUENEW{50} VALUENEW1-VALUENEW50;
ARRAY VALUE{315} VALUE1-VALUE315;
DO I=1 TO 315;
	IF DATE{I} NE . THEN DO J=1 TO NVALUES;
		DATENEW{J}=DATE{I}; VALUENEW{J}=VALUE{I};
	END;
END;
FORMAT DATENEW1-DATENEW50 MMDDYY10.;
RUN;

 

However, the date and output I am getting is only the last nonzero date and value, as below:

  newdate1 newdate2 newdate3 newdate4 value 1 value 2 value 3 value 4
obs1 7/13/2012 7/13/2012 7/13/2012   8 8 8  
obs2 3/16/2019 3/16/2019     2 2    
obs3 12/14/2011       1      
obs4 7/3/2017 7/3/2017 7/3/2017   5 5 5  
obs5 2/4/2018 2/4/2018 2/4/2018 2/4/2018 5 5 5 5

 

 

Any help to correct this would be much appreciated.


 

JonKetchup
Obsidian | Level 7

Thanks for the help. The keep statement is not working, but that is not a big deal. However, this seems to be removing all observations that have no dates/values, and I need to preserve those. There are also other variables in the datasets I need to be preserved, like ID. How would I make those changes? 

Reeza
Super User

@JonKetchup wrote:

Thanks for the help. The keep statement is not working, but that is not a big deal. However, this seems to be removing all observations that have no dates/values, and I need to preserve those. 


Add a condition that if N values is 0 then just output at least one. 

 


@JonKetchup wrote:

 There are also other variables in the datasets I need to be preserved, like ID. How would I make those changes? 


I used record as the identifier which I made. You can add variables to your KEEP statement in the transpose step and then add them to the BY statement if they help to uniquely identify a row. 

Reeza
Super User
data have; 
  input ID Sex $1. (date1-date4) (:yymmdd.) value1-value4 ;
  format date1-date4 yymmdd10.;
cards;
1 F 2019-05-24 2019-05-24 2012-07-13          . 10 10 8 .
2 F 2015-08-30 2019-03-16          .          . 1 2 . .
3 M         . 2011-12-14          .          . . 1 . .
4 F 2023-03-08          . 2017-07-03          . 8 2 5 .
5 M 2021-03-27 2011-09-13          . 2018-02-04 9 4 . 5
         .          .          .          . . . . .
;


data long;
set have;
ARRAY _DATE{4} DATE1-DATE4;
ARRAY _VALUE{4} VALUE1-VALUE4;

do i=1 to dim(_date);
if not missing(_date(i)) then do;
Date = _date(i);
Value = _value(i);
output;
end;
end;

keep id sex date value;
run;

proc sort data=long;
by ID sex date value;
run;

proc transpose data=long out=date prefix=date;
by ID SEX;
var date;
run;

proc transpose data=long out=values prefix=values;
by ID SEX;
var Value;
run;

data want;
merge date values;
by ID SEX;
run;
pink_poodle
Barite | Level 11
I will write pseudo-code for this. Keep only the ids and the dates. Take each row into a little dataset and delete all the missing date cells/columns. Rename the remaining date variables. Set the rows. Merge to the other part of the initial dataset (ids and the values).
Deleting missing date cells is tricky because SAS does not do conditional variable drops. The info how to do that is here:
https://sasexamplecode.com/how-to-identify-remove-empty-columns-in-sas/
pink_poodle
Barite | Level 11
%let &nrows = ..; *number of rows in you dataset;
%macro new_rows; *makes new rows;
%do i = 1 %to &nrows;

1) little row data set;
data part1;
set have;
keep id date:;
if _N_=&i;
run;

2) delete empty cells;
ods select nlevels;
proc freq data=work.my_data nlevels;
ods output nlevels=work.nlevels_my_data;
run;

proc sql;
	select TableVar into :empty_columns separated by ","
	from part1
	where nnonmisslevels = 0;
quit;

proc sql;
	alter table part1
	drop &emptycolumns;
quit;

3) rename variables;
data part1;
set part1;
https://communities.sas.com/t5/SAS-Programming/Rename-all-variables/td-p/500828?nobounce;

4) saving one row to be set;
data row&i;
set part1;
run;

%mend t;

5) set all rows;
data whole;
%macro setter;
%do i=1 %to &nrows;
row&i
%end;
run;
%mend setter;
Tom
Super User Tom
Super User

Just transpose it and then transpose it back.

 

First let's make up some data. For example lets us your example result and introduce some "gaps".  Why not also introduce some inconsistency where sometimes there is a value but no date?

data have; 
  input (date1-date4) (:yymmdd.) value1-value4 ;
  format date1-date4 yymmdd10.;
cards;
2019-05-24 2013-11-18 2012-07-13          . 10 1 8 .
2015-08-30 2019-03-16          .          . 1 2 . .
         . 2011-12-14          .          . . 1 . .
2023-03-08          . 2017-07-03          . 8 2 5 .
2021-03-27 2011-09-13          . 2018-02-04 9 4 . 5
;

Now let's transpose.  Probably easier to use a data step to do the first transpose since you have multiple variables. Since the index into the array has no meaning (we are going to throw the original position away) lets use the older DO OVER syntax to make typing the code easier.  Let's make sure to introduce a unique row identifier if one does not exist. 

data tall;
  row+1;
  set have;
  array _d date1-date4;
  array _v value1-value4;
  do over _d;
    date=_d;
    value=_v;
    if n(date,value) then output;
  end;
  format date yymmdd10.;
  drop date1-date4 value1-value4;
run;

Now to go from TALL back to wide we can use PROC SUMMARY.  But we will need to enter an upperbound for the maximum number of date/value pair variables we want to create.  We can hard code it to the original number, or calculate it from the TALL structure.

 

 

proc summary data=tall nway;
  by row ;
  output out=want(drop=_type_) idgroup (out[4] (date value)=);
run;

Original dateset.

 

image.png

Transposed TALL dataset.

image.png

Resulting new dataset

image.png

 

If you are worried about some rows having no values then add these lines to the end of the TALL data step.

call missing(date,value);
output;

If there are other variables you want carried into the output dataset add them to the BY statement in the PROC SUMMARY step.  For example if you had ID and COMPANY variables your BY statement would look like.

by row id company ;
Reeza
Super User
And handles ties correctly, definitely a simpler approach.
Ksharp
Super User
data have; 
  input (date1-date4) (:yymmdd.) value1-value4 ;
  format date1-date4 yymmdd10.;
cards;
2019-05-24 2013-11-18 2012-07-13          . 10 1 8 .
2015-08-30 2019-03-16          .          . 1 2 . .
         . 2011-12-14          .          . . 1 . .
2023-03-08          . 2017-07-03          . 8 2 5 .
2021-03-27 2011-09-13          . 2018-02-04 9 4 . 5
;

data want;
 set have;
 array newdate{4};
 array newvalue{4};

 array d{*} date1-date4;
 array v{*} value1-value4;

j=0;
do i=1 to dim(d);
 if not missing(d{i}) then do;j+1;newdate{j}=d{i};end;
end;

j=0;
do i=1 to dim(d);
 if not missing(v{i}) then do;j+1;newvalue{j}=v{i};end;
end;

drop date1-date4 value1-value4 i j;
format newdate: yymmdd10.;
run;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 2307 views
  • 8 likes
  • 6 in conversation