Solved: Re: Removing Missing Observations in a Row

JonKetchup · Posted 11-03-2021 04:51 PM

I have 315 dates per observation where 0 to 50 of them per observation are non-missing, but where those non-missing dates are changes for each observation. Each date has a corresponding value. Below is a simpler example of the data structure with 10 dates and 10 corresponding values.

date1

date2

date3

date4

date5

date6

date7

date8

date9

date10

value1

value2

value3

value4

value5

value6

value7

value8

value9

value10

obs1

5/24/2019

11/18/2013

7/13/2012

10

1

8

obs2

8/30/2015

3/16/2019

1

2

obs3

12/14/2011

1

obs4

3/8/2023

11/4/2019

7/3/2017

8

2

5

obs5

3/27/2021

9/13/2011

11/19/2021

2/4/2018

9

4

2

5

I am trying essentially remove all blanks per row and shift the dates and values to be "left justified" for lack of a better term. In other words, I want the resulting dataset to be

	newdate1	newdate2	newdate3	newdate4	newvalue1	newvalue2	newvalue3	newvalue4
obs1	5/24/2019	11/18/2013	7/13/2012		10	1	8
obs2	8/30/2015	3/16/2019			1	2
obs3	12/14/2011				1
obs4	3/8/2023	11/4/2019	7/3/2017		8	2	5
obs5	3/27/2021	9/13/2011	11/19/2021	2/4/2018	9	4	2	5

To do this, I have the following code.

DATA DATA2; SET DATA;
NVALUES=N(OF DATE1-DATE315);
ARRAY DATENEW{50} DATENEW1-DATENEW50;
ARRAY DATE{315} DATE1-DATE315;
ARRAY VALUENEW{50} VALUENEW1-VALUENEW50;
ARRAY VALUE{315} VALUE1-VALUE315;
DO I=1 TO 315;
	IF DATE{I} NE . THEN DO J=1 TO NVALUES;
		DATENEW{J}=DATE{I}; VALUENEW{J}=VALUE{I};
	END;
END;
FORMAT DATENEW1-DATENEW50 MMDDYY10.;
RUN;

However, the date and output I am getting is only the last nonzero date and value, as below:

	newdate1	newdate2	newdate3	newdate4	value 1	value 2	value 3	value 4
obs1	7/13/2012	7/13/2012	7/13/2012		8	8	8
obs2	3/16/2019	3/16/2019			2	2
obs3	12/14/2011				1
obs4	7/3/2017	7/3/2017	7/3/2017		5	5	5
obs5	2/4/2018	2/4/2018	2/4/2018	2/4/2018	5	5	5	5

Any help to correct this would be much appreciated.

Reeza · Posted 11-03-2021 05:07 PM

Transpose to a long format
Delete missing
Sort
Transpose back

Or within a data step

Find number of non-missing entries -> N()
Loop through using LARGEST/SMALLEST to allocate to new locations
Not sure how this would handle ties

data long;
set data;
ARRAY _DATE{315} DATE1-DATE315;
ARRAY _VALUE{50} VALUE1-VALUE315;

do i=1 to dim(_date);
if not missing(_date(i)) then do;
Record = _n_;
Date = _date(i);
Value = _value(i);
output;
end;
end;
run;

proc sort data=long;
by record date value;
run;

proc transpose data=long out=date prefix=date;
by record;
var date;
run;

proc transpose data=long out=values prefix=values;
by record;
var Value;
run;

data want;
merge date values;
by record;
keep date1-date50 values1-value50 record;
run;
run;

@JonKetchup wrote:

I have 315 dates per observation where 0 to 50 of them per observation are non-missing, but where those non-missing dates are changes for each observation. Each date has a corresponding value. Below is a simpler example of the data structure with 10 dates and 10 corresponding values.

date1 date2 date3 date4 date5 date6 date7 date8 date9 date10 value1 value2 value3 value4 value5 value6 value7 value8 value9 value10

obs1 5/24/2019 11/18/2013 7/13/2012 10 1 8

obs2 8/30/2015 3/16/2019 1 2

obs3 12/14/2011 1

obs4 3/8/2023 11/4/2019 7/3/2017 8 2 5

obs5 3/27/2021 9/13/2011 11/19/2021 2/4/2018 9 4 2 5

I am trying essentially remove all blanks per row and shift the dates and values to be "left justified" for lack of a better term. In other words, I want the resulting dataset to be

newdate1 newdate2 newdate3 newdate4 newvalue1 newvalue2 newvalue3 newvalue4

obs1 5/24/2019 11/18/2013 7/13/2012 10 1 8

obs2 8/30/2015 3/16/2019 1 2

obs3 12/14/2011 1

obs4 3/8/2023 11/4/2019 7/3/2017 8 2 5

obs5 3/27/2021 9/13/2011 11/19/2021 2/4/2018 9 4 2 5

To do this, I have the following code.
DATA DATA2; SET DATA;
NVALUES=N(OF DATE1-DATE315);
ARRAY DATENEW{50} DATENEW1-DATENEW50;
ARRAY DATE{315} DATE1-DATE315;
ARRAY VALUENEW{50} VALUENEW1-VALUENEW50;
ARRAY VALUE{315} VALUE1-VALUE315;
DO I=1 TO 315;
	IF DATE{I} NE . THEN DO J=1 TO NVALUES;
		DATENEW{J}=DATE{I}; VALUENEW{J}=VALUE{I};
	END;
END;
FORMAT DATENEW1-DATENEW50 MMDDYY10.;
RUN;
However, the date and output I am getting is only the last nonzero date and value, as below:

newdate1 newdate2 newdate3 newdate4 value 1 value 2 value 3 value 4

obs1 7/13/2012 7/13/2012 7/13/2012 8 8 8

obs2 3/16/2019 3/16/2019 2 2

obs3 12/14/2011 1

obs4 7/3/2017 7/3/2017 7/3/2017 5 5 5

obs5 2/4/2018 2/4/2018 2/4/2018 2/4/2018 5 5 5 5

Any help to correct this would be much appreciated.

View solution in original post

Reeza · Posted 11-03-2021 05:07 PM

Transpose to a long format
Delete missing
Sort
Transpose back

Or within a data step

Find number of non-missing entries -> N()
Loop through using LARGEST/SMALLEST to allocate to new locations
Not sure how this would handle ties

data long;
set data;
ARRAY _DATE{315} DATE1-DATE315;
ARRAY _VALUE{50} VALUE1-VALUE315;

do i=1 to dim(_date);
if not missing(_date(i)) then do;
Record = _n_;
Date = _date(i);
Value = _value(i);
output;
end;
end;
run;

proc sort data=long;
by record date value;
run;

proc transpose data=long out=date prefix=date;
by record;
var date;
run;

proc transpose data=long out=values prefix=values;
by record;
var Value;
run;

data want;
merge date values;
by record;
keep date1-date50 values1-value50 record;
run;
run;

@JonKetchup wrote:

I have 315 dates per observation where 0 to 50 of them per observation are non-missing, but where those non-missing dates are changes for each observation. Each date has a corresponding value. Below is a simpler example of the data structure with 10 dates and 10 corresponding values.

date1 date2 date3 date4 date5 date6 date7 date8 date9 date10 value1 value2 value3 value4 value5 value6 value7 value8 value9 value10

obs1 5/24/2019 11/18/2013 7/13/2012 10 1 8

obs2 8/30/2015 3/16/2019 1 2

obs3 12/14/2011 1

obs4 3/8/2023 11/4/2019 7/3/2017 8 2 5

obs5 3/27/2021 9/13/2011 11/19/2021 2/4/2018 9 4 2 5

I am trying essentially remove all blanks per row and shift the dates and values to be "left justified" for lack of a better term. In other words, I want the resulting dataset to be

newdate1 newdate2 newdate3 newdate4 newvalue1 newvalue2 newvalue3 newvalue4

obs1 5/24/2019 11/18/2013 7/13/2012 10 1 8

obs2 8/30/2015 3/16/2019 1 2

obs3 12/14/2011 1

obs4 3/8/2023 11/4/2019 7/3/2017 8 2 5

obs5 3/27/2021 9/13/2011 11/19/2021 2/4/2018 9 4 2 5

To do this, I have the following code.
DATA DATA2; SET DATA;
NVALUES=N(OF DATE1-DATE315);
ARRAY DATENEW{50} DATENEW1-DATENEW50;
ARRAY DATE{315} DATE1-DATE315;
ARRAY VALUENEW{50} VALUENEW1-VALUENEW50;
ARRAY VALUE{315} VALUE1-VALUE315;
DO I=1 TO 315;
	IF DATE{I} NE . THEN DO J=1 TO NVALUES;
		DATENEW{J}=DATE{I}; VALUENEW{J}=VALUE{I};
	END;
END;
FORMAT DATENEW1-DATENEW50 MMDDYY10.;
RUN;
However, the date and output I am getting is only the last nonzero date and value, as below:

newdate1 newdate2 newdate3 newdate4 value 1 value 2 value 3 value 4

obs1 7/13/2012 7/13/2012 7/13/2012 8 8 8

obs2 3/16/2019 3/16/2019 2 2

obs3 12/14/2011 1

obs4 7/3/2017 7/3/2017 7/3/2017 5 5 5

obs5 2/4/2018 2/4/2018 2/4/2018 2/4/2018 5 5 5 5

Any help to correct this would be much appreciated.

JonKetchup · Posted 11-03-2021 05:43 PM

Thanks for the help. The keep statement is not working, but that is not a big deal. However, this seems to be removing all observations that have no dates/values, and I need to preserve those. There are also other variables in the datasets I need to be preserved, like ID. How would I make those changes?

Reeza · Posted 11-03-2021 06:06 PM

@JonKetchup wrote:

Thanks for the help. The keep statement is not working, but that is not a big deal. However, this seems to be removing all observations that have no dates/values, and I need to preserve those.

Add a condition that if N values is 0 then just output at least one.

@JonKetchup wrote:

There are also other variables in the datasets I need to be preserved, like ID. How would I make those changes?

I used record as the identifier which I made. You can add variables to your KEEP statement in the transpose step and then add them to the BY statement if they help to uniquely identify a row.

Reeza · Posted 11-03-2021 06:30 PM

data have; 
  input ID Sex $1. (date1-date4) (:yymmdd.) value1-value4 ;
  format date1-date4 yymmdd10.;
cards;
1 F 2019-05-24 2019-05-24 2012-07-13          . 10 10 8 .
2 F 2015-08-30 2019-03-16          .          . 1 2 . .
3 M         . 2011-12-14          .          . . 1 . .
4 F 2023-03-08          . 2017-07-03          . 8 2 5 .
5 M 2021-03-27 2011-09-13          . 2018-02-04 9 4 . 5
         .          .          .          . . . . .
;


data long;
set have;
ARRAY _DATE{4} DATE1-DATE4;
ARRAY _VALUE{4} VALUE1-VALUE4;

do i=1 to dim(_date);
if not missing(_date(i)) then do;
Date = _date(i);
Value = _value(i);
output;
end;
end;

keep id sex date value;
run;

proc sort data=long;
by ID sex date value;
run;

proc transpose data=long out=date prefix=date;
by ID SEX;
var date;
run;

proc transpose data=long out=values prefix=values;
by ID SEX;
var Value;
run;

data want;
merge date values;
by ID SEX;
run;

pink_poodle · Posted 11-03-2021 05:23 PM

I will write pseudo-code for this. Keep only the ids and the dates. Take each row into a little dataset and delete all the missing date cells/columns. Rename the remaining date variables. Set the rows. Merge to the other part of the initial dataset (ids and the values).
Deleting missing date cells is tricky because SAS does not do conditional variable drops. The info how to do that is here:
https://sasexamplecode.com/how-to-identify-remove-empty-columns-in-sas/

pink_poodle · Posted 11-03-2021 05:36 PM

%let &nrows = ..; *number of rows in you dataset;
%macro new_rows; *makes new rows;
%do i = 1 %to &nrows;

1) little row data set;
data part1;
set have;
keep id date:;
if _N_=&i;
run;

2) delete empty cells;
ods select nlevels;
proc freq data=work.my_data nlevels;
ods output nlevels=work.nlevels_my_data;
run;

proc sql;
	select TableVar into :empty_columns separated by ","
	from part1
	where nnonmisslevels = 0;
quit;

proc sql;
	alter table part1
	drop &emptycolumns;
quit;

3) rename variables;
data part1;
set part1;
https://communities.sas.com/t5/SAS-Programming/Rename-all-variables/td-p/500828?nobounce;

4) saving one row to be set;
data row&i;
set part1;
run;

%mend t;

5) set all rows;
data whole;
%macro setter;
%do i=1 %to &nrows;
row&i
%end;
run;
%mend setter;

Kurt_Bremser · Posted 11-03-2021 05:29 PM

Transpose to a long layout, and keep only non-missing values. Extract the number from _name_ and use it to join the dates and values into a single dataset.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Tom · Posted 11-03-2021 06:09 PM

Just transpose it and then transpose it back.

First let's make up some data. For example lets us your example result and introduce some "gaps". Why not also introduce some inconsistency where sometimes there is a value but no date?

data have; 
  input (date1-date4) (:yymmdd.) value1-value4 ;
  format date1-date4 yymmdd10.;
cards;
2019-05-24 2013-11-18 2012-07-13          . 10 1 8 .
2015-08-30 2019-03-16          .          . 1 2 . .
         . 2011-12-14          .          . . 1 . .
2023-03-08          . 2017-07-03          . 8 2 5 .
2021-03-27 2011-09-13          . 2018-02-04 9 4 . 5
;

Now let's transpose. Probably easier to use a data step to do the first transpose since you have multiple variables. Since the index into the array has no meaning (we are going to throw the original position away) lets use the older DO OVER syntax to make typing the code easier. Let's make sure to introduce a unique row identifier if one does not exist.

data tall;
  row+1;
  set have;
  array _d date1-date4;
  array _v value1-value4;
  do over _d;
    date=_d;
    value=_v;
    if n(date,value) then output;
  end;
  format date yymmdd10.;
  drop date1-date4 value1-value4;
run;

Now to go from TALL back to wide we can use PROC SUMMARY. But we will need to enter an upperbound for the maximum number of date/value pair variables we want to create. We can hard code it to the original number, or calculate it from the TALL structure.

proc summary data=tall nway;
  by row ;
  output out=want(drop=_type_) idgroup (out[4] (date value)=);
run;

Original dateset.

Transposed TALL dataset.

Resulting new dataset

If you are worried about some rows having no values then add these lines to the end of the TALL data step.

call missing(date,value);
output;

If there are other variables you want carried into the output dataset add them to the BY statement in the PROC SUMMARY step. For example if you had ID and COMPANY variables your BY statement would look like.

by row id company ;

Reeza · Posted 11-03-2021 06:19 PM

And handles ties correctly, definitely a simpler approach.

Ksharp · Posted 11-04-2021 07:43 AM

data have; 
  input (date1-date4) (:yymmdd.) value1-value4 ;
  format date1-date4 yymmdd10.;
cards;
2019-05-24 2013-11-18 2012-07-13          . 10 1 8 .
2015-08-30 2019-03-16          .          . 1 2 . .
         . 2011-12-14          .          . . 1 . .
2023-03-08          . 2017-07-03          . 8 2 5 .
2021-03-27 2011-09-13          . 2018-02-04 9 4 . 5
;

data want;
 set have;
 array newdate{4};
 array newvalue{4};

 array d{*} date1-date4;
 array v{*} value1-value4;

j=0;
do i=1 to dim(d);
 if not missing(d{i}) then do;j+1;newdate{j}=d{i};end;
end;

j=0;
do i=1 to dim(d);
 if not missing(v{i}) then do;j+1;newvalue{j}=v{i};end;
end;

drop date1-date4 value1-value4 i j;
format newdate: yymmdd10.;
run;

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!