Solved: Re: Adding A Row Based on Existing Variables

SAS_learneromg · Posted 03-29-2022 05:37 PM

So I have a dataset of values for ta est result that come in two parts, Components A and B.

In each row I have these variables: ID Date Test_no TestName Test Result.

In short, I want to add a new row for the overall test result, which would be yes if either Component A or B= yes

So what I have:

ID Date Test_no TestName Test Result.

1 3/29/22 1 Component A Yes

1 3/29/22 2 Component B No

And what I want:

ID Date Test_no TestName Test Result.

1 3/29/22 1 Component A Yes

1 3/29/22 2 Component B No

1 3/29/22 3 Overall Yes

I've never done anything like this so not exactly sure where to start.

ballardw · Posted 03-29-2022 06:17 PM

Some details: such as is this 1) only for two rows? 2) Always exactly two rows? Are there ever any other TestName other than Component A or Component B? (if only these two then code can be much simpler). What is "inserted" if none of the tests are 'Yes'?

I have to guess that you have more than one ID. Is that correct?

RETAIN is a function that allows you to keep the value of a variable across boundaries of the data step. So you would need to create and keep that and then after the LAST id replace the test result with that retained value.

It helps to provide an example of your data as data step code so we test with what you actually have. For example, I am assuming that you actually have SAS date values.

This works for MY interpretation of your limited data and should work for more than 1 id. Maybe.

data have;
   infile datalines dlm=',';
   input ID     Date :mmddyy. Test_no TestName :$12. TestResult $ ;
   format date mmddyy10.;
datalines; 
1,3/29/22,1,Component A,Yes
1,3/29/22,2,Component B,No
;
/* not actually in order 
Proc sort data=have;
   by id date;
run;
*/
data want;
   set have;
   by id;
   retain flag;
   if first.id then flag=.;
   if missing(flag) then do;
       if (testname='Component A' and testresult='Yes') or
          (testname='Component B' and testresult='Yes') then flag=1;
   end;
   if last.id then do;
      if flag=1 then do;
         output; /* write the current last record*/
         /* now add the new stuff*/
         test_no=test_no+1;
         testname= 'Overall';
         testresult='Yes';
         output;
      end;
   end;
   else output;
   drop flag;
run;

In addition to the Retain function the BY statement creates additional automatic variables that indicate that a current record is the first or last of a by group. These are 1/0 (true/false) values and referenced with the First. and Last. preceding the variable name (no space and do pay attention to the dot). So you can take actions on the first, last or other records of a group. If there is a single record then both First and Last are true.

The Output statement writes data to the output data set when encountered. So we write the current record when it is not the last of the group, when it is the last then we write the record and if the flag variable has been set do the "insert" you requested.

We use the FIRST.Id to reset the flag to missing so you have a fresh value when when the next ID is encountered.

Note that if you only have Component A and B tests then only a test for the TestResult is needed.

Aside: Bad juju accrues to those you use 2 digit years.

View solution in original post

pink_poodle · Posted 03-29-2022 05:57 PM

In pseudo code, you would need to 1) create a dataset where TestName = Overall and TestResult is Yes for IDs with any Yes’s, and No for all other IDs and 2) append that dataset to what you have.

In real code:
Data yes;
Set have;
If TestResult = “Yes”;
TestName = “Overall”;
Run;

Proc sql;
Create table no
Select *
From have
Where id not in (select id from yes)
;
Run;

Proc sort data=No;
By id;
Run;

Data no;
Set no;
By id;
If first.id;
TestName = “Overall”;
Run;

Data want;
Set have yes no;
Run;

Proc sort data=want;
By id date test_no;
Run;

ballardw · Posted 03-29-2022 06:17 PM

Some details: such as is this 1) only for two rows? 2) Always exactly two rows? Are there ever any other TestName other than Component A or Component B? (if only these two then code can be much simpler). What is "inserted" if none of the tests are 'Yes'?

I have to guess that you have more than one ID. Is that correct?

RETAIN is a function that allows you to keep the value of a variable across boundaries of the data step. So you would need to create and keep that and then after the LAST id replace the test result with that retained value.

It helps to provide an example of your data as data step code so we test with what you actually have. For example, I am assuming that you actually have SAS date values.

This works for MY interpretation of your limited data and should work for more than 1 id. Maybe.

data have;
   infile datalines dlm=',';
   input ID     Date :mmddyy. Test_no TestName :$12. TestResult $ ;
   format date mmddyy10.;
datalines; 
1,3/29/22,1,Component A,Yes
1,3/29/22,2,Component B,No
;
/* not actually in order 
Proc sort data=have;
   by id date;
run;
*/
data want;
   set have;
   by id;
   retain flag;
   if first.id then flag=.;
   if missing(flag) then do;
       if (testname='Component A' and testresult='Yes') or
          (testname='Component B' and testresult='Yes') then flag=1;
   end;
   if last.id then do;
      if flag=1 then do;
         output; /* write the current last record*/
         /* now add the new stuff*/
         test_no=test_no+1;
         testname= 'Overall';
         testresult='Yes';
         output;
      end;
   end;
   else output;
   drop flag;
run;

In addition to the Retain function the BY statement creates additional automatic variables that indicate that a current record is the first or last of a by group. These are 1/0 (true/false) values and referenced with the First. and Last. preceding the variable name (no space and do pay attention to the dot). So you can take actions on the first, last or other records of a group. If there is a single record then both First and Last are true.

The Output statement writes data to the output data set when encountered. So we write the current record when it is not the last of the group, when it is the last then we write the record and if the flag variable has been set do the "insert" you requested.

We use the FIRST.Id to reset the flag to missing so you have a fresh value when when the next ID is encountered.

Note that if you only have Component A and B tests then only a test for the TestResult is needed.

Aside: Bad juju accrues to those you use 2 digit years.

Tommer · Posted 03-29-2022 09:25 PM

data case1;
input id date test_no testname $10. result$;

informat date mmddyy10.;
datalines;
1 3/29/22 1 ComponentA Yes
1 3/29/22 2 ComponentB No
2 3/29/22 1 ComponentA No
2 3/29/22 2 ComponentB No
;
run;
/*add a logic to know if overall shd be yes or no by assigning numeric equivalents which can be added up.*/
data case2;
set case1;
if result = "Yes" then res1 = 1;
if result = "No" then res1 = 0;
run;

proc sql;
create table case3 as select id, date format=mmddyy10., 3 as test_no, "Overall" as testname, sum(res1) as res2 from case2 group by id,date;
quit;

data case4;
set case1 case3;
by id date test_no;
if res2 =>1 then result = "Yes";
else if res2=0 then result = "No";
drop res2;
run;

SAS_learneromg · Posted 03-30-2022 12:08 PM

Great, thank you! This has mostly done the trick except it seems like for records where Component A and B are yes, a row for the overall result appears twice. I can correct this later, but is there a way to make sure it doesn't happen in this step?

mkeintz · Posted 03-29-2022 11:08 PM

Assuming HAVE is sorted by ID.

After all the records for a given ID (first pass), read records for the same ID that have test result='Yes' - if any (second pass).

Keep all first pass records, and just one (if any) second pass record:

data have;
   infile datalines dlm=',';
   input ID     Date :mmddyy. Test_no TestName :$12. TestResult $ ;
   format date mmddyy10.;
datalines; 
1,3/29/22,1,Component A,Yes
1,3/29/22,2,Component B,No
run;

data want;
  set have (in=firstpass)
      have (in=secondpass drop=test_no testname where=(testresult='Yes'));
  by id;

  /* Now keep all first pass observations, and just one (the last) second pass obs */
  if firstpass or (secondpass and last.id);
  test_no=coalesce(test_no,sum(1,lag(test_no)));
  if testname=' ' then testname="Overall";
run;

Note the second pass does NOT read in values for TEST_NO or TESTNAME (see the "drop=" parameter). Those variables will therefore have missing values for any qualifying second pass observation. When TEST_NO is missing add 1 to the immediate preceding TEST_NO. When testname is missing, set it to "Overall".

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Classroom Training Available!