BookmarkSubscribeRSS Feed
AshJuri
Calcite | Level 5

I have a data set like this:

 

Subject          Office                   Date        Value      Output

Patient 1         Visit 1                 12             20           20

Patient 1          Followup 2          4             35           35-20

Patient 1          Followup 2          2             54           54-20

Patient 1          Followup 3          3            29            29-20

Patient 1          Followup 3          11          13           13-20

Patient 1          Followup 3           5            98           98-20

 

I am trying to use proc sql to subtract the numbers in the "VALUE" column based on the "SUBJECT" and "OFFICE" columns. For example, visit 1 is the baseline visit and it should be subtracted from the other visits. The result should form the "OUTPUT" column. Is this possible? I am open to other methods.

 

Thank you.

15 REPLIES 15
Kurt_Bremser
Super User

Easy with a data step.

data want;
set have;
by subject;
retain base;
if find(office,'Visit') = 1
then do;
  output = value;
  base = value;
end;
else output = value - base;
drop base;
run;
novinosrin
Tourmaline | Level 20
data have;
input Subject  & $15.        Office       & $15.                Date        Value ;*     Output;
cards;
Patient 1         Visit 1                 12             20           20
Patient 1          Followup 2          4             35           35-20
Patient 1          Followup 2          2             54           54-20
Patient 1          Followup 3          3            29            29-20
Patient 1          Followup 3          11          13           13-20
Patient 1          Followup 3           5            98           98-20
;

proc sql;
create table want(drop=rn) as
select *,ifn(Office= 'Visit 1',value,value-max((Office= 'Visit 1')*value)) as output 
from (select * ,monotonic() as rn from have)
group by subject
order by subject,rn;
quit;
AshJuri
Calcite | Level 5
Hi, I noticed that the ifn function isn't producing the output column when I execute this program. Do you have any recommendations?
novinosrin
Tourmaline | Level 20

Should your dataset be sorted in the order with VISIT 1 being the first observation for each subject as shown in the sample, you are certainly better off resorting to a datastep

 

data have;
input Subject  & $15.        Office       & $15.                Date        Value ;*     Output;
cards;
Patient 1         Visit 1                 12             20           20
Patient 1          Followup 2          4             35           35-20
Patient 1          Followup 2          2             54           54-20
Patient 1          Followup 3          3            29            29-20
Patient 1          Followup 3          11          13           13-20
Patient 1          Followup 3           5            98           98-20
;
data want;
 set have;
 by subject;
 if first.subject then do;
  _iorc_=value;
  output=_iorc_;
 end;
 else output=value-_iorc_;
run;

 

Tom
Super User Tom
Super User

Why SQL?  Why not just write normal SAS code?

proc sql;
create table WANT as 
select a.*
    , case when a.office = 'Visit 1' then a.value
           when 0=nmiss(a.value,b.value) then a.value-b.value 
           else .
      end as OUTPUT
from have a
left join have b
on a.subject = b.subject 
and b.office = 'Visit 1'
;
quit;
AshJuri
Calcite | Level 5

Hi everyone, here is a snippet of a data set I have named “general” :

 

Row          Subject   Type                 Procedure           Measurement      Output

1               500         Intial                  Invasive             20                        20

2               500          Initial                Surface              35                        35

3               500          Followup          Invasive             54                        54-20

4               428          Followup          Outer                 29                        29-10

5               765          Followup           Other                 13                       13-19

6                500          Followup          Surface              98                       98-35

7                428          Intial                 Outer                 10                        10                

8                765          Intial                 Other                 19                        19     

9                610          Third                Invasive              66                       66-17

10               610          Initial                Invasive             17                       17

 

The goal is to subtract the numbers in the "MEASUREMENT" column based on the "SUBJECT", "TYPE" and “PROCEDURE” columns. If two values in the “SUBJECT” column match and  two values in the “PROCEDURE” column match, then the initial measurement should be subtracted from the other measurement. For example, the initial measurement in row 1 (20) should be subtracted from the followup measurement in row 3 (54) because the subject (500) and procedure (Invasive) match. The result should form the "OUTPUT" column.

 

Thank you in advance!

AshJuri
Calcite | Level 5

Hi everyone, I have a data set named “Tentative”:

 

Row          Subject   Type                 Procedure           Measurement      Output

1               500         Intial                  Invasive             20                        20

2               500          Initial                Surface              35                        35

3               500          Followup          Invasive             54                        54-20

4               428          Followup          Outer                 29                        29-10

5               765          Followup           Other                 13                       13-19

6                500          Followup          Surface              98                       98-35

7                428          Intial                 Outer                 10                        10                

8                765          Intial                 Other                 19                        19     

9                610          Third                Invasive              66                       66-17

10               610          Initial                Invasive             17                       17

 

I was trying to use proc sql to subtract the numbers in the "MEASUREMENT" column based on the "SUBJECT", "TYPE" and “PROCEDURE” columns. If two values in the “SUBJECT” column match and  two values in the “PROCEDURE” column match, then the initial measurement should be subtracted from the other measurement. For example, the initial measurement in row 1 (20) should be subtracted from the followup measurement in row 3 (54) because the subject (500) and procedure (Invasive) match. The result should form the "OUTPUT" column.

 

Thank you in advance!

mkeintz
PROC Star

Could you put this in a working DATA step?  I see that you haven't even consistently spelled "Initial".  If you provided your starting data in a data step, it would be a lot easier, and more reliable for us to help you.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Shmuel
Garnet | Level 18

1) Replace TYPE by a sequential number or add a new variable (TypeN)

2) Sort data by Subject TypeN

3) Execute data step BY Subject; Retain Initial values then at any other observation of same subject subtract values as desired.

 

Something like:

 

data temp
 set have;
      if Type = 'Initial'       then TypeN = 0 ; else
      if Type = 'Followup' then TypeN = 1 ; else
      if Type = 'Third'       then TypeN = 2 ;
run;
proc sort data=temp; bu subject TypeN;

data want(drop=measure1);
set temp;
  by subject;
       retain measure1;
       if first.subject and TypeN=0  then measure1 = measurement;
       else measurement = measurement  - measure1;
run;    
       

 

AshJuri
Calcite | Level 5

Hi everyone, here is a portion of a data set I have named “antibody” :

 

Row          Subject   Type                 Procedure           Measurement      Output

1               500         Intial                  Invasive             20                        20

2               500          Initial                Surface              35                        35

3               500          Followup          Invasive             54                        54-20

4               428          Followup          Outer                 29                        29-10

5               765          Seventh           Other                 13                       13-19

6                500          Followup          Surface              98                       98-35

7                428          Initial                 Outer                 10                        10                

8                765          Initial                 Other                 19                        19     

9                610          Third                Invasive              66                       66-17

10               610          Initial                Invasive             17                       17

 

The goal is to subtract the numbers in the "MEASUREMENT" column based on the "SUBJECT", "TYPE" and “PROCEDURE” columns. If two values in the “SUBJECT” column match and  two values in the “PROCEDURE” column match, then the initial measurement should be subtracted from the other measurement. For example, the initial measurement in row 1 (20) should be subtracted from the followup measurement in row 3 (54) because the subject (500) and procedure (Invasive) match. Furthermore, the initial measurement in row 8 (19) should be subtracted from the seventh measurement in row 5 (13) because the subject (765) and procedure (Other) match. The result should form the "OUTPUT" column.

 

Thank you in advance!

Tom
Super User Tom
Super User

Why do you keep asking the same question?

Did you try the answers on your other questions?

AshJuri
Calcite | Level 5
Hi, I apologize. I did try the recommendations but they did not work for the updated question that I posted. I thought it'd be okay to pose the question once more.
ballardw
Super User

When adding additional elements to a question it is appropriate to ask a new question.

 

To get the best response it would be a good idea to show:

  1. The code that you use that is working for the previous solution.
  2. Clearly identify the changed elements of the problem. Which may require some consideration as "list by example" is not quite the same as providing an actual rule or rules. It may help to show what you attempted for the new solution. Sometimes people are pretty close just missing one detail. You'll learn more as well.
  3. Make sure that any provided data step example data runs (Hint: post code in code boxes, the windows on the forum often reformat text and may not run when pasted into a main message window. Code boxes can be opened using the </> or "running man" icons)
  4. Make sure to reference the names of variables in narratives and do not make us guess if something like "ID" is supposed to be the same as "patient" or similar.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 15 replies
  • 7662 views
  • 2 likes
  • 7 in conversation