BookmarkSubscribeRSS Feed
queenamaka
Calcite | Level 5

Hi All,

please can this programme below be converted to Proc SQL

 

Data A;

Set B;

 

retain EXPC1FL EXPC2FL EXPC3FL TRTPFSDT TRTPLSDT EXPDOSE EXPCYCN;
format EXPC1FL EXPC2FL EXPC3FL $1. TRTPFSDT TRTPLSDT DATE9. EXPDUR EXPCYCN EXPDOSE 8. EXPCOMP 4.1;
by usubjid visitnum;
if first.usubjid then do;
EXPC1FL=' '; EXPC2FL=' '; EXPC3FL=' ';
TRTPFSDT=.; TRTPLSDT=.; EXPDOSE=0; EXPCYCN=0;
end;
EXPDOSE=EXPDOSE+ECDOSE;
EXPCYCN=EXPCYCN+1;
if visitnum=10 then EXPC1FL='Y';
if visitnum=20 then EXPC2FL='Y';
if visitnum=30 then EXPC3FL='Y';
if visitnum=10 then TRTPFSDT=TRTPSDT;
if last.usubjid then TRTPLSDT=TRTPEDT;
if last.usubjid;
EXPCOMP=(EXPDOSE/250)*40;
EXPDUR=(TRTPLSDT - TRTPFSDT)+1;

 

regards

Queen

11 REPLIES 11
Oligolas
Barite | Level 11

Hi,

it's possible, but it wouldn't be so readable and easy to validate as the program you already have. So I can't see why it would make sense right now.

Could you please explain why you would like to do this?

It would also be great if you could provide some data to play with.

________________________

- Cheers -

andreas_lds
Jade | Level 19

Please use the "Insert SAS Code" button to insert formatted sas code.

The concept of first/last observation in a group does not exist in sql.

queenamaka
Calcite | Level 5
Thank you Andreas,

I have searched the concept of By-group processing in SQL and it seems its non-existence that''s why i have to post this question.

I will continue using the datastep then.
gamotte
Rhodochrosite | Level 12

SQL has the "GROUP BY" statement that allows to use aggregation functions on subgroups of the dataset.

Here is a simplified exemple based on your data step (not tested as you did not provide sample data in

the form of a datastep) :

proc sql noprint;
    CREATE TABLE want AS
    AS SELECT DISTINCT count(*) AS EXPCYCN, sum(ECDOSE) AS EXPDOSE,
              EXPC1FL=ifc(visitnum=10,'Y',' ') format=$1.
    FROM have
    GROUP BY usubjid
    ;
quit;

 EDIT : Added "DISTINCT" to keep only one row by aggregate (assuming visitnum is constant for any given usubjid).

Tom
Super User Tom
Super User

This line seems strange to me.

 EXPC1FL=ifc(visitnum=10,'Y',' ') format=$1.

Why are trying to test if the existing variable EXPC1FL is equal to either the letter Y or a blank string?  Why didn't you give the newly calculated variable a name? Why did you try to attach a character format to a numeric (boolean) value?

 

Also if you did mean to make a character variable why do you feel a need to attach the $1. format to limit the number of characters displayed when the variable is printed? Why not just set the length of the variable to 1?

gamotte
Rhodochrosite | Level 12
@Tom
You're right, it was a carelessness mistake. I meant "ifc(... ) AS EXPC1FL".
gamotte
Rhodochrosite | Level 12
I also agree with your second point about the length though i think the result would be the same, but i cannot test it right now.
Tom
Super User Tom
Super User

With that code neither is needed.

But attaching $ formats to characters is a dangerous practice.

Example: What output do you expect from this program?

 

data one ;
  input study :$10. @@;
cards;
ABCD1234 ABCD1234B ABCD1234C
;
data two ;
   format study $8. ;
   input study @@;
cards;
WXYZ0001 WXYZ0002
;
data both;
  set one two;
run;
proc freq ;
 tables study;
run;

Results:

Spoiler
The FREQ Procedure

                                     Cumulative    Cumulative
study       Frequency     Percent     Frequency      Percent
-------------------------------------------------------------
ABCD1234           3       60.00             3        60.00
WXYZ0001           1       20.00             4        80.00
WXYZ0002           1       20.00             5       100.00

Note this is simplified version of a real problem that I have seen.

 

gamotte
Rhodochrosite | Level 12
Yes, i agree with you, unless truncation is a desired feature, applying a $ format to a character variable can be error prone.
My point was to show that columns attributes could be defined for new variables created by a proc sql and i used the format as in the original data step as an example.
ballardw
Super User

SQL and "sequence" of data do not play well.

So anything that uses Retain statements or Lag and Dif functions is very difficult to translate to SQL. It also typically becomes much more convoluted code that may be a tad fragile.

PGStats
Opal | Level 21
proc sql;
create table A as
select
	usubjid,
	sum(ecdose) as expdose,
	count(*) as expcycn,
	calculated expdose / 250 * 40 as expcomp,
	case when visitnum=10 then "Y" else " " end as expc1fl,
	case when visitnum=20 then "Y" else " " end as expc2fl,
	case when visitnum=30 then "Y" else " " end as expc3fl,
    case when visitnum=10 then trtpsdt else . end as trtpfsdt,
	trtpedt as trtplsdt,
	calculated trtplsdt - calculated trtpfsdt + 1 as expdur
from B
group by usubjid
having visitnum=max(visitnum);
quit;
PG

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 2204 views
  • 4 likes
  • 7 in conversation