BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
rajd1
Quartz | Level 8

Hello there, 

I have been trying to work on this problem with data manipulation using SQL. Here is my program and my data set:

IDCycleDelayReductionDrugNameRegimenDose
11NoneNoneCarbx,y750
11NoneyesPemx,y1015
12NoneNonePemx,y1015
12NoneNoneCarbx,y705
11yesNoneDurvx940
12NoneNoneDurvx940
13NoneNoneDurvx930
14yesNoneDurvx910
15NoneNoneDurvx910
16NoneNoneDurvx910
17NoneNoneDurvx940
18NoneNoneDurvx945
19NoneyesDurvx920
110NoneNoneDurvx915
111NoneNoneDurvx915
112UnknownNoneDurvx920
113NoneNoneDurvx920
114NoneNoneDurvx920
115NoneyesDurvx925
116NoneNoneDurvx910
117NoneNoneDurvx890
118NoneNoneDurvx875
119NoneNoneDurvx870
120NoneNoneDurvx850
121NoneNoneDurvx850
122NoneNoneDurvx845
123NoneNoneDurvx860
124NoneNoneDurvx855
21NoneNonePaclx,z95
21NoneNoneCarbx,z245
22NoneNonePaclx,z95
22NoneNoneCarbx,z225
23NoneNonePaclx,z95
23NoneOtherCarbx,z225

 

Here is my program:

Proc sql;
Create table Drug_Clean AS
select distinct (DrugName),
ID as ID, max(cycle) as maxcycle, min(cycle) as mincycle, Count(Dose) as numberdoses, First(dose) as Firstdose, mean(Dose) as MeanDose, last(Dose) as LastDose
from L_line1
Group by SubjectID;
quit;

 

Here is the final dataset that i would like to have the following:

It should have all the drugnames listed by id: For exampl ID #1 has carb, pem and durv. I want to display as follows:

 

IDDrugMaxCycleMinCyclecountdoseAvgdoseFirstDoseLastDoseReductionDelay
1Durv24124902.7940855yesyes

 

Any help is greatly appreciated!!!

1 ACCEPTED SOLUTION

Accepted Solutions
ChrisNZ
Tourmaline | Level 20

Like this?

proc sql;
  create table WANT as
  select unique ID
         , DRUGNAME
         , MAXCYCLE
         , MINCYCLE
         , NUMBERDOSES
         , MEANDOSE   
         , max(FD) as FIRSTDOSE
         , max(LD) as LASTDOSE
  from (
    select unique ID
         , DRUGNAME
         , max (CYCLE) as MAXCYCLE
         , min (CYCLE) as MINCYCLE
         , count(DOSE) as NUMBERDOSES
         , mean(DOSE)  as MEANDOSE
         , DOSE*(CYCLE=calculated MINCYCLE) as FD
         , DOSE*(CYCLE=calculated MAXCYCLE) as LD
    from HAVE
    group by ID, DRUGNAME
  )
  group by ID, DRUGNAME
  order by ID, DRUGNAME
  ;
quit;

 

View solution in original post

7 REPLIES 7
ChrisNZ
Tourmaline | Level 20

Why is there no Carb in the output?

rajd1
Quartz | Level 8

Sorry, Yes I want it to display all the carb and pem as well. Basically all the drugnames by ID.

Thanks

ChrisNZ
Tourmaline | Level 20

Like this?

proc sql;
  create table WANT as
  select unique ID
         , DRUGNAME
         , MAXCYCLE
         , MINCYCLE
         , NUMBERDOSES
         , MEANDOSE   
         , max(FD) as FIRSTDOSE
         , max(LD) as LASTDOSE
  from (
    select unique ID
         , DRUGNAME
         , max (CYCLE) as MAXCYCLE
         , min (CYCLE) as MINCYCLE
         , count(DOSE) as NUMBERDOSES
         , mean(DOSE)  as MEANDOSE
         , DOSE*(CYCLE=calculated MINCYCLE) as FD
         , DOSE*(CYCLE=calculated MAXCYCLE) as LD
    from HAVE
    group by ID, DRUGNAME
  )
  group by ID, DRUGNAME
  order by ID, DRUGNAME
  ;
quit;

 

rajd1
Quartz | Level 8
This worked GREAT! Thanks so much!
Quick question: say i wanted to have one row for each ID and have drug and cycle listed as variables like Drug1, Drug2, and so on. How can i modify this code? I tried a few things and it didn't work. I am soon realizing the power of SQL. This is a great community and i have been using this everyday to learn:)
ChrisNZ
Tourmaline | Level 20
SQL is not very suited to the first task (first and last row), and even less suited to what you want now (create an unknown number of columns). 1. These 2 tasks are best performed in a data step 2. The structure you are requesting is ill-suited to any kind of processing. It's much better to keep the data vertical.
ed_sas_member
Meteorite | Level 14

Hi @rajd1 

 

Here is a way to achieve this.

Could you please clarify the rule to compute delay and reduction?

In the following program, I have set reduction and delay to "yes" if there was at least one record equal to "yes", and "none" in other cases.

However, I can see that there is a value "Other" for patient 2 at cycle 3. How would you like to handle this case?

 

Best,

proc sql;
	select distinct ID, DrugName, MaxCycle, MinCycle, countdose, Avgdose, max(FirstDose) as FirstDose, max(LastDose) as LastDose, Delay, Reduction
	from (select ID, DrugName,
		   max(cycle) as MaxCycle,
		   min(cycle) as MinCycle,
		   count(dose) as countdose,
		   mean(dose) as Avgdose format=8.1,
		   dose*(cycle=calculated MinCycle) as FirstDose,
		   dose*(cycle=calculated MaxCycle) as LastDose,
		   case when sum(lowcase(delay)="yes") > 1 then "yes" else "none" end as Delay,
		   case when sum(lowcase(reduction)="yes") > 1 then "yes" else "none" end as Reduction /*what if "Other" ?*/
	 	 from L_line1
		 group by ID, DrugName)
	group by ID, DrugName	
	order by ID, DrugName;
quit;

Capture d’écran 2020-05-29 à 09.58.53.png

rajd1
Quartz | Level 8

This worked GREAT! The delay variable has 3 categories: none, yes, and unknown. I would like to make these as categories for the delay variable as well. 

basically i want to be able to add the 3rd category to the delay variable. 

Further i have some dates and was trying to find the difference between dates.

i was working with and here is the code:

proc sql;
create table clean_Drug as
select unique SubjectID
, DRUGNAME
, REGIMEN
, STARTCYCLE
, ENDCYCLE
, CYCLEDAYS
, STARTLOT
, ENDLOT
, MAXCYCLE
, MINCYCLE
, AVGCYCLE
, MEDIANCYCLE
, NUMBERDOSES
, MEANDOSE
, max(FIRSTDOSE) as FIRSTDOSE
, max(LASTDOSE) as LASTDOSE
from (
select unique SubjectID
, DRUGNAME
, REGIMEN
, min (LotStart) as STARTLOT format MMDDYY10.
, max (LotEnd) as ENDLOT format MMDDYY10.
, min (AdminDate) as STARTCYCLE format MMDDYY10.
, max (AdminDate) as ENDCYCLE format MMDDYY10.
, max (CYCLE) as MAXCYCLE
, sum (ENDCYCLE,-STARTCYCLE) as CYCLEDAYS
, min (CYCLE) as MINCYCLE
, mean (CYCLE) as AVGCYCLE
, median(CYCLE) as MEDIANCYCLE
, count(DOSE) as NUMBERDOSES
, mean(DOSE) as MEANDOSE
, DOSE*(CYCLE=calculated MINCYCLE) as FIRSTDOSE
, DOSE*(CYCLE=calculated MAXCYCLE) as LASTDOSE
from L_Line1
group by SubjectID, DRUGNAME)
group by SubjectID, DRUGNAME
order by SubjectID, DRUGNAME
;
quit;

I got the following error:

ERROR: Function SUM requires a numeric expression as argument 1.
ERROR: The following columns were not found in the contributing tables: ENDCYCLE, STARTCYCLE.

 

Thanks so much! 

 

 

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1082 views
  • 0 likes
  • 3 in conversation