- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello there,
I have been trying to work on this problem with data manipulation using SQL. Here is my program and my data set:
ID | Cycle | Delay | Reduction | DrugName | Regimen | Dose |
1 | 1 | None | None | Carb | x,y | 750 |
1 | 1 | None | yes | Pem | x,y | 1015 |
1 | 2 | None | None | Pem | x,y | 1015 |
1 | 2 | None | None | Carb | x,y | 705 |
1 | 1 | yes | None | Durv | x | 940 |
1 | 2 | None | None | Durv | x | 940 |
1 | 3 | None | None | Durv | x | 930 |
1 | 4 | yes | None | Durv | x | 910 |
1 | 5 | None | None | Durv | x | 910 |
1 | 6 | None | None | Durv | x | 910 |
1 | 7 | None | None | Durv | x | 940 |
1 | 8 | None | None | Durv | x | 945 |
1 | 9 | None | yes | Durv | x | 920 |
1 | 10 | None | None | Durv | x | 915 |
1 | 11 | None | None | Durv | x | 915 |
1 | 12 | Unknown | None | Durv | x | 920 |
1 | 13 | None | None | Durv | x | 920 |
1 | 14 | None | None | Durv | x | 920 |
1 | 15 | None | yes | Durv | x | 925 |
1 | 16 | None | None | Durv | x | 910 |
1 | 17 | None | None | Durv | x | 890 |
1 | 18 | None | None | Durv | x | 875 |
1 | 19 | None | None | Durv | x | 870 |
1 | 20 | None | None | Durv | x | 850 |
1 | 21 | None | None | Durv | x | 850 |
1 | 22 | None | None | Durv | x | 845 |
1 | 23 | None | None | Durv | x | 860 |
1 | 24 | None | None | Durv | x | 855 |
2 | 1 | None | None | Pacl | x,z | 95 |
2 | 1 | None | None | Carb | x,z | 245 |
2 | 2 | None | None | Pacl | x,z | 95 |
2 | 2 | None | None | Carb | x,z | 225 |
2 | 3 | None | None | Pacl | x,z | 95 |
2 | 3 | None | Other | Carb | x,z | 225 |
Here is my program:
Proc sql;
Create table Drug_Clean AS
select distinct (DrugName),
ID as ID, max(cycle) as maxcycle, min(cycle) as mincycle, Count(Dose) as numberdoses, First(dose) as Firstdose, mean(Dose) as MeanDose, last(Dose) as LastDose
from L_line1
Group by SubjectID;
quit;
Here is the final dataset that i would like to have the following:
It should have all the drugnames listed by id: For exampl ID #1 has carb, pem and durv. I want to display as follows:
ID | Drug | MaxCycle | MinCycle | countdose | Avgdose | FirstDose | LastDose | Reduction | Delay |
1 | Durv | 24 | 1 | 24 | 902.7 | 940 | 855 | yes | yes |
Any help is greatly appreciated!!!
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Like this?
proc sql;
create table WANT as
select unique ID
, DRUGNAME
, MAXCYCLE
, MINCYCLE
, NUMBERDOSES
, MEANDOSE
, max(FD) as FIRSTDOSE
, max(LD) as LASTDOSE
from (
select unique ID
, DRUGNAME
, max (CYCLE) as MAXCYCLE
, min (CYCLE) as MINCYCLE
, count(DOSE) as NUMBERDOSES
, mean(DOSE) as MEANDOSE
, DOSE*(CYCLE=calculated MINCYCLE) as FD
, DOSE*(CYCLE=calculated MAXCYCLE) as LD
from HAVE
group by ID, DRUGNAME
)
group by ID, DRUGNAME
order by ID, DRUGNAME
;
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Why is there no Carb in the output?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Sorry, Yes I want it to display all the carb and pem as well. Basically all the drugnames by ID.
Thanks
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Like this?
proc sql;
create table WANT as
select unique ID
, DRUGNAME
, MAXCYCLE
, MINCYCLE
, NUMBERDOSES
, MEANDOSE
, max(FD) as FIRSTDOSE
, max(LD) as LASTDOSE
from (
select unique ID
, DRUGNAME
, max (CYCLE) as MAXCYCLE
, min (CYCLE) as MINCYCLE
, count(DOSE) as NUMBERDOSES
, mean(DOSE) as MEANDOSE
, DOSE*(CYCLE=calculated MINCYCLE) as FD
, DOSE*(CYCLE=calculated MAXCYCLE) as LD
from HAVE
group by ID, DRUGNAME
)
group by ID, DRUGNAME
order by ID, DRUGNAME
;
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Quick question: say i wanted to have one row for each ID and have drug and cycle listed as variables like Drug1, Drug2, and so on. How can i modify this code? I tried a few things and it didn't work. I am soon realizing the power of SQL. This is a great community and i have been using this everyday to learn:)
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi @rajd1
Here is a way to achieve this.
Could you please clarify the rule to compute delay and reduction?
In the following program, I have set reduction and delay to "yes" if there was at least one record equal to "yes", and "none" in other cases.
However, I can see that there is a value "Other" for patient 2 at cycle 3. How would you like to handle this case?
Best,
proc sql;
select distinct ID, DrugName, MaxCycle, MinCycle, countdose, Avgdose, max(FirstDose) as FirstDose, max(LastDose) as LastDose, Delay, Reduction
from (select ID, DrugName,
max(cycle) as MaxCycle,
min(cycle) as MinCycle,
count(dose) as countdose,
mean(dose) as Avgdose format=8.1,
dose*(cycle=calculated MinCycle) as FirstDose,
dose*(cycle=calculated MaxCycle) as LastDose,
case when sum(lowcase(delay)="yes") > 1 then "yes" else "none" end as Delay,
case when sum(lowcase(reduction)="yes") > 1 then "yes" else "none" end as Reduction /*what if "Other" ?*/
from L_line1
group by ID, DrugName)
group by ID, DrugName
order by ID, DrugName;
quit;
- Tags:
- proc sql
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
This worked GREAT! The delay variable has 3 categories: none, yes, and unknown. I would like to make these as categories for the delay variable as well.
basically i want to be able to add the 3rd category to the delay variable.
Further i have some dates and was trying to find the difference between dates.
i was working with and here is the code:
proc sql;
create table clean_Drug as
select unique SubjectID
, DRUGNAME
, REGIMEN
, STARTCYCLE
, ENDCYCLE
, CYCLEDAYS
, STARTLOT
, ENDLOT
, MAXCYCLE
, MINCYCLE
, AVGCYCLE
, MEDIANCYCLE
, NUMBERDOSES
, MEANDOSE
, max(FIRSTDOSE) as FIRSTDOSE
, max(LASTDOSE) as LASTDOSE
from (
select unique SubjectID
, DRUGNAME
, REGIMEN
, min (LotStart) as STARTLOT format MMDDYY10.
, max (LotEnd) as ENDLOT format MMDDYY10.
, min (AdminDate) as STARTCYCLE format MMDDYY10.
, max (AdminDate) as ENDCYCLE format MMDDYY10.
, max (CYCLE) as MAXCYCLE
, sum (ENDCYCLE,-STARTCYCLE) as CYCLEDAYS
, min (CYCLE) as MINCYCLE
, mean (CYCLE) as AVGCYCLE
, median(CYCLE) as MEDIANCYCLE
, count(DOSE) as NUMBERDOSES
, mean(DOSE) as MEANDOSE
, DOSE*(CYCLE=calculated MINCYCLE) as FIRSTDOSE
, DOSE*(CYCLE=calculated MAXCYCLE) as LASTDOSE
from L_Line1
group by SubjectID, DRUGNAME)
group by SubjectID, DRUGNAME
order by SubjectID, DRUGNAME
;
quit;
I got the following error:
ERROR: Function SUM requires a numeric expression as argument 1.
ERROR: The following columns were not found in the contributing tables: ENDCYCLE, STARTCYCLE.
Thanks so much!