Solving the LinkedIn Queens Puzzle with PROC OPTMODEL
Recent Library Articles
Recently in the SAS Community Library: SAS' @SubbuPaz and @RobPratt formulate the LinkedIn Queens Puzzle as an integer programming model and solve it using PROC OPTMODEL in SAS Optimization.
Hi guys,
suppose to have the following table:
data DB;
input ID :$20. Admission :date09. Discharge :date09. Age_class Age_end Index Value Total;
format Admission date9. Discharge date9.;
cards;
0001 01JUL2014 16AUG2014 1 4 1 2.3 11.9
0001 13MAY2018 22JUN2018 3 4 0 1.4 .
0001 23JAN2019 25JAN2019 4 4 0 3.2 .
0002 13MAY2016 22SEP2016 1 5 1 2 7.9
0002 09JUL2023 10JUL2023 2 5 0 0.3 .
0002 12SEP2024 15SEP2024 3 5 0 0.2 .
0003 01JUL2014 18AUG2014 1 3 1 12 17.3
0003 07DEC2023 16DEC2023 2 3 0 0.3 .
0004 12JAN2014 15JAN2014 1 2 1 2 2.1
0004 30MAY2019 13JUL2019 2 2 0 0.1 .
0005 30JUN2019 13OCT2019 5 5 0 4.1 .
;
run;
Is there a way to get the following?
data DB1;
input ID :$20. Admission :date09. Discharge :date09. Age_class Age_end Index Value Total Age_class1 Age_class2 Age_class3 Age_class4 Age_class5;
format Admission date9. Discharge date9.;
cards;
0001 01JUL2014 16AUG2014 1 4 1 2.3 11.9 2.3 5 1.4 3.2 .
0001 13MAY2018 22JUN2018 3 4 0 1.4 . . . . . .
0001 23JAN2019 25JAN2019 4 4 0 3.2 . . . . . .
0002 13MAY2016 22SEP2016 1 5 1 2 7.9 2 0.3 0.2 5 0.4
0002 09JUL2023 10JUL2023 2 5 0 0.3 . . . . . .
0002 12SEP2024 15SEP2024 3 5 0 0.2 . . . . . .
0003 01JUL2014 18AUG2014 1 3 1 12 17.3 12 0.3 5 . .
0003 07DEC2023 16DEC2023 2 3 0 0.3 . . . . . .
0004 12JAN2014 15JAN2014 1 2 1 2 2.1 2 0.1 . . .
0004 30MAY2019 13JUL2019 2 2 0 0.1 . . . . . .
0005 30JUN2019 13OCT2019 5 5 1 4.1 4.1 . . . . 4.1
;
run;
In other words, for each ID in DB there is the Age_class at each admission and an Age_class at exit (Age_end) from the study. Then, there is the "Value" variable that is the time the patient stay in the corresponding Age_class (before exit). There is also a "Total" time in the study. What I need is to transform the dataset DB into a wide format where there are columns corresponding to the age classes filled by the corresponding Value. Note that in this new dataset the row-wise total must be reached and must be equal to "Total" and so: if there is a jump in the age classes (Age_class variable) a value = 5 years must be added because the age-class interval is of max 5 years.
Finally, in the new wide dataset (DB1) rows must be filled only where Index = 1
Can anyone help me please?
Thank you very much in advance
... View more
Hello everyone, When modeling with proc mixed in SAS studio, in the Random effects builder, what is the difference when the variable in the model effects, is in the intercept box, subject effect (covariance structures), or group effect, please?
... View more
I have a large health plan dataset (n~36 million records) that I need to randomly select two non-decedent controls for each decedent case. The matching variable is decile of air pollutant exposure, which was calculated from the full distribution of daily pollutant concentrations for all zip codes occurring in the dataset, across the 2-year study period. I've tried to approach this with 3 files: File 1 - Decedent cases (n=2.8 million): ID, Date (DeathDt), Decile (on DeathDt) File 2 - Bridge file: Date, Zipcode, Decile [file contains all possible combinations of date and zipcode in the non-decedent file, with decile assigned based on pollutant concentration on that date for that zipcode] File 3 - Non-decedent controls (n=34.6 million): ID, Zipcode The files look like this: File 1: ID Date Decile 1 1/1/2017 1 22 1/1/2017 1 41 1/1/2017 1 56 1/1/2017 2 79 1/1/2017 2 85 1/1/2017 2 100 1/2/2017 1 118 1/2/2017 1 125 1/2/2017 2 167 1/2/2017 2 178 1/2/2017 3 File 2: Date Zipcode Decile 1/1/2017 12832 1 1/1/2017 03349 1 1/1/2017 04001 2 1/2/2017 56723 2 1/2/2017 88123 1 1/3/2017 80010 3 1/3/2017 96224 3 File 3: ID Zipcode 2 88123 3 12345 4 03304 5 03867 6 04945 7 04001 8 98765 9 98801 10 96224 11 00001 12 83356 Again, for each decedent case, I need to randomly select 2 non-decedent controls who were alive as of the case's death date, but this can't be done directly because there is no date in the non-decedent file. I created a cartesian product joining Files 1 and 2, which produced a file named 'combined', and gave me the full range of zipcodes that are in the same decile as the case, on the case's date of death (code below). My plan was to join this file with the non-decedents file by zipcode, and then select 2 controls randomly. But at this point, the files were huge, and I ran out of space...and I don't know really how to do the random selection from here. I'm thinking there has to be a more efficient way to do this. Sorry for the complicated description. Would appreciate any advice for how to do this.. proc sql; create table combined as select f1.decile as CaseDecile, f1.date as CaseDate f2.decile as BridgeDecile, f2.date as BridgeDate, f2.zipcode as BridgeZipcode from decedents as f1, bridge as f2 where f1.decile = f2.decile AND f1.date = f2.date; quit;
... View more
I am having trouble ranking a variable within groups. Here is my code: proc rank data=MONTHRANK out=FLAGRANK ties=dense; /* Ranking UseFlag within each month with TIES=DENSE */
by ID TYPE descending MonthGroup NumUseFlag;
var NumUseFlag;
ranks FlagRank;
run; Here is the initial table data MONTHRANK;
length ID $12 Type $3 AdjustmentCode MonthGroup MonthRank NumUseFlag 8
;
infile datalines dsd truncover;
input ID -- NumUseFlag;
datalines;
123AAABBBC,001,,14,1,1
123AAABBBC,001,3,13,1,0
123AAABBBC,001,2,13,2,1
123AAABBBC,001,,13,3,1
123AAABBBC,001,26,12,1,1
123AAABBBC,001,,12,2,1
123AAABBBC,001,,11,1,1
123AAABBBC,001,,10,1,1
; Here is what I want to output data MONTHRANK_OUTPUT;
length ID $12 Type $3 AdjustmentCode MonthGroup MonthRank NumUseFlag FlagRank 8
;
infile datalines dsd truncover;
input ID -- FlagRank;
datalines;
123AAABBBC,001,,14,1,1,1
123AAABBBC,001,3,13,1,0,1
123AAABBBC,001,2,13,2,1,1
123AAABBBC,001,,13,3,1,2
123AAABBBC,001,26,12,1,1,1
123AAABBBC,001,,12,2,1,2
123AAABBBC,001,,11,1,1,1
123AAABBBC,001,,10,1,1,1
; Sorry if the code isn't exactly correct, I hope this is clear enough. The error I am getting is "ERROR: Data set WORK.MONTHRANK is not sorted in ascending sequence. The current BY group has NumUseFlag = 1 and the next BY group has NumUseFlag = 0.", which kind of makes sense to me if there were another adjustment line above the line in the 13th month with the NumUseFlag = 1 instead of 0 and the AdjustmentCode = 26 (this is possible within the rest of the data), but I don't really see why that should necessarily matter and I'm unsure of how to get around the issue. I have tried using descending, but I get the same error just saying it's not sorted in descending sequence. I have also tried using CLASS, but I don't think it works with PROC RANK.
... View more
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Nominations are in, and the SAS Customer Recognition Awards are open for your votes! Winners get a full trip to SAS Innovate (May 6-9) in Orlando, FL! See the 60+ inspiring entries from SAS users!