About PhilG

PhilG · 02-28-2018

Unfortunately, no. I set up this small test dataset for demonstration purposes, to show the process I'm attempting. The actual dataset has about 1,300 columns and millions of rows.
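Because naming ~1,300 columns one by one is impractical, here is a minimal sketch of a commonly cited alternative that never lists a column: the UPDATE-statement "last observation carried forward" trick, swapped in here in place of arrays. It assumes HAVE is sorted by GROUP, that every non-BY column should be filled, and that filling restarts at each new group. It relies on UPDATE's default behavior of not letting a missing transaction value overwrite an existing one. The output name WANT is hypothetical; the sample rows are the demonstration data from the 02-27-2018 question, with group 3's first row entered as seven fields.

```sas
/* Demonstration data (sorted by GROUP) */
data have;
  input id group rate rate2 name $ number location $;
  datalines;
1 1 5 6 Rick . USA
1 1 . . . 2 .
1 1 2 3 Stan . UK
1 2 . . . . .
1 2 . . . 4 .
1 2 . . Sarah . Spain
1 2 . . . . .
1 2 . . . . .
1 3 . . Bf . Poland
1 3 3 . . . .
1 3 . . . . .
;

/* Zero-observation master + HAVE as the transaction data set:
   within each GROUP, each non-missing value updates the running record,
   missing values leave it unchanged, and OUTPUT writes the running
   record once per transaction row */
data want;
  update have(obs=0) have;
  by group;
  output;
run;
```

Since no variable names appear in the step, the same code works unchanged however many numeric and character columns HAVE has.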

PhilG · 02-27-2018

Hello! I'm trying to carry non-missing values down a BY group for multiple numeric and character columns at the same time, but I'm having trouble. Values are carried forward, but the same value from the first column is applied to all of the columns. I have a feeling that I'm misusing the RETAIN statement on the temp variable. So I guess the question is: is there a way I can update temp in these DO loops so it fills values only from the appropriate column?

Here's the syntax I've used:

data have;
  input id group rate rate2 name $ number location $;
  datalines;
1 1 5 6 Rick . USA
1 1 . . . 2 .
1 1 2 3 Stan . UK
1 2 . . . . .
1 2 . . . 4 .
1 2 . . Sarah . Spain
1 2 . . . . .
1 2 . . . . .
1 3 . . Bf . Poland
1 3 3 . . . .
1 3 . . . . .
;

data get;
  set have;
  by group;
  array Nums[*] _numeric_;
  array Chars[*] _character_;
  do i = 1 to dim(Nums);
    retain temp;
    if first.group then temp = .;
    if Nums[i] ne . then temp = Nums[i];
    else if Nums[i] = . then Nums[i] = temp;
  end;
  do i = 1 to dim(Chars);
    retain temp;
    if first.group then temp = .;
    if Chars[i] ne . then temp = NUMs[i];
    else if Chars[i] = . then Chars[i] = temp;
  end;
run;

What I get with the syntax:

1 1 5 6 Rick . USA
1 1 5 5 5 2 5
1 1 2 3 Stan 5 UK
1 2 5 5 5 5 5
1 2 5 5 5 4 5
1 2 5 5 Sarah 5 Spain
1 2 5 5 5 5 5
1 2 5 5 5 5 5
1 3 5 5 5 Bf 5 Poland
1 3 3 5 5 5 5
1 3 5 5 5 5 5

Want:

1 1 5 6 Rick . USA
1 1 5 6 Rick 2 USA
1 1 2 3 Stan 2 UK
1 2 . . . . UK
1 2 . . . 4 UK
1 2 . . Sarah 4 Spain
1 2 . . Sarah 4 Spain
1 2 . . Sarah 4 Spain
1 3 . . Bf . Poland
1 3 3 . Bf . Poland
1 3 3 . Bf . Poland

As always, I appreciate any assistance! Cheers! P
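A minimal sketch of one way to repair the loops, assuming each column should carry its own last non-missing value and reset at the first row of each GROUP. The single shared TEMP is replaced by two _TEMPORARY_ arrays (whose elements are retained automatically), one slot per column, so each column fills only from its own history; MISSING() is used because it works for both numeric and character values. A _TEMPORARY_ array needs a constant size, so the counts come from DICTIONARY.COLUMNS via PROC SQL. The names LAST_NUM/LAST_CHR, the WORK library, the $200 holder length, and the assumption that HAVE has at least one numeric and one character column are choices made for this sketch. Because it resets at every new group, rows 4 and 5 keep a blank location rather than the UK shown in the Want listing.

```sas
/* Demonstration data, with group 3's first row entered as seven fields */
data have;
  input id group rate rate2 name $ number location $;
  datalines;
1 1 5 6 Rick . USA
1 1 . . . 2 .
1 1 2 3 Stan . UK
1 2 . . . . .
1 2 . . . 4 .
1 2 . . Sarah . Spain
1 2 . . . . .
1 2 . . . . .
1 3 . . Bf . Poland
1 3 3 . . . .
1 3 . . . . .
;

/* Count numeric and character columns so the holder arrays can be sized */
proc sql noprint;
  select count(*) into :n_num trimmed
    from dictionary.columns
    where libname = 'WORK' and memname = 'HAVE' and type = 'num';
  select count(*) into :n_char trimmed
    from dictionary.columns
    where libname = 'WORK' and memname = 'HAVE' and type = 'char';
quit;

data get;
  set have;
  by group;
  /* Declared before any new variable, so they cover only HAVE's columns */
  array nums[*] _numeric_;
  array chars[*] _character_;
  /* One holder per column; _TEMPORARY_ elements keep values across rows */
  array last_num[&n_num] _temporary_;
  array last_chr[&n_char] $ 200 _temporary_;

  do _i = 1 to dim(nums);
    if first.group then last_num[_i] = .;                  /* reset per group   */
    if not missing(nums[_i]) then last_num[_i] = nums[_i];  /* remember value    */
    else nums[_i] = last_num[_i];                          /* fill from column  */
  end;

  do _i = 1 to dim(chars);
    if first.group then last_chr[_i] = ' ';
    if not missing(chars[_i]) then last_chr[_i] = chars[_i];
    else chars[_i] = last_chr[_i];
  end;

  drop _i;
run;
```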

PhilG · 06-12-2017

Would it be too bothersome to highlight specific areas of the syntax, so I can gain a deeper understanding of the process you've created here?

PhilG · 06-12-2017

I'm not sure why, but the actual syntax solution that was in the email I received does not appear in the message-board thingy. Anyway, the solution that you provided worked brilliantly! I was able to modify it so that it worked on a different dataset. So, again, thank you! Well done! This is brilliant! I can't thank you enough. If you can get to Oregon, you are invited to the whole-hog BBQ I'm hosting in the backyard on the 24th! It would be a pleasure. Cheers! P PS: I'm sure I'm going to run into further questions, so thank you in advance for your patience.

PhilG · 06-09-2017

I appreciate your expertise! Thank you. I'll give both of these a shot and see if I can't figure it out. Would it be possible to "pick your brain a little" offline, so to speak, about the solutions y'all provided? By email, of course? Cheers! P

PhilG · 06-09-2017

Oops. This is not the actual data I'm using. Just for illustrative purposes. I can't post my actual data. It is sensitive.

PhilG · 06-09-2017

Fair enough! Thank you very much for your assistance! I'll give this a shot and see if I can dissect what you've done here. Cheers! P

PhilG · 06-09-2017

Rank1, then rank2, then rank3, etc. I want the location name in rank1 to pair with a matching location name in the location column. If the location name in rank1 cannot be found in the location column (because it has already been assigned), then I want it to pair based on rank2, and on down the line. Again, I don't even know if this is possible...

PhilG · 06-09-2017

Didn't mean to do that :). Again, it is the SQL commands that I'm trying to learn. I've been quite successful using other procs in macros.

PhilG · 06-09-2017

Really trying to do that:

Youth dataset:

Y_ID  age  sex  rank1  rank2  rank3  rank4  alternative services
122   12   m    B      C
15    14   m    B      C
5     15   m    B      D      C
666   16   f    C      D
561   16   m    D      B      C
8     21   m    A      D      C      B
46    14   m    D      C      B
5555  17   m    A      B      D      C
8484  13   f    D      C
48    5    m    C

Location dataset:

Bed_ID  Location
1       A
2       B
3       C
4       D
5       D
6       D
7       A
8       C
9       B
10      D

Proposed merge would look like:

Y_ID  age  sex  rank1  rank2  rank3  rank4  alternative services  location  Bed_ID
122   12   m    B      C                                          B         2
15    14   m    B      C                                          B         9
5     15   m    B      D      C                                   D         4
666   16   f    C      D                                          C         3
561   16   m    D      B      C                                   D         5
8     21   m    A      D      C      B                            A         1
46    14   m    D      C      B                                   D         6
5555  17   m    A      B      D      C                            A         7
8484  13   f    B      A      C      D                            C         8
48    5    m    C      B      A      D                            D         10
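A minimal sketch of the rank-by-rank rule ("rank1 unless that location is no longer available, then rank2, and on down the line") as a first-come, first-served DATA step. It assumes youth are processed in the row order of the youth table (the randomized arrival order), that the lowest-numbered free bed at a location is taken, and that a youth with no free bed at any ranked location is flagged for alternative services. Each placement depends on the ones before it, so this uses _TEMPORARY_ arrays in one pass instead of a single SQL join, which does not express that dependence easily. The names YOUTH, BEDS, PLACED, TAKEN, and ALT_SERVICES are hypothetical. With the rank columns as typed in the youth listing, the last two youth come out as D/10 (8484) and C/8 (48); the proposed-merge listing shows different ranks for those two rows.

```sas
/* Youth in arrival order; rank columns hold only qualifying locations */
data youth;
  infile datalines missover;
  input y_id age sex $ rank1 $ rank2 $ rank3 $ rank4 $;
  datalines;
122 12 m B C
15 14 m B C
5 15 m B D C
666 16 f C D
561 16 m D B C
8 21 m A D C B
46 14 m D C B
5555 17 m A B D C
8484 13 f D C
48 5 m C
;

/* One row per available bed */
data beds;
  input bed_id location $;
  datalines;
1 A
2 B
3 C
4 D
5 D
6 D
7 A
8 C
9 B
10 D
;

proc sql noprint;
  select count(*) into :n_beds trimmed from beds;
quit;

data placed;
  array bed_loc[&n_beds] $ 8 _temporary_;  /* location of each bed  */
  array bed_num[&n_beds] _temporary_;      /* bed ID of each bed    */
  array taken[&n_beds] _temporary_;        /* 1 once a bed is used  */

  /* First iteration only: load every bed into the arrays */
  if _n_ = 1 then do _b = 1 by 1 until (_eof);
    set beds end=_eof;
    bed_loc[_b] = location;
    bed_num[_b] = bed_id;
  end;

  set youth;
  array ranks[*] rank1-rank4;

  /* LOCATION and BED_ID (read from BEDS) are reused as the result columns */
  location = ' ';
  bed_id = .;

  /* Try rank1, rank2, ... and take the first free bed at that location */
  do _r = 1 to dim(ranks) while (missing(bed_id));
    do _b = 1 to &n_beds while (missing(bed_id));
      if taken[_b] ne 1 and bed_loc[_b] = ranks[_r] then do;
        taken[_b] = 1;
        location = bed_loc[_b];
        bed_id = bed_num[_b];
      end;
    end;
  end;

  alt_services = missing(bed_id);  /* 1 = no ranked location had a free bed */
  drop _b _r;
run;
```

Re-sorting YOUTH into a new random order before this step and rerunning it gives one simulated replicate per ordering.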

PhilG · 06-09-2017

Good afternoon and happy Friday, folks! I’m trying to automate a placement simulation of youth into residential treatment where they will have the highest likelihood of success. Success is operationalized as “not recidivating” within 3 years of entering treatment. Equations predicting recidivism have been generated for each location, and the equations have been applied to each individual in the scenario (based on youth characteristics such as risk, age, and LOS). Each youth has predicted success rates for every location, which throws in a wrench: youth are not qualified for all of the treatment facilities for which they have predicted success rates. Indeed, treatment locations have differing, yet overlapping, qualifications. Let’s take a made-up example. Johnny (ID #5, below) is a 15-year-old boy with drug charges. He could have “predicted success rates” of 91% for location A, 88% for location B, 50% for location C, and 75% for location D. Johnny is most likely to be successful (i.e., not recidivate within three years of entering treatment) if he is treated at location A; unfortunately, location A only accepts youth who are 17 years old or older, so Johnny would not qualify for treatment there. For Johnny, location B is the next best location. Let us assume that Johnny is qualified for location B, but that all of the location-B beds are filled; so we must now look to location D, as it is now Johnny’s “best available” option at 75%. The score so far: we are matching youth to available beds in locations for which they qualify and where they might enjoy the greatest likelihood of success. Unfortunately, each location has only a certain number of available beds, and the number of available beds differs across locations. The qualifications for entry into treatment facilities differ, yet overlap (e.g., 12-17 year-olds vs. 14-20 year-olds).

In order to simulate what placement decisions might look like based on success rates, I went through the scenario described above for over 400 youth, by hand, in Excel. It took me about a week. I’d like to use PROC SQL embedded in a SAS macro to automate these placement scenarios, with the ultimate goals of a) being able to bootstrap iterations in order to examine effect sizes across distributions, b) saving time, and c) preventing further brain damage from banging my head against the desk and wall in frustration whilst doing this by hand. Never having had the necessity, nay, the privilege of using SQL in my typical role as a researcher, I believe that this time has now come to pass, and I’m excited about it! Honestly. I believe it has the capacity I’m looking for. Unfortunately, it is beating the devil out of me! Here’s what I’ve got cookin’ so far: I want to create and automate the placement simulation with the clever use of merging/joining/switching/or something like that. I have two datasets (tables). The first dataset contains all of the youth information (one row per youth; several columns with demographics and location ranks, which correspond to the predicted success rates). The order of rows in the youth dataset was/will be randomly generated (to simulate the randomness with which youth enter the system and are subsequently placed into treatment). Note that I will be “cleaning” the youth dataset prior to merging so that rank-column cells are populated only for programs for which the respective youth qualifies. This should take the “does the youth even qualify for the program” problem out of the equation. However, it still leaves the issue of availability to be contended with in the scenario. The second dataset contains the treatment facility beds, with each row corresponding to an available bed in one of the treatment locations; two columns contain bed numbers and location names.

Each bed (row) has only one location cell populated, but each location will populate several cells. Thus, in descending order, I want to merge each youth row with the available bed that represents his/her best chance of success, so the merge/join/switch/thing should take place on youth.Rank1 = distinct TF.Location; if youth.Rank1 ≠ TF.Location, then merge on youth.Rank2 = TF.Location; if youth.Rank2 ≠ TF.Location, then merge on youth.Rank3 = TF.Location, etc. Put plainly: “Merge on rank1 unless the rank1 location is no longer available, then merge on rank2, unless the rank2 location is no longer available, and on down the line, etc., etc., until all options are exhausted and foster care (i.e., alternative services) is the only option.” I’ve had no success getting this to work. I haven’t even been successful getting the UNION operator to work. About the only successful thing I’ve done in SQL so far is create a view of a single dataset. It’s pretty sad. I’ve been following this guidance, but I get hung up around the WHERE clause:

proc sql;                  /* Calls the SQL procedure */
  create table x as        /* Tells SAS to create a table called x */
  select                   /* Specifies the column(s) to be selected */
  from                     /* Specifies the table(s) (data sets) to be queried */
  where                    /* Subsets the data based on a condition */
  group by                 /* Classifies the data into groups based on the specified column(s) */
  order by                 /* Sorts the resulting rows (observations) by the specified column(s) */
  ;
quit;                      /* Ends the PROC SQL procedure */

Frankly, I’m stuck and I could use some advice. The greenhorn in me is in way over his head. I appreciate any help or guidance anyone might lend. Cheers! P
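Goal (a) and the randomly generated arrival order suggest wrapping each run in a macro that reshuffles the youth table with its own seed. A minimal sketch of just that reshuffling layer, assuming a uniform random sort is an acceptable stand-in for random arrival order; the macro names SHUFFLE and REPLICATE, the ORDER_n output tables, the ID-only stand-in YOUTH table, and the seed values are hypothetical, and the placement step itself is marked only by a comment. A DATA step plus PROC SORT is used here because it makes seeding with CALL STREAMINIT straightforward.

```sas
/* Small stand-in for the youth table: IDs only */
data youth;
  input y_id @@;
  datalines;
122 15 5 666 561 8 46 5555 8484 48
;

/* Hypothetical macro: write a randomly reordered copy of &IN to &OUT */
%macro shuffle(in=, out=, seed=);
  data _shuffled;
    if _n_ = 1 then call streaminit(&seed);  /* fixed seed = reproducible order */
    set &in;
    _u = rand('uniform');
  run;

  proc sort data=_shuffled out=&out(drop=_u);
    by _u;
  run;
%mend shuffle;

/* Hypothetical driver: one shuffled copy per replicate, each with its own seed */
%macro replicate(in=, n_reps=, base_seed=);
  %local rep;
  %do rep = 1 %to &n_reps;
    %shuffle(in=&in, out=order_&rep, seed=%eval(&base_seed + &rep))
    /* The placement step would run on ORDER_&rep here */
  %end;
%mend replicate;

%replicate(in=youth, n_reps=3, base_seed=20170609)
```

Keeping the seed tied to the replicate number means any single replicate can be regenerated later for checking.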

PhilG · 03-22-2017

Sure thing. Basically, I want to be able to turn this:

ID  LOS
a   10
a   20
a   30
b   20
b   20
b   20
c   20
c   10
c   50

into something like this:

ID  Total LOS
a   60
b   60
c   80

or even something like this:

ID  Mean LOS
a   20
b   20
c   26.67

Does that make sense? I appreciate the follow-up question. Cheers! P
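For the LOS example above, a minimal PROC SQL sketch that collapses the rows to one per ID with both the total and the mean, using GROUP BY with the SUM and MEAN summary functions. The table and column names HAVE, WANT, TOTAL_LOS, and MEAN_LOS are made up for illustration. PROC MEANS or PROC SUMMARY with a CLASS statement would give the same numbers; no PROC TRANSPOSE is needed when the goal is one summary value per ID.

```sas
data have;
  input id $ los;
  datalines;
a 10
a 20
a 30
b 20
b 20
b 20
c 20
c 10
c 50
;

/* One row per ID: total and mean length of stay */
proc sql;
  create table want as
  select id,
         sum(los)  as total_los,
         mean(los) as mean_los format=8.2
  from have
  group by id;
quit;
```

WANT holds a, 60, 20.00; b, 60, 20.00; and c, 80, 26.67.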

PhilG · 03-21-2017

Good afternoon! I thought this was a really cool solution; thank you for posting it. I was curious about doing something similar, but rather than populating the transposed cell with the series of values (i.e., 1, 2, 3, etc.), I would want to populate the cell with the sum of the values (i.e., 6). Could I modify this PROC TRANSPOSE syntax to accomplish this? I appreciate your learned insight. Cheers! P

PhilG · 01-22-2014

Thank you for the reply. Unfortunately, I've tried that several times and I get this message:

NOTE: N not equal across variables in data set WORK.TEMP01. This may not be appropriate. The smallest value will be used.
ERROR: CORR matrix incomplete in data set WORK.TEMP01.
NOTE: The SAS System stopped processing this step because of errors.

Date Last Visited: 03-02-2018 02:32 PM

Re: Carrying non-missing values down a by group for all numeric and ch...

Carrying non-missing values down a by group for all numeric and charac...

Re: Can I use proc sql to dynamically merge two datasets?

Re: Can I use proc sql to dynamically merge two datasets?

Re: Can I use proc sql to dynamically merge two datasets?

Re: Can I use proc sql to dynamically merge two datasets?

Re: Can I use proc sql to dynamically merge two datasets?

Re: Can I use proc sql to dynamically merge two datasets?

Re: Can I use proc sql to dynamically merge two dataset in order to s...

Re: Can I use proc sql to dynamically merge two datasets?

Can I use proc sql to dynamically merge two datasets?

Can I use proc sql to dynamically merge two dataset in order to simul...

Re: Concatenate multiple rows into a single value

Re: Concatenate multiple rows into a single value

Re: Polychoric correlations and factor analysis...I'm pulling my hair ...