SAS Support Communities

dr2014 · ‎04-26-2018

Thanks for your help it worked!

dr2014 · ‎04-25-2018

Thanks for your reply @Tom Yes, there was an error in my value for the 'new_variable' in the last sample and also it doesn't make a difference if the values for new_variable is character or numeric. Let me try your code. An array could be a better solution than the macro. I will get back to the post in a bit.

dr2014 · ‎04-25-2018

Thanks for your reply @Patrick, however &og is not pa pa1 pa2 pa3 pa4 but the values in it which are many in the real dataset. Please review my reply to @novinosrin

dr2014 · ‎04-25-2018

@novinosrinThe macro was available and I wanted to make use of it. This is what I am trying to accomplish. Here are the basic statements... %let og=(5,6,7); %let tw=(9,3); data want; set have; if ( a in (&og.) or pa1 in (&og.) or pa2 in (&og.) or pa3 in (&og.) or pa4 in (&og.) ) and ( lt=1 and lp in (&tw.) ) then new_variable =1; else new_variable=0; run; I wanted to avoid using the multiple 'or' syntax hence the macro. Please let me know if there is way to avoid that and/or make use of the macro. Also I had an error in the dataset 'want' and I have copy pasted both below. The 'have' dataset have is large and there are multiple records for each id. have id pa pa1 pa2 pa3 pa4 lt lp 1 5 5 5 5 1 9 2 6 6 6 1 3 3 7 7 7 7 7 2 2 want id pa pa1 pa2 pa3 pa4 lt lp new_variable 1 5 5 5 5 1 9 1 2 6 6 6 1 3 1 3 7 7 7 7 7 2 2 0

dr2014 · ‎04-24-2018

Hi all, I am using SAS EG and a macro with 'if then else' and 'do loop' to create a 'new variable' with values of '1' or '0' based on the values of more than 1 variables (pa, pa1 - pa4, lt and lp). Below is the structure of my dataset called 'have' id pa pa1 pa2 pa3 pa4 lt lp 001 5 5 5 5 1 9 002 6 6 6 1 3 003 7 7 7 7 7 2 2 bellow is the dataset I need called 'want' id pa pa1 pa2 pa3 pa4 lt lp new_variable 001 5 5 5 5 1 9 1 002 6 6 6 1 3 0 003 7 7 7 7 7 2 2 1 %macro midds; data want; set have; if (pa in (&og.) %do i=1 %to 4; or pa&i. in (&og.) %end; and (lt='1' and (lp in (&tw.))then new_variable='1'; else new_variable='0'; run; %mend; %midds; &og is a macro for all required values in pa, pa1- pa4 &tw is a macro for all required values in lp When I run the code, I get an error which says 'no matching if then clause'. Could you please point out to me what is wrong with my code? Thanks.

dr2014 · ‎08-17-2017

@ChrisBrooks I got what I needed. I wasn't going about it the right way. To get to table_c , here is the new approach . I first compressed the perod_id and date fields to get a unique id , relabeled the variables code1 and code2 in table_a and table_b respectively as 'code' and then stacked table_a and table_b . I then used conditional logic using the unique id and its associated unique values in the code variable to get to table_c as I needed an addtional derived variabale as well. Thanks!

dr2014 · ‎08-17-2017

Hi @Reeza and @ballardw, I provided the tables as examples. The full code is here: proc sql; create table table_c as select * from table_a c full join table_b t on c.id=t.id and c.date=t.date; quit; I ran the above code and it gave me an error saying there are duplicates in 'id' column. I also realized joins do not overlay columns. I was wondering if there was way through joins to go about getting table_c with unique rows for values in code1 and code2. Hope this explains what I am trying to acheive. Thanks.

dr2014 · ‎08-17-2017

Hi, I am looking to use a full join versus a merge. Below are 2 datasets Table_a and Table_b that I want to do a full join on to get Table_c: table_a id Code1 date source 1 h d1 abc 1 h d2 abc 1 c1 d3 abc 1 c2 d4 abc table_b id Code2 date source 1 n1 D1 abc 1 n1 D4 abc 1 n2 D5 abc table_c id Code1 Code2 date source 1 h ndc1 d1 abc 1 h ndc1 d2 abc 1 c1 ndc2 d3 abc 1 c2 d4 abc would the below code help me acheive table_c? select * from table_a c full join table_b t on c.id=t.id and c.date=t.date Please let me know asap. Thanks.

dr2014 · ‎05-31-2017

@ballardw Thanks much for the solution! I realized you had provided it already and it was vey helpful.

dr2014 · ‎05-24-2017

Thanks @ballardw for the tips. Actually the array worked for me and I converted the character values to numeric after tweaking it. I gave a value of : '1' to the event1 value of interest (which is "in_target") ' 2' to those of event1 values of no interest and a period for missing and blank values. I am now concerned about calculating the average proportion for each event and then its confidence interval. @Reeza , I am looking to use the proc freq with the binomial option. At present I have transposed my dataset and it looks like this: Value '1' for variable "Event" is the event value fo interest. ID Period Event 1 period1_event1 1 1 period2_event1 . 1 period3_event1 . 1 period4_event1 . 2 period1_event1 2 2 period2_event1 1 2 period3_event1 . 2 period4_event1 . 3 period1_event1 1 3 period2_event1 1 3 period3_event1 . 3 period4_event1 . Is this how I should be executing the proc freq with the binomial option? proc freq data=have; ods output Binomial=ci; by id; tables event / binomial; run; If 'yes' then I will be getting proportion and Upper Conf Limit and Lower Conf Limit by each id. Do I then extract all these estimates for each id from output dataset 'ci' and take its average for the population? Please advice. Also, another way I apporached this is to calculate the numerator,denominator and proportion for each distint id and then calculate the average proportion for the population. This is how my output looks for the population for event 1: Table 1 ID Period1 Period2 Period3 Period4 Numerator Denominator Proportion Calculation Average_proportion ID1 in_target Missing low in_target 2 3 =0.66 (0.66+0+0.75)/3=0.47 ID2 low low Missing missing 0 2 =0 ID3 low in_target in_target in_target 3 4 =0.75 Should I calculate the confidence intervals manually using P = PERCENT/100 ; N = COUNT + LAG(COUNT) ; LB = P - ( 1.96*SQRT( P*(1-P)/N ) ) ; * reset lower bound to 0 if <0 ; IF LB < 0 THEN LB = 0 ; UB = P + ( 1.96*SQRT( P*(1-P)/N ) ) ; * reset upper bound to 1 if >1 ; IF UB > 1 Then UB = 1 ; What would be the most efficiat way to do this? Please reaply as soon as possible. Much thanks for your help.

dr2014 · ‎05-24-2017

Hi @ballardw. Thanks for your reply.Yes, I should have known to display my data in a form that can be tested. Also, I realized after that proc sql will require me to transpose the data as count will not be applicable to multiple columns. I wanted to avoid that. If possible, I would like to use the sum function. I anyways have to convert the values of period1 to period5 to numeric. I also have blanks in the data that have to be converted into missing numeric values. So on a clean slate here is my data and a data manipulation question as well. The values of interest to me are 'in_target'. The other values are 'low' and 'high' which are the 'event vaues not of interest'. 'missing' and blank values both need to be treated as 'missing'. data have; input id period1 period2 period3 period4 period5 datalines; 1 in_target 2 in_target missing low in_target 3 high high missing missing missing 4 low in_target in_target in_target in_target run; I am looking to convert the values of in_target to 1 , low and high as '0' ,missing and blank with a 'period'. I used the array array ch(*) $ period:; array nu(*) periodnew:; do i = 1 to dim(ch); if ch(i)='in_target' then do; nu(i)=1; end; else if ch(i)='low' then do; nu(i)=0; end; else if ch(i)='high' then do; nu(i)=0; end; else if ch(i)='missing' then do; nu(i)=.; end; else if ch(i)=' ' then do; nu(i)=.; end; end; run; I am getting an error stating ERROR: Array subscript out of range at line # column # Could you first please tell me what is wrong with my array code?

dr2014 · ‎05-24-2017

Hi all, I need your advice urgently to calculate the ‘confidence intervals’ of an average proportion. Please refer to example Table 1 for the original data. My final table has to look like Table 2. Table 1. displays the example of the calculation of the ‘average proportion’ of the population for ‘event 1’. The numerator is the count of ‘event1_of interest’ across the time periods (period 1 to period 4) divided by the ‘total count of the time periods’ for the id excluding the period with ‘missing value’ for the event1. After identifying the numerator and denominator for each id, I have to calculate the proportion for each id and then the ‘average proportion’ for the population which is displayed in Table 1. I have to then estimate the ‘confidence intervals’ for the ‘average proportion’ for each event for the population and create Table 2. I am looking to calculate the counts(numerator, denominator) and proportion using proc sql. How can I calculate the confidence intervals for the average proportion? Please let me know as soon as possible. Thanks. Table 1 ID Period1 Period2 Period3 Period4 Proportion of Event1_Time in target Proportion Calculation ID1 Event1_of interest Missing Event1_not of interest Event_of interest = 2 / 3 =0.66 ID2 Event1_not of interest Event1_not of interest Missing missing = 0 / 2 =0 ID3 Event1_not of interest Event1_of interest Event1_of interest Event1_of interest = 3 / 4 =0.75 Average_Proportion =(0.66+0+0.75)/3=0.47 Table 2 Average_Proportion 95% Confidence Limit Event Event 1 0.47 Event 2 Event 3 Event 4

dr2014 · ‎05-15-2017

Addtionally, when I output the chart I get a inner and outer box. Is there a way to get rid of it? Thanks.

dr2014 · ‎05-15-2017

Hi all, I am creating a stacked bar chart for percent scored on test which is categorized in difefrent groups as low , midddle and high. Below is the data and my code. The 'Score' variable is in percent. (1) I need to reduce the width of the bar and (2) also add 'Score' of the 'Category' to each segment in the bars. (3) Remove the word 'Category' on the legend. (4) adjust the percent 'interval' on the y-axis and add the symbol % next to the numbes on the grid lines. Please advise as soon as possible. Thanks. proc sgplot data=want; title 'Distribution XXXXX; vbar Test / response=value group=Category; xaxis display=(nolabel); yaxis grid label='Percent %'; run; Category Test Score low CHE 11 middle CHE 83 high CHE 3 low PHY 7 middle PHY 54 high PHY 38 low MAT 10 middle MAT 69 high MAT 19 low BIO 16 middle BIO 33 high BIO 51

dr2014 · ‎05-15-2017

Hi DanH_sas, thank you I tried it and I can understand its value. I will definitely remember to use it, but now I am required to display the chart in the previous manner. Also any help on displaying the 'N; below the time periods would be appreciated!

Online Status	Offline
Date Last Visited	‎09-28-2020 04:55 PM

SAS Support Communities

Re: How do I plot line graphs for multiple groups with gplot?

How do I plot line graphs for multiple groups with gplot?

Adding title to gplot

Re: Remove exponents from the values

Remove exponents from the values

Re: Running frequency

Re: Running frequency

Re: Running frequency

Running frequency

Re: creating unique window periods

Re: Accessing big data using SAS PC

Re: Accessing big data using SAS PC

Re: Access data in SAS EG

Re: convert numeric variable missing values to character variable miss...

Re: convert numeric variable missing values to character variable miss...

Re: convert year date value from character to numeric

Re: if then else do syntax

Re: if then else do syntax

Re: if then else do syntax

Re: if then else do syntax

if then else do syntax

Re: full join versus merge

Re: full join versus merge

full join versus merge

Re: Calculate the confidence intervals for the average proportion

Re: Calculate the confidence intervals for the average proportion

Re: Calculate the confidence intervals for the average proportion

Calculate the confidence intervals for the average proportion

Re: display value of segments and y axis grid in stacked bar chart

display value of segments and y axis grid in stacked bar chart

Re: create a clustered bar chart using multiple variables

Follow Us

What is...