About UniversitySas

Kurt_Bremser · ‎09-22-2020

Maxim 2: Read the Log. The SQL procedure notifies you that it has to do Cartesian joins, which is often a bad sign, performance-wise. Split your join into two, and concatenate the results:

proc sql;
  create table merge_test as
    select l.*, r.*
      from Id_table as l
      left join merge_table as r
        on l.Name = r.Comp_Name
      where not missing(r.comp_name)
    union
    select l.*, r.*
      from Id_table as l
      left join merge_table as r
        on l.Parent_name = r.Comp_Name
      where not missing(r.comp_name)
  ;
quit;

Shmuel · ‎09-07-2020

The COMPBL function compresses runs of blanks to a single space between words. The UPCASE function eliminates case sensitivity ("A" is not equal to "a"). The COMPRESS function with the characters '()', i.e. both parentheses, removes them from the text. The only issue that I did not deal with is row #5: the order of sub-strings (a.o. smith).
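As a minimal sketch of how the three functions can be combined into one normalized key (the input string is made up for illustration and does not come from the original data):

```sas
data _null_;
  length raw clean $40;
  raw = 'Acme   (Holdings)  ltd';            /* hypothetical input value */
  /* COMPRESS drops the parentheses, COMPBL collapses repeated blanks,
     UPCASE removes case differences */
  clean = upcase(compbl(compress(raw, '()')));
  put clean=;                                /* log: clean=ACME HOLDINGS LTD */
run;
```

Applying the same expression to both join keys lets the comparison ignore spacing, case, and parentheses. It does not address a different order of sub-strings.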

UniversitySas · ‎08-28-2020

You are correct. My solution was to separate the "rich" group into its own table, then use a left join in PROC SQL with count(b.name) to find the number of people that meet the condition a.poor ge b.rich, grouping by gender. This gave me the number of 'rich' people below each 'poor' one, by gender, and then I manually computed a percentile by dividing by the total number of rich. Seemed like the easiest way to go about it.
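A rough sketch of that approach, under stated assumptions: the table names poor_tbl and rich_tbl, the sample values, and the derived table t are all hypothetical, and name is assumed to be unique within poor_tbl. The post does not show the actual code.

```sas
/* Hypothetical sample data; table names and values are assumptions */
data poor_tbl;
  input name $ gender $ poor;
  datalines;
Ann F 30
Bob M 55
;

data rich_tbl;
  input name $ gender $ rich;
  datalines;
Cat F 20
Dee F 40
Eli M 50
Fay M 60
;

proc sql;
  create table pct as
  select a.name, a.gender, a.poor,
         count(b.name)            as n_rich_below,  /* rich values at or below a.poor */
         count(b.name) / t.n_rich as pctile         /* share of all rich, same gender */
  from poor_tbl as a
  left join rich_tbl as b
    on a.gender = b.gender and a.poor ge b.rich
  left join (select gender, count(*) as n_rich
             from rich_tbl
             group by gender) as t
    on a.gender = t.gender
  group by a.name, a.gender, a.poor, t.n_rich;
quit;
```

Because count(b.name) ignores the missing values produced by the left join, a 'poor' row with no 'rich' values below it gets a count of 0 rather than being dropped.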

ballardw · ‎05-08-2020

I think that you need to go into a little bit of detail as to how you arrived at those "percentile" rough estimates. Since every one of the records you show has exactly one ID and date, if the percentile is supposed to be by ID and date then the single value shown for Var1 is every percentile. You mention "time" in your subject line, but your data doesn't show a "time" value, only a date. So I have to assume date is the time. But you do not describe what role the ID has in the process. If none, then do not include the ID at all. If the process is supposed to use the ID then tell us how.

Percentiles are order statistics and indicate a value that exists in the data (or, with typical tie-breaking rules, the average of two values). Typically, if you have values like 1 2 3 4, the 50th percentile [middle] falls between 2 and 3, so 2.5 might be reported, or 2 or 3 if using lower/upper rules. So none of the "percentiles" you show are typical for standard definitions of percentile. If you meant "percentage" then we would need to know what goes into the numerator and denominator, and that is not at all obvious.
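To make the tie-breaking point concrete, here is a small sketch using the values 1 2 3 4 from above. The dataset name demo is an assumption; QNTLDEF= is the PROC MEANS option that selects the percentile definition.

```sas
data demo;
  input x @@;
  datalines;
1 2 3 4
;

/* QNTLDEF=5 (the default) averages the two middle values */
proc means data=demo median qntldef=5;
  var x;
run;

/* QNTLDEF=3 reports a value that exists in the data instead of an average */
proc means data=demo median qntldef=3;
  var x;
run;
```

The first call should report 2.5, while the second should return one of the observed values, which illustrates why the definition in use has to be stated.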

FreelanceReinh · ‎05-03-2020

You're welcome. Of course, Ksharp's solution works as well. The OUTPUT statement writes the current observation to dataset WANT (and it overrides the implied OUTPUT statement at the end of each DATA step iteration). It is executed unconditionally for every observation of dataset HAVE after second_condition was set to 0, if applicable, but before first_condition is set to 1, if applicable. Since first_condition (and likewise second_condition) retains its value within the DO-UNTIL loop which is processing one BY group after the other (known as "DOW loop"), the assignment first_condition=1 affects the OUTPUT statement of the next and all subsequent iterations of the DO-UNTIL loop within the same BY group, hence all observations in dataset WANT after the one with base_var=1 until the end of the BY group. The two assignment statements at the beginning of the DATA step initialize the two "condition" variables before the processing of each BY group commences.
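The original DATA step is not reproduced in this excerpt, so the following is only a simplified sketch of the pattern described. It assumes a BY variable named id and made-up data, and it leaves out second_condition because its trigger is not stated here.

```sas
/* Hypothetical sample data, sorted by the assumed BY variable id */
data have;
  input id base_var;
  datalines;
1 0
1 1
1 0
1 0
2 0
2 0
2 1
;

data want;
  first_condition = 0;            /* reset before each BY group is processed */
  do until (last.id);             /* DOW loop: one DATA step iteration per BY group */
    set have;
    by id;
    output;                       /* writes the flag as set by earlier observations */
    if base_var = 1 then first_condition = 1;  /* takes effect from the next observation */
  end;
run;
```

With this data, first_condition is 0, 0, 1, 1 for id=1 and 0, 0, 0 for id=2, since the observation with base_var=1 is written before the flag is switched on.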

ed_sas_member · ‎05-03-2020

Hi @UniversitySas

You can definitely write a DO loop whose last iteration varies, for each id, according to a variable. Please have a look at the following result. I have not dropped i so that you can see the number of iterations: e.g., for ID=01, numobs = 5, i looped from 1 to 5. When i=6, the DO loop stops as 6 > numobs.

data want;
  set temp;
  do i = 1 to num_obs;
  end;
run;
proc print;

Regarding your second question, you cannot write %DO i = 1 %TO numobs; because numobs would have to be a macro variable. Could you please give some more details about what you're trying to achieve?

NB: using the LAG function in an IF statement is not recommended at all, as it leads to unexpected results. You should store the lagged values in a variable before using them in a conditional statement.

Best,
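The point about LAG can be sketched as follows. The dataset and the variable names x, flag, prev_x, and diff are hypothetical.

```sas
/* Hypothetical data: x is a value, flag marks the rows of interest */
data have;
  input x flag;
  datalines;
10 0
20 1
30 0
40 1
;

data want;
  set have;
  prev_x = lag(x);                       /* executed on every row, so it holds the previous row's x */
  if flag = 1 then diff = x - prev_x;    /* safe: uses the stored value */
  /* Risky alternative: if flag = 1 then diff = x - lag(x);
     LAG would then only be fed the rows where flag = 1 */
run;
```

LAG returns the value from its previous execution rather than from the previous observation, which is why calling it conditionally gives surprising results.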

ballardw · ‎04-22-2020

You may want to reconsider the whole "add any suffix" idea to begin with. When you leave variables with a numeric suffix you can use all of the LIST shortcuts, such as var1 - var3. As soon as you add a suffix you will always have to list var1_usd var2_usd var3_usd. I might suggest that you name them Usd_1, Usd_2 and Usd_3, and use a LABEL to give each a more descriptive name.
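A short sketch of that naming scheme, with made-up values and label text:

```sas
data have;
  input Usd_1 Usd_2 Usd_3;
  label Usd_1 = 'Var1 in USD'      /* hypothetical label text */
        Usd_2 = 'Var2 in USD'
        Usd_3 = 'Var3 in USD';
  datalines;
1 2 3
4 5 6
;

proc means data=have mean;
  var Usd_1-Usd_3;                 /* numbered range list works because the number comes last */
run;
```

The labels appear in the procedure output, so the descriptive text is kept without giving up the list shortcut.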

PaigeMiller · ‎04-15-2020

@UniversitySas wrote:

So let's say I have a data set with firms of all different sizes. Now for example, suppose I want to run 2 separate regressions based on the size of the firms: one regression for firms below the median size, and one for firms above the median size. In my case, I created a variable that is equal to "1" or "." (missing) for <median and "1" or "." for >median size for every firm. Then I multiplied these variables by firm returns. My logic here was that I will have a set of returns for firms <median size and a set of returns for firms >median size. Call these two return variables Less_median_returns and greater_med_returns. Then, if a firm is <median size, its return value in "greater_med_returns" will just be ".", missing, right? So my question is, would running these two models give me what I'm after?

PROC REG DATA = have;
  small: MODEL Less_median_returns = variable1;
  large: MODEL greater_med_returns = variable1;
QUIT;

Would this regression give me the same results as if I just created two separate tables based on my dummy variable? E.g.,

DATA small_firms;
  SET HAVE;
  if firm_Size le median;
RUN;
DATA large_firms;
  SET HAVE;
  if firm_Size > median;
RUN;
PROC REG DATA = small_firms;
  model returns = var1;
RUN;
PROC REG DATA = large_firms;
  model returns = var1;
RUN;

Would the outputs be the same in both approaches?

You can just run the code and find out if the outputs are the same.

UniversitySas · ‎04-15-2020

Thank you sir

UniversitySas · ‎04-15-2020

Thank you sir

UniversitySas · ‎04-15-2020

I already accepted an answer, but I realized later that what you're saying was the actual cause of the error. Thank you!

Astounding · ‎04-14-2020

It should be straightforward:

proc summary data=have;
  var first1-first10 second1-second10;
  output out=want mean=mean_first1-mean_first10 mean_second1-mean_second10;
run;

This is untested code, so see if it works for your data.

Jagadishkatam · ‎04-14-2020

yes

Tom · ‎04-13-2020

Your macro as written doesn't need any branching. Just use the value of the macro variable at the appropriate place in the code.

%MACRO create_Table(y,statistic);
  %local i;
  %DO i = 1 %TO 10;
    * Use PROC MEANS to generate requested statistic ;
    PROC MEANS DATA = table_&y&i &statistic noprint;
      BY year;
      VAR var_&y&i ;
      OUTPUT OUT = stat_&y&i &statistic(var_&y&i) = var_&y&i ;
    RUN;
    *Insert more code that works fine;
  %END;
%MEND create_Table;

You can even get rid of the %DO loop. Put all 10 of your VAR_&Y.1 to VAR_&Y.10 variables into one input dataset and generate the statistic for all of them in one PROC into one output dataset.

%MACRO create_Table(y,statistic);
  * Use PROC MEANS to generate requested statistics ;
  PROC MEANS DATA = table_&y &statistic noprint;
    BY year;
    VAR var_&y.1 - var_&y.10 ;
    OUTPUT OUT = stat_&y &statistic=;
  RUN;
  *Insert more code that works fine;
%MEND create_Table;

UniversitySas · ‎04-12-2020

thank you, this works perfectly.
