About Keith

Keith · ‎08-23-2012

The problem is that the order of the ID variables is determined by the order in which they appear in the input dataset. You could solve the problem in your example data just by sorting table1 by Code and Mth, but this won't necessarily work in your real data. As far as I know, there is no way to specify the order of the variables within PROC TRANSPOSE, so one of your options is to reorder the output dataset afterwards. Here's one way of doing that.

    data table1;
      input Code $ Mth Pay;
      datalines;
    ABC 1 12
    DEF 2 12
    XYZ 1 34
    ABC 2 23
    DEF 1 45
    XYZ 2 65
    ABC 5 6
    ABC 4 12
    ABC 3 34
    ;
    run;

    proc sort data=table1;
      by code;
    run;

    proc transpose data=table1 out=table2 (drop=_:) prefix=Mth;
      by Code;
      id Mth;
      var Pay;
    run;

    /* Build an ordered list of the Mth columns, then recreate table2 in that order */
    proc sql noprint;
      select name into :vars separated by ','
        from dictionary.columns
        where libname='WORK' and upcase(memname)='TABLE2' and upcase(name) eqt 'MTH'
        order by name;
      create table table2 as
        select code, &vars.
        from table2;
    quit;
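One caveat with ORDER BY name is that it sorts the column names as text, so a column such as Mth10 would be listed before Mth2. The following is a minimal sketch of one possible variation that orders by the numeric suffix instead. The small table2 built here, with columns Mth10, Mth2 and Mth1, is made up purely for illustration, and the sketch assumes every matching column name is "Mth" followed by digits only.

```sas
/* Hypothetical table with month columns deliberately out of order */
data work.table2;
  Code = 'ABC'; Mth10 = 1; Mth2 = 2; Mth1 = 3;
run;

proc sql noprint;
  /* Order the column names by the number after the 3-character prefix */
  select name into :vars separated by ','
    from dictionary.columns
    where libname = 'WORK' and memname = 'TABLE2' and upcase(name) eqt 'MTH'
    order by input(substr(name, 4), 8.);
quit;

/* Write the resulting list to the log to check the order */
%put vars=&vars;
```

The macro variable can then be used in the same CREATE TABLE step as in the post above.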

Keith · ‎08-21-2012

Hi Mike, the reason for this is that array d has a fixed length of 5, so the value '12' stored in it is padded out with 3 trailing blanks.

Keith · ‎08-21-2012

Yes, IN: is very useful, Mike. One point that some readers may not be aware of is that the colon modifier truncates the longer of the 2 values being compared to the length of the shorter. This becomes important when the values of interest vary in length. For example, if you changed the length of diag to 6 in your code and added a digit to 25012 in the datalines statement, it would still match 25012 in array d, because diag is truncated to 5 characters in this instance.
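The truncation can be seen in a small test along these lines. This is only a sketch: it compares against a literal value rather than the array d from the thread, and the value 250123 is made up to stand in for "25012 with an extra digit".

```sas
data _null_;
  length diag $6;
  diag = '250123';                     /* hypothetical 6-character value */

  /* IN: compares only the first 5 characters here, so this should match */
  if diag in: ('25012') then put 'IN: match     ' diag=;
  else put 'IN: no match  ' diag=;

  /* Plain IN compares the full values, so this should not match */
  if diag in ('25012') then put 'IN match      ' diag=;
  else put 'IN no match   ' diag=;
run;
```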

Keith · ‎07-25-2012

Ah yes, I'm familiar with merging using formats but I hadn't thought of using it here. I'll see if it improves efficiency significantly.

Keith · ‎07-25-2012

Hi all, I have 2 datasets, a master and a lookup. The master has id, log_date and role. I want to update the role column in master from the lookup dataset, which has id, date_from, date_to, group and role. One id can belong to more than one group at the same time, but I am only interested in group 1, so the data is unique when dates are taken into account. I've written a Proc Sql statement to update master, but this is taking hours to run (the master dataset has around 15 million rows, the lookup dataset around 17 thousand). I assume this is due to the where clause I've had to include. Does anyone know of a more efficient way to update for this type of query? Possibly using Hash? Sadly I'm still in the dark ages of 9.1.3, so I'm not able to use the extra Hash functionality that came in 9.2 (fuzzy matching). Below is an example of the 2 datasets I have, along with the Proc Sql code I'm currently running.

    data master;
      input id log_date :date9. role;
      format log_date date9.;
      cards;
    1 25jul2012 2
    2 25jul2012 2
    3 25jul2012 4
    4 25jul2012 1
    5 25jul2012 1
    6 25jul2012 1
    ;
    run;

    data lookup;
      input id (date_from date_to) (:date9.) group role;
      format date: date9.;
      cards;
    1 01jul2012 01aug2012 1 10
    1 01jul2012 01aug2012 2 20
    2 01jul2012 01aug2012 1 10
    2 01jul2012 01aug2012 2 30
    3 01jun2012 30jun2012 1 10
    3 01jul2012 01aug2012 1 20
    3 01jul2012 01aug2012 2 40
    4 01jul2012 01aug2012 1 20
    5 01jul2012 01aug2012 2 20
    ;
    run;

    proc sql;
      update master as a
        set role=(select role from lookup as b
                  where a.id=b.id
                    and a.log_date between b.date_from and b.date_to
                    and b.group=1)
        where exists (select id from lookup as c
                      where a.id=c.id
                        and a.log_date between c.date_from and c.date_to
                        and c.group=1);
    quit;

Keith · ‎07-24-2012

There's actually a much easier way to achieve your goal, without having to write multiple IF statements. Just store the var1-varn variables in an array, loop through each one to check for the value 1, then use the VNAME function to add that variable name to the concatenated list. Your code will look something like this. I've put in 2 options for the 'multi' variable: the first stores the results as a comma-separated list, the 2nd puts double quotes around each value as per your code. Keep only one of the two IF statements, otherwise each name is added twice.

    data checkfile;
      set xxx.xxx;
      length multi $140;
      array vars{*} var1--varn;
      do i=1 to dim(vars);
        /* option 1: comma-separated list of variable names */
        if vars{i}=1 then call catx(',',multi,vname(vars{i}));
        /* or option 2: each variable name in double quotes */
        if vars{i}=1 then call catx('',multi,quote(vname(vars{i})));
      end;
      drop i;
    run;

Keith · ‎07-24-2012

Then just change .* to .{1,30} in the expression, as per your original post. This will allow between 1 and 30 characters between 'no' and 'infiltrate'.

Keith · ‎07-23-2012

You can simplify your expression to prxparse('/no\b.*infiltrate/i'). The "\b" defines a word boundary, and ".*" then matches any number of characters before "infiltrate". You already have "/i" at the end of the expression to ignore case, so you don't need to specify "N|N n|n" etc.
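A quick way to check how the expression behaves is to run it against a few test strings. The sketch below does that; the three strings are invented for illustration and are not from the original thread. The first two should be flagged (hit=1), while the third should not, because the "No" in "Nodular" is not followed by a word boundary.

```sas
data _null_;
  length text $80;
  /* Compile the pattern once; /i makes the match case-insensitive */
  re = prxparse('/no\b.*infiltrate/i');

  do text = 'No evidence of infiltrate',
            'no infiltrate',
            'Nodular infiltrate';
    /* PRXMATCH returns the start position of the match, or 0 if none */
    hit = prxmatch(re, text) > 0;
    put text= hit=;
  end;
run;
```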

Keith · ‎07-20-2012

Here's how I would approach this. If the final output is your goal, then you don't need the intermediate step to add the new column; you can just use formatted values of the existing column. In case the order of student behaviour is important (e.g. Good, Bad, Nice), I've added a format which stores that order (the notsorted option in the format and the order=data option in the tabulate are the key instructions to achieve this).

    data have;
      input student_id student_behaviour $ library_visits;
      cards;
    110 good 2
    111 good 15
    113 bad 10
    114 bad 1
    115 nice 2
    116 nice 24
    117 good 6
    119 bad 3
    ;
    run;

    proc format;
      /* 20 itself falls in '10-20'; the last range starts above 20 */
      value visit_f low-9    = '1-9'
                    10-20    = '10-20'
                    20<-high = 'over 20';
      value $beh_f (notsorted) 'good' = 'Good'
                               'bad'  = 'Bad'
                               'nice' = 'Nice';
    run;

    proc tabulate data=have;
      class student_behaviour library_visits / order=data;
      format library_visits visit_f. student_behaviour $beh_f.;
      table student_behaviour='',library_visits='No. of Visits'*n='' / box='Character';
    run;

Keith · ‎07-19-2012

@kevin_123, That's pretty much how I would do it. One simplification you could make is to use the CALL CATS routine in place of Varname=TRIM(....) || TRIM(...). So the 2 lines would be:

    call cats(prod_con,prod);
    call cats(acc_con,acc_flag);
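For anyone unfamiliar with CALL CATS, here is a minimal standalone sketch of what it does. The variable names follow the post, but the values are made up. The routine appends the stripped items to the first argument in place, so that variable needs a length long enough to hold the result.

```sas
data _null_;
  length prod_con $20 prod $5;
  prod_con = 'A';          /* hypothetical running value */
  prod     = 'B';          /* hypothetical value to append */

  /* Appends prod to prod_con, removing leading and trailing blanks */
  call cats(prod_con, prod);
  put prod_con=;
run;
```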

Keith · ‎07-19-2012

The only percentiles available in Proc Summary are 1, 5, 10, 25, 50, 75, 90, 95 and 99. To get other percentiles you need Proc Univariate, using the PCTLPTS option within the OUTPUT statement.
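As a rough sketch of the Proc Univariate approach, the step below requests the 20th percentile. It uses the sashelp.class sample dataset and its height variable purely as stand-ins for your own data, and the output dataset name pctl is arbitrary.

```sas
proc univariate data=sashelp.class noprint;
  var height;
  /* PCTLPTS lists the percentiles wanted; PCTLPRE supplies the name prefix */
  output out=pctl pctlpts=20 pctlpre=P;
run;

proc print data=pctl;
run;
```

With these options the output dataset should contain a single variable named P20. Several percentiles can be requested at once, e.g. pctlpts=20 40 60 80.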

Keith · ‎07-18-2012

Create a picture format using the required directives to produce the result you want. You can then either format the existing column, or create a character string using the PUT function.

    proc format;
      picture dtfmt low-high = '%Y%0m%0d_%0H%0M' (datatype=datetime);
    run;

    data want;
      a='18jul2012:08:50'dt;
      format a dtfmt.;    /* display the datetime as CCYYMMDD_HHMM */
      b=put(a,dtfmt.);    /* or store the result as a character string */
    run;

Keith · ‎07-17-2012

You are using an informat, not a format. The SAS documentation on the w.d informat provides the explanation. Here is the link: http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000199348.htm

Keith · ‎07-17-2012

Not in TTEST. I tend to use PROC UNIVARIATE to create histograms, so I guess you can create an output dataset in TTEST and then use UNIVARIATE to produce the histogram.

Keith · ‎07-17-2012

What version of SAS are you using? The PLOTS option for PROC TTEST only became available in 9.2. I have 9.1.3 and am getting the same error message as you, so I suspect that is the cause.

Online Status	Offline
Date Last Visited	‎09-11-2015 12:18 PM

Re: Regarding If and then statement

Re: Merging 3 datasets

Re: Regading if and then

Re: Retaining Values in a Do Loop

Re: Using column wildcard (:) with retain statement in a datastep

Re: Select observations that meet a criteria

Re: Select observations that meet a criteria

Re: How to read Input data

Re: Output next,or next and next,or next and next and next,hahaha

Re: Re-Arrange Variables

Re: Find the missing values in the year

Re: Increment row based on change in record

Re: Dealing with 32 character restrictions

Re: calculating averages on floating period

Need help in matching the obs from dataset to excel

Re: How to change the name and format of PROC IMPORT variables

Re: Proc Transpose - order ID in proper sequence

Re: de-dup array

Re: de-dup array

Re: Proc Sql optimisation - Hash?

Proc Sql optimisation - Hash?

Re: Conditionally adding a string to a character variable

Re: reg expressions part II-controlling the "no" in the expression

Re: reg expressions part II-controlling the "no" in the expression

Re: Help with grouping and tabulating

Re: concatenate through group by using SAS or PROC SQL, urgent help ne...

Re: 20th percentile in PROC SUMMARY?

Re: CCYYMMDD_HHMM string

Re: Why the x and xy value is with 3 decimals even after using format ...

Re: proc ttest

Re: proc ttest