Write a code to reference the folder where the excel files are stored.
To get the list of files you can use the DOPEN() and DREAD() functions. Or take advantage of some existing code that does that, such as
https://github.com/sasutils/macros/blob/master/dirtree.sas
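If you want to roll your own, a minimal sketch with DOPEN()/DNUM()/DREAD() might look like this (the folder path is an assumption; adjust it to where your files actually live):

```sas
filename xldir 'C:\myfiles';        /* assumed folder with the Excel files */

data filelist;
  length fname $256;
  did = dopen('xldir');             /* open the directory via the fileref */
  if did then do i = 1 to dnum(did);
    fname = dread(did, i);          /* name of the i-th member */
    if lowcase(scan(fname, -1, '.')) = 'xlsx' then output;
  end;
  rc = dclose(did);
  keep fname;
run;
```

The result is one observation per XLSX file, which you can use later to drive the processing.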
Write a code to extract the two columns of interest from the first excel file and store into a new dataset.
That sounds very strange. What do the excel files look like? Normally when you IMPORT data you import the whole worksheet. You can always ignore the other variables if you don't need them.
Are they really EXCEL files? It is much easier if they are instead CSV files, which your PC might think of as belonging to Excel but are really just plain text files, since you can write your own data step to read a text file.
If they are Excel files (XLSX files) then each file could have multiple worksheets in it. Do your files have multiple worksheets? Do the worksheets always have the same name(s)?
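For an actual XLSX file, a sketch of reading one file with PROC IMPORT could look like the following. The path, the sheet name, and the variable names ID and CODE are all assumptions about your data:

```sas
proc import datafile='C:\myfiles\file1.xlsx'  /* assumed path */
            out=work.one
            dbms=xlsx
            replace;
  sheet='Sheet1';                             /* assumed sheet name */
run;

/* keep only the two variables of interest */
data work.one;
  set work.one(keep=id code);
run;
```

Run this on one or two real files first and check the log and the resulting dataset before automating anything.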
Then reiterate this code to continue extracting from the next files until it gets to the last file.
So figure out how to do one file. Then generate that same type of code for each file. This can be made easier by creating a macro that processes one file. Then use the list of files to generate one call to the macro for each file. That way if the logic for how to process a file changes you can just update the macro.
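A skeleton for such a macro might look like this. The macro name READ_ONE, the variable names, and the fixed lengths are assumptions to be adjusted once you have seen what your files actually contain:

```sas
%macro read_one(file, out=work.one);
  proc import datafile="&file" out=&out dbms=xlsx replace;
  run;

  /* force a consistent structure so every file can be appended later */
  data &out;
    length id $20 code $10;   /* assumed types/lengths */
    set &out(keep=id code);
  run;
%mend read_one;
```

Keeping all the per-file logic inside the macro means a change to that logic is made in exactly one place.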
Make sure to flag those files that have been imported.
If you just process the files in your list there is no need to "keep track of which files have been read". But if you expect to need to re-do this process again in the future when there may be some NEW files that need to be processed then save the list into a SAS dataset. So that next time you want to look for and process new files you can exclude the files you processed already.
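As a sketch of that idea, save the list to a permanent library and exclude already-processed names on the next run (the PERM libref and the dataset names are assumptions):

```sas
proc sort data=filelist;        by fname; run;
proc sort data=perm.processed;  by fname; run;   /* saved from the last run */

data newfiles;
  merge filelist(in=innew) perm.processed(in=old);
  by fname;
  if innew and not old;   /* keep only files not seen before */
run;
```

After the run, append the newly processed names to PERM.PROCESSED so the next run starts from an up-to-date list.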
The new dataset will now contain the following columns: ID-1, Code-1, ID-2, Code-2, ID-3, Code-3, ... Therefore, next step is to transpose.
I can perhaps guess what you mean by this. For example you might mean that the spreadsheets have two variables named ID and CODE and you want to somehow convert that useful organizational structure of having each ID/CODE pair on its own observation into some type of wide structure with many ID variables and many CODE variables. Why would that help anything at all?
I would expect it would be better to instead just add another variable (or more) that can be used to tie those observations together. So perhaps the name of the file that they came from. Or perhaps a value like a DATE or COUNTRY or DISEASE that is derived from the filename.
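A sketch of that idea: after importing one file, add a variable naming its source instead of transposing anything. The dataset name and the filename here are placeholders:

```sas
data work.one;
  length source $256;
  set work.one;
  source = 'C:\myfiles\file1.xlsx';  /* ties observations to their source file */
run;
```

If the filename encodes something meaningful, such as a date or a country, you can parse that out of SOURCE into its own variable with SCAN() or INPUT().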
Extract both columns of interest from each excel file into a dataset. This will create ~120 datasets*.
Not really. More like it will re-create the same dataset 120 times. Each new one replacing the old one.
Transpose each of the new datasets.
Don't do this. Or if you need to do it, wait until you have all of the data so you know how many variables you will need. I suspect each spreadsheet will have a different number of observations, hence a different number of variables when transposed.
Merge all new datasets.
More like keep appending each new dataset as it is created.
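A sketch of that appending step, run once per imported file (WORK.ALLDATA and WORK.ONE are assumed names):

```sas
proc append base=work.alldata data=work.one;
run;
```

PROC APPEND will complain if the variable names, types, or lengths differ between the files, which is exactly why forcing each file into a consistent structure first matters.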
Delete the ~120 datasets.
Probably not really needed. First, if you process them one by one there is only one temporary dataset to be deleted. But also, SAS work datasets are deleted automatically as soon as your SAS session ends.
So your coding process should be:
Locate one or two example EXCEL files.
Use PROC IMPORT (or a LIBNAME with the XLSX engine) to convert them into SAS datasets.
Figure out what variables they have. Are the ID and CODE variables really in the spreadsheet? Or are the spreadsheets some complex layout that is poorly constructed for use as a data source?
Figure out how to convert the INCONSISTENT mess you will get when PROC IMPORT GUESSES the variable names, variable types, and character lengths into a consistent structure that you can append.
Once you have that mainly sorted try wrapping that into a macro that takes as input the name of the file to read and possibly other things like the sheet within the file to read and the name of the dataset to create. Test that to see if it works for the couple of files you explored already. Then try it on a couple more.
Once the macro is mainly working you can add a PROC APPEND step to aggregate the new file into a consolidated file. This is where making sure the variable names and types are consistent is very important so that each new dataset created from each new file has the same variables so they can be combined.
Now look into the process of getting the list of files. How automatic does it need to be? Is the list static? Then just use normal operating system commands like DIR (or ls on unix) to make a text list of names and paste it into your program. If the list will vary then look into using a tool like the %DIRTREE() macro to build a dataset with the list of filenames.
Once you have the list of filenames you can use it to generate one call to the macro for each file.
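One common way to do that is CALL EXECUTE, sketched below. The macro name READ_ONE and the folder path are assumptions; FILELIST is whatever dataset holds your filenames:

```sas
data _null_;
  set filelist;
  /* %nrstr delays the macro call until after the data step finishes */
  call execute(cats('%nrstr(%read_one)(C:\myfiles\', fname, ')'));
run;
```

Check the log afterwards: each generated macro call is echoed there, which makes it easy to see which file any error came from.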