About Wolverine

Wolverine · ‎03-06-2023

That works, thanks!

Tom · ‎02-20-2023

@Wolverine wrote:
@Tom wrote: Remember that the format attached to a variable just determines how the value is DISPLAYED, not what is stored.
Excellent point; that is the source of my confusion! The variables are displayed as 02/20/2022, but that is not how SAS is actually storing them.

Right. SAS uses the term as defined by meaning 6a in the dictionary.com definition. That said, SAS does use the term informat for the rules used on INPUT and reserves the term format for OUTPUT.
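A minimal sketch of the stored-versus-displayed distinction: one date literal is written to the log as its raw stored number and then through two formats. The variable name d is only for illustration.

```sas
data _null_;
  /* A date literal is stored as the number of days since 01JAN1960 */
  d = '20FEB2022'd;
  put d=;            /* raw stored value: d=22696 */
  put d= mmddyy10.;  /* same value, displayed as d=02/20/2022 */
  put d= date9.;     /* same value, displayed as d=20FEB2022 */
run;
```

Changing the format on the FORMAT statement changes only the last two lines of output; the stored number stays 22696.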

Tom · ‎12-09-2022

The other way is to convert your coverage into intervals. I am sure the question of how to collapse overlapping or contiguous intervals has been asked here before. Or you could just use the array approach implied by the string approach and then roll it out as a series of intervals. Then you just need to join on memberid and on whether the index date falls within a coverage interval. Since the goal is a number of months, the differences use INTCK (which counts interval boundaries between two dates) rather than INTNX (which returns a shifted date):

proc sql;
create table want as
  select a.*
       , b.start
       , b.end
       , intck('month', b.start, a.index) as months_pre
       , intck('month', a.index, b.end) as months_post
  from service a
  left join coverage b
    on a.memberid = b.memberid
   and a.index between b.start and b.end
  ;
quit;
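For the first suggestion (collapsing overlapping or contiguous coverage segments into intervals), here is a minimal sketch. It assumes one row per enrollment segment with hypothetical variables memberid, cov_start and cov_end holding SAS dates, both ends inclusive, so a segment that starts the day after the previous one ends is treated as contiguous. The sample rows are made up.

```sas
/* Made-up coverage segments: memberid, cov_start, cov_end (inclusive SAS dates) */
data coverage;
  input memberid cov_start :date9. cov_end :date9.;
  format cov_start cov_end date9.;
  datalines;
1 01JAN2020 31JAN2020
1 01FEB2020 31MAR2020
1 15MAR2020 30APR2020
1 01JUN2020 30JUN2020
2 01JAN2020 31DEC2020
;
run;

proc sort data=coverage out=cov_sorted;
  by memberid cov_start cov_end;
run;

/* Number the spells: a new spell starts at each member's first segment,
   or when a segment begins more than one day after the running end date */
data cov_flagged;
  set cov_sorted;
  by memberid;
  retain spell_id run_end;
  if first.memberid then do;
    spell_id = 1;
    run_end = cov_end;
  end;
  else do;
    if cov_start > run_end + 1 then spell_id + 1;
    run_end = max(run_end, cov_end);
  end;
  drop run_end;   /* still retained in the PDV, just not written out */
run;

/* One row per spell with its overall start and end dates */
proc sql;
  create table coverage_spells as
  select memberid, spell_id,
         min(cov_start) as spell_start format=date9.,
         max(cov_end) as spell_end format=date9.
  from cov_flagged
  group by memberid, spell_id;
quit;
```

With this sample, member 1 ends up with two spells (01JAN2020 to 30APR2020 and 01JUN2020 to 30JUN2020) and member 2 with one. The spell_start/spell_end pairs can then play the role of start/end in the interval join.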

Wolverine · ‎10-25-2022

@Quentin wrote:
I like the idea of coding up the same algorithm using multiple approaches to compare efficiency. That said, I think you should work to get all the approaches to match in their output. This should not happen. If SAS runs out of memory, you should get an error in the log. You definitely should not get the wrong result (with no error). If you really have a case where SQL is giving you the wrong result, I would send it in to tech support. Same for your statement that the hash approach had some discrepancies. I think it's likely that there are some edge cases in your data that are falling through some cracks in your code. But if you have a repeatable example of a discrepancy (especially one where the results of the code vary with the amount of memory available to the SAS session), please send it in to tech support. I'm also confused by your statement that the output dataset from the hash approach was double the size of the other approaches. If the output datasets from each approach are identical (e.g., judged via PROC COMPARE to compare the metadata and data), this shouldn't happen, unless maybe you changed compression options.

Before this project, I had virtually no experience with arrays or hash objects. I don't know why the flag counts were slightly different. The only clue I had that RAM could be an issue was that Firefox crashed and presented a dialog box indicating that the crash was due to insufficient memory. When I manually reviewed the discrepant cases, the codes that matched for those cases had been successfully matched on many other cases. And after I increased the memory, the discrepancies disappeared.

It wasn't just the output file size that differed among the approaches; the number of records differed too. So data compression wouldn't explain the differences. Perhaps PROC SQL does not continue to search for matches on a given flag after it has already matched that flag, whereas the hash version DOES continue to search? In that case, there could be duplicate records for a given case that match on the same flag, which could explain the differences in file size. I could run PROC SQL with SELECT DISTINCT on the hash output file to see if it finds and eliminates any duplicate records.

I will review all of this to make sure I haven't made any errors. If I can't find any, I'll submit it to tech support. However, I'm getting busy with other projects, so I don't have as much time to dedicate to this right now. So it may take a while😐
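A minimal sketch of the duplicate check described above. Instead of SELECT DISTINCT it uses PROC SORT with NODUPKEY and BY _ALL_, so the removed copies land in a separate data set for review, and then runs PROC COMPARE as Quentin suggested. The data set names (flags_sql, flags_hash), the variables (case_id, flag) and the rows are all hypothetical.

```sas
/* Made-up stand-ins for the SQL and hash outputs */
data flags_sql;
  input case_id flag $;
  datalines;
1 EPL
2 EPL
2 PM
;
run;

data flags_hash;
  input case_id flag $;
  datalines;
1 EPL
2 EPL
2 EPL
2 PM
;
run;

/* BY _ALL_ with NODUPKEY drops rows that repeat on every variable;
   DUPOUT= keeps the dropped copies so they can be inspected */
proc sort data=flags_hash out=hash_nodup nodupkey dupout=hash_dups;
  by _all_;
run;

proc sort data=flags_sql out=sql_sorted;
  by _all_;
run;

/* Compare the deduplicated hash result with the SQL result row by row */
proc compare base=sql_sorted compare=hash_nodup;
run;
```

If HASH_DUPS is empty on the real data, duplicates are not the explanation and the record-count difference lies elsewhere.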

Wolverine · ‎08-24-2022

This works and provides the same exact frequencies as the original programming. Thanks!

ballardw · ‎04-21-2022

When there is no apparent natural order, a tried-and-true method is to add a numeric order variable to the data, sort by it, and use a procedure that will keep that order for display, such as PROC PRINT, or possibly PROC REPORT and PROC TABULATE with the ORDER=DATA option in the right place. If you have a list of the variable names in the correct order, one approach that MAY work is the FINDW function with the option that returns the word position instead of the character position. Or use IF/THEN/ELSE statements in a data step with the names of the variables for each table.
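A minimal sketch of the FINDW idea: the E modifier makes FINDW return the word number within a list, which then serves as the numeric order variable. The macro variable var_order, the data set HAVE, and its variables varname and mean are hypothetical, with made-up values.

```sas
/* Desired display order as a blank-separated list (hypothetical names) */
%let var_order = AGE HEIGHT WEIGHT;

/* Made-up summary rows in no particular order */
data have;
  length varname $32;
  input varname $ mean;
  datalines;
Weight 100.0
Age 13.3
Height 62.3
;
run;

data ordered;
  set have;
  /* E: return the word number, not the character position
     I: ignore case;  T: trim trailing blanks from both arguments */
  sort_order = findw("&var_order", varname, ' ', 'eit');
run;

proc sort data=ordered;
  by sort_order;
run;

proc print data=ordered noobs;
  var varname mean;
run;
```

Names missing from the list get sort_order=0 and therefore sort first, which makes them easy to spot.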

Tom · ‎03-30-2022

Use arrays.

%let EPL_list = ("O021","O03");
%let excl_list_pm70154d = ("59820","59812","59821","0UDB");

data temp.Index_cases;
  set comb.med_comm_2018_2021;
  /* Flag the record if any diagnosis code starts with a value in the EPL list */
  array dx PRINCIPAL_DIAGNOSIS_CODE DIAGNOSIS_CODE_1-DIAGNOSIS_CODE_36;
  do index=1 to dim(dx) until(EPL_flag);
    EPL_flag = dx[index] in: &epl_list.;
  end;
  /* Same idea for the procedure codes and the exclusion list */
  array px PROCEDURE_CODE ICD_PROCEDURE_CODE_1-ICD_PROCEDURE_CODE_25;
  do index=1 to dim(px) until(excl_list_pm70154d);
    excl_list_pm70154d = px[index] in: &excl_list_pm70154d.;
  end;
  drop index;
run;
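To see how the IN: prefix comparison in that loop behaves, here is a small made-up example with three hypothetical diagnosis variables. With the colon modifier, SAS compares only as many characters as the shorter value has, so a code such as O0211 matches the list entry O021.

```sas
%let EPL_list = ("O021","O03");

/* Made-up records with three hypothetical diagnosis variables */
data demo;
  length dx1-dx3 $7;
  input dx1 $ dx2 $ dx3 $;
  array dx dx1-dx3;
  do i = 1 to dim(dx) until(EPL_flag);
    /* IN: truncates the longer value to the length of the shorter one,
       so this is a "starts with" test against each list entry */
    EPL_flag = dx[i] in: &EPL_list.;
  end;
  drop i;
  datalines;
Z000 O0211 A01
O03 B20 C30
A10 B20 C30
;
run;
```

The first two records get EPL_flag=1 and the third gets 0; the UNTIL condition stops scanning as soon as a match is found.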

Kurt_Bremser · ‎11-16-2021

APPEND must be used like other options with an equal sign: options append=(fmtsearch=base);
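A minimal sketch of APPEND= with FMTSEARCH, written the same way as the line above. The libref MYFMTS is hypothetical and is pointed at the WORK folder only so the sketch runs anywhere; in practice it would point at the folder that holds your permanent formats.

```sas
/* Stand-in permanent format library (WORK folder here for portability) */
libname myfmts "%sysfunc(pathname(work))";

/* APPEND= adds MYFMTS to the end of the existing FMTSEARCH list
   instead of replacing the whole list, as a plain FMTSEARCH= would */
options append=(fmtsearch=myfmts);

/* Write the resulting search order to the log */
proc options option=fmtsearch;
run;
```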

ballardw · ‎11-05-2021

Hint: Provide some example data in the form of a data step that behaves like yours as far as the mix of values and all the rules about what is kept. A data step, so we don't have to guess about variable types. Then work through that (it should be a small enough set to do manually) and show us the desired result.

Note: multiple similar values like your DX variables, with similar rules applied to each variable, typically point to an ARRAY-based solution, which means a DATA step, as SQL doesn't provide any shortcut ways to handle lists of variables. Turning the Excel lists into formats would be one way to approach something like this.

data have;
  input HEALTHCARE_ENCOUNTER_ID $ icd_dx1 icd_dx2 icd_dx3 icd_dx4 icd_dx5;
datalines;
12345 9993 2255 3366 4477 5588
23456 2358 5899 8881 7777 8850
34567 8546 5468 4811 8883 9991
45678 8326 . . . . 0 0
;

proc format library=work;
value iscancer
  9991, 9992, 9993 = 'Is Cancer'
  other = 'Not Cancer'
;
value ispregnant
  8881, 8882, 8883 = 'Is Pregnant'
  other = 'Not pregnant'
;
run;

data want;
  set have;
  array dx (*) Icd_dx: ;
  do i=1 to dim(dx);
    IsCancer = max(IsCancer, put(dx[i],iscancer.)='Is Cancer');
    IsPregnant = max(IsPregnant, put(dx[i],IsPregnant.)='Is Pregnant');
  end;
  drop i;
run;

Formats can also be created from data sets if the list of values and/or the number of lists is "large" and you don't want to type out all the formats.

The biggest trick, once you have the formats, is the MAX() function. SAS returns 1 for true or 0 for false for the put(variable,format.)="some text" comparison, and taking the MAX means the largest value created gets kept as the variables are processed.

Note that the only thing needed to process more variables is to have them on the ARRAY statement. So if you have a hundred variables named with the Icd_dx convention, they all get processed with the Icd_dx: list, because all variables starting with Icd_dx are included. Or, if you want a subset of them, use Icd_dx1 - Icd_dx30, for example. These variable lists coupled with the array are why I would not attempt to use SQL.

If you place the formats into a permanent library, possibly where you are working with this data, and add that library to the FMTSEARCH path, then the formats are permanent and you don't need to rerun the code every time to use them.
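Here is a minimal sketch of building one of those formats from a data set with PROC FORMAT CNTLIN= and storing it in a library on the FMTSEARCH path. The code list, the data set names and the libref MYFMTS are hypothetical; MYFMTS points at the WORK folder only so the sketch runs anywhere.

```sas
/* Made-up code list standing in for one of the Excel lists */
data cancer_codes;
  input code;
  datalines;
9991
9992
9993
;
run;

/* CNTLIN layout: FMTNAME, TYPE, START, LABEL, plus HLO='O' for OTHER */
data cntlin_cancer;
  length fmtname $32 type $1 label $12 hlo $1;
  retain fmtname 'iscancer' type 'N';
  set cancer_codes end=last;
  start = code;
  label = 'Is Cancer';
  hlo = ' ';
  output;
  if last then do;
    start = .;
    label = 'Not Cancer';
    hlo = 'O';   /* this row defines the OTHER= range */
    output;
  end;
  keep fmtname type start label hlo;
run;

/* Stand-in permanent library (WORK folder here for portability) */
libname myfmts "%sysfunc(pathname(work))";

proc format library=myfmts cntlin=cntlin_cancer;
run;

/* Let SAS find formats stored in MYFMTS */
options fmtsearch=(myfmts work);

data _null_;
  x = 9992;
  put x= iscancer.;   /* x=Is Cancer */
run;
```

Each additional list only needs its own FMTNAME and rows in the CNTLIN data set; several formats can be loaded in one PROC FORMAT step.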

japelin · ‎02-18-2021

Like this? If possible, I would like to see the actual (even dummy) data regarding the zip code.

Reeza · ‎11-20-2020

Just a note: my code is not a macro.

StatDave · ‎03-05-2020

Your "rate" is the probability (numerator/denominator). As such, each year represents a set of binary responses, each yielding the condition or not. This is what can be modeled by logistic regression. If you want to compare the years and check for any differences, use the LSMEANS statement and the DIFF option. To show only the significant differences, you can fit and store the logistic model, then do the LSMEANS analysis in PROC PLM and use its FILTER statement to show the significant differences as below. With these data, there is a significant difference (p=.0275) among the year probabilities. The filtered results show where the differences are (assuming the 0.05 level).

data a;
  input Year Num Den;
datalines;
2005 114 251
2006 101 245
2007 113 243
2008 116 252
2009 107 272
2010 134 366
2011 112 319
2012 141 331
;

proc logistic data=a;
  class year / param=glm;
  model num/den = year;
  lsmeans year / ilink;
  store log;
run;

proc plm restore=log;
  lsmeans year / ilink diff;
  filter probz < .05;
run;

Kurt_Bremser · ‎10-24-2019

PS: I have a similar issue here. The keys in our database are UUIDs, so they are stored in SAS as 16-byte character values with a $hex32. format attached. I originally built two macros: one that converts the 36-character human-readable original string we get from DB/2 to this format, and another that recreates the 36-character string from the compressed 16-byte value. Since SAS has introduced the $UUID formats and informats in the meantime (the informat seems to be available from 9.4M6 on), these can be greatly simplified. When I have to send data to non-SAS users, I always convert to the human-readable form and send the longer strings.
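A minimal sketch of the kind of round trip those two macros performed, written as plain data step code with the $HEX32. informat and format rather than the newer $UUID ones. The sample UUID and the variable names are made up.

```sas
data uuids;
  length uuid_txt $36 uuid_bin $16 hex $32 uuid_back $36;
  uuid_txt = '123e4567-e89b-12d3-a456-426614174000';
  /* Drop the dashes and read the 32 hex digits into 16 raw bytes */
  uuid_bin = input(compress(uuid_txt, '-'), $hex32.);
  /* Write the 16 bytes back out as 32 (uppercase) hex digits */
  hex = put(uuid_bin, $hex32.);
  /* Re-insert the dashes in the 8-4-4-4-12 layout */
  uuid_back = lowcase(catx('-', substr(hex, 1, 8), substr(hex, 9, 4),
                      substr(hex, 13, 4), substr(hex, 17, 4), substr(hex, 21, 12)));
  format uuid_bin $hex32.;
run;
```

Storing the 16-byte form saves 20 bytes per key, and the $HEX32. format attached to uuid_bin keeps it readable in table views.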

lizzdream12 · ‎04-28-2019

Hi Ballardw, I have a question that I hope you can help with. Context: The data contains four variables: id, A1C value, date (in months) of the A1C measurement, and A1C count. Each id has >=3 A1C values. I sorted the data by id and descending date so that the most recent A1C comes first. I am trying to keep the 3 most recent A1C values that are measured at least 3 months apart for a specific id. For example, if a patient has 4 A1C values, but the 3rd one is measured within 3 months of the previous A1C (the 2nd one), then the 3rd one should be deleted or kept separately, and the gap between the 4th A1C and the 2nd one needs to be checked to see if it is >=3 months. If yes, then the 1st, 2nd, and 4th A1C values will be kept for that id for further calculation. Each id can have a different number of A1C values. How can I remove the observations that don't meet the criteria and calculate the gap for the next A1C date? Thanks in advance!
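A minimal sketch of one way to do this, assuming the dates are SAS date values (if "date in months" is a plain month count instead, the gap is a simple subtraction). The data set and variable names (a1c_raw, id, a1c, a1c_date) and the sample rows are hypothetical. The key point is that the 3-month gap is measured from the most recently KEPT value rather than from the previous row, which is what the 2nd/4th comparison in the question requires. INTCK with the 'c' (continuous) method counts complete months between the two dates.

```sas
/* Made-up A1C measurements */
data a1c_raw;
  input id a1c a1c_date :date9.;
  format a1c_date date9.;
  datalines;
1 7.1 15DEC2018
1 6.8 10OCT2018
1 7.4 01SEP2018
1 7.0 15JUN2018
1 6.9 01JAN2018
2 8.2 01NOV2018
2 8.0 01MAY2018
2 7.9 01JAN2018
;
run;

proc sort data=a1c_raw out=a1c_sorted;
  by id descending a1c_date;
run;

data a1c_keep a1c_drop;
  set a1c_sorted;
  by id;
  retain last_kept n_kept;
  format last_kept date9.;
  if first.id then do;
    last_kept = .;
    n_kept = 0;
  end;
  /* Complete months back to the most recent kept measurement */
  gap_months = .;
  if not missing(last_kept) then
    gap_months = intck('month', a1c_date, last_kept, 'c');
  /* Keep up to 3 values, each at least 3 months before the last kept one */
  if n_kept < 3 and (missing(last_kept) or gap_months >= 3) then do;
    n_kept = n_kept + 1;
    last_kept = a1c_date;
    output a1c_keep;
  end;
  else output a1c_drop;
run;
```

For id 1 this keeps 15DEC2018, 01SEP2018 and 01JAN2018 and sends 10OCT2018 and 15JUN2018 to A1C_DROP, each with its gap to the last kept value in gap_months.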

Rick_SAS · ‎04-24-2019

There are several SAS formats that show dates. My favorite is DATE9., but here is a link to some others. Just use the FORMAT statement to assign a format to a variable:

data A;
  format IMGDate DATE9.;
  IMGDate = 21164;
run;

proc print;
run;
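A small sketch applying a few other date formats to the same stored value. 21164 days after 01JAN1960 is 11DEC2017, and YYMMDDN8. writes the date as YYYYMMDD with no separators.

```sas
data _null_;
  IMGDate = 21164;             /* days since 01JAN1960 */
  put IMGDate= date9.;         /* IMGDate=11DEC2017  */
  put IMGDate= mmddyy10.;      /* IMGDate=12/11/2017 */
  put IMGDate= yymmdd10.;      /* IMGDate=2017-12-11 */
  put IMGDate= yymmddn8.;      /* IMGDate=20171211   */
run;
```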


Re: Macro's don't "reset" after pressing Cancel button

Re: Macro's don't "reset" after pressing Cancel button

Re: Macro's don't "reset" after pressing Cancel button

Macro's don't "reset" after pressing Cancel button

How to check if a macro variable has a specified value

Re: Concatenating means and standard deviations with trailing zeros wh...

Concatenating means and standard deviations with trailing zeros when n...

Re: PROC SQL match missing about 3% of cases even though they are in b...

Re: PROC SQL match missing about 3% of cases even though they are in b...

Re: PROC SQL match missing about 3% of cases even though they are in b...

Re: How to check if a macro variable has a specified value

Re: Do loop iterations based on the value of a variable, with 0-paddin...

Re: Using TableN macro, having trouble with "COLBY" column by option

Re: Searching through multiple variables with array not giving same re...

Re: Continuous enrollment across years following a birth event pt2

Re: PROC SQL match missing about 3% of cases even though they are in b...

Re: Logic problem with IF-THEN?

Why doesn't this RETAIN statement work??

Re: Using dm statement to save log, how to replace old log automatical...

Re: Basic data quality check -- how to determine percent missing for k...

Re: Using dm statement to save log, how to replace old log automatical...

Re: Restricting to a range of dates in MM/DD/YYY format

Re: Continuous enrollment 3 months prior to and 2 months after index m...

Re: SAS generating a large file, but barely using any computer resourc...

Re: Medical data: matching diagnosis codes with multiple variables/cod...

Re: Exporting Proc ttest to Excel

Re: Searching for lists of values across multiple variables with the s...

Re: Previously working syntax can no longer load formats file

Re: Matching a range of variables to multiple lists of desired values

Re: How to flatten a file with multiple variables to flatten and multi...

Re: Basic data quality check -- how to determine percent missing for k...

Re: Proc GLM and weights

Re: Convert character variable to HEX

Re: Calculating differences in dates multiple observations

Re: Converting number of days since Jan 1 1960 to YYYYMMDD

SAS Inner Circle Panel

SAS Analytics Explorers