About Solph

Kurt_Bremser · ‎07-27-2023

MD5 will always result in the same hash for the same source. Since it gives a finite number of results (2^128) for any kind of input, it is not reversible. Given some parameters for the source, though (like length 10 and "only digits"), one can easily create a reference table for all the 10^10 possible sources.
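To illustrate the reference-table point, here is a minimal sketch. It assumes 4-digit, digits-only IDs (10**4 rows instead of 10**10) so that it runs quickly; the dataset and variable names (`md5_lookup`, `hashed`, `id`, `hash`) are made up for this example.

```sas
/* Reference table: every possible 4-digit ID with its MD5 hash.        */
/* Assumption: IDs are exactly 4 digits, so there are only 10**4 cases. */
data md5_lookup;
  length id $4 hash $32;
  do n = 0 to 9999;
    id   = put(n, z4.);                /* zero-padded source value  */
    hash = put(md5(id), $hex32.);      /* 16-byte digest as hex     */
    output;
  end;
  drop n;
run;

/* A "hashed" ID as it might appear in a released dataset */
data hashed;
  length hash $32;
  hash = put(md5('0421'), $hex32.);
run;

/* Looking the hash up in the reference table recovers the source ID */
proc sql;
  select l.id
    from hashed as h
         inner join md5_lookup as l
           on h.hash = l.hash;
quit;
```

The same approach scales to longer IDs; only the size of the lookup table grows.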

Tom · ‎02-16-2023

If you had to modify the INPUT statement in that code to use @ pointers and fixed-format input instead of normal LIST MODE input, then either your actual data lines have TABS or other strange characters instead of spaces (which is why jumping over them with @ worked) or you have some missing values that are not marked with a period.

Note that if you are using Display Manager to edit and submit code, the tabs in the data lines will automatically be replaced by spaces. But if you are using SAS/Studio to submit the code, this does not happen and you end up with actual TAB characters in the data lines. You can tell the SAS/Studio editor not to insert actual TAB characters into the file by changing your preferences. You can still hit the TAB key to indent your code, but it will just insert the proper number of spaces to move to the next tab stop instead of messing up your code file with embedded tabs.

Example: This data step will work (note there is no need to add an extra RUN statement after the end of the data step).

    data have;
      input ID start :date. end :date.;
      format start end yymmdd10.;
    datalines;
    1 27JAN2018 06MAY2019
    2 .         09MAR2020
    3 31OCT2017 03NOV2019
    4 15JAN2018 .
    5 04JUL2015 12JUL2020
    ;

But data lines with only spaces will NOT work with list mode input, since the INPUT statement will go hunting for a value.

    data have;
      input ID start :date. end :date.;
      format start end yymmdd10.;
    datalines;
    1 27JAN2018 06MAY2019
    2           09MAR2020
    3 31OCT2017 03NOV2019
    4 15JAN2018
    5 04JUL2015 12JUL2020
    ;
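For data lines where missing values are left blank, a minimal sketch of the fixed-position alternative follows. The column positions (1, 3 and 13) are assumptions that fit this sample layout only, and TRUNCOVER is added so that a short line leaves END missing instead of sending INPUT to the next line.

```sas
data have;
  infile datalines truncover;
  /* Assumed layout: ID in column 1, START in 3-11, END in 13-21 */
  input @1 ID 1. @3 start date9. @13 end date9.;
  format start end yymmdd10.;
datalines;
1 27JAN2018 06MAY2019
2           09MAR2020
3 31OCT2017 03NOV2019
4 15JAN2018
5 04JUL2015 12JUL2020
;

proc print data=have;
run;
```

This only works if the data lines really contain spaces. If they contain TAB characters, the columns will not line up unless the tabs are expanded first, for example with the EXPANDTABS option on the INFILE statement.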

gema · ‎01-09-2023

Can you create a FIRST dataset from the rows where first.id is true and a REST dataset from the rows where it is not, then merge those two datasets by id if the difference between any of the dates is <= 1?

Solph · ‎02-19-2022

Thanks, mkeintz, for the code. All of it works beautifully. And Tom's workaround for 3 datasets is brilliant and very concise. I only wish I could mark both of your replies as solutions.

Kurt_Bremser · ‎11-18-2021

No need for a macro at all:

    proc sql;
      create table id_year as
        select have.id, year.year
          from have left join year
            on have.startyr le year.year le have.endyr
      ;
      create table want as
        select year, count(id) as count
          from id_year
          group by year
      ;
    quit;

Solph · ‎11-01-2021

Re using SYMPUTX to create the macro variable, the following code works as well:

    proc sql;
      select max(substr(name,2,4)) into :lastyr
        from toc;
    quit;

Solph · ‎01-16-2021

Thanks to all who replied. I thought I'd say a few words before I close the case.

1. I realized that filling in the missing dates between start date and end date is not efficient: my data is easily 100K cases and each spans over a year, up to 10 years, so even if the code works, it would not be efficient.
2. I need to both back fill missing dates and values (for days between start date and first assessment date) and forward fill (for days between the last assessment and end date). So it's a daunting task, and messy at times for data pulls with different criteria.
3. I then recalled that the data provider already produces a dataset with those events restructured at the event level. That is, they reconstruct the data whenever there is a turn of events, with a proper start date and end date for each event assessment record. It probably took them months to first come up with the code, and it has been fine-tuned each year ever since.

For those who are curious how it is done:

1. If there is a gap between start date and 1st assessment date, they insert a record with EVENT start date = start date and EVENT end date = 1st assessment date. They then calculate event days between the two dates.
2. If there is a gap between the last assessment record and end date, they insert an EVENT record with EVENT start date = last assessment date and EVENT end date = end date.
3. For everything in between, they reconstruct an EVENT record with the 1st assessment date as EVENT start date and the next assessment date as EVENT end date, and so on.
4. There are other turns of events we'd consider (e.g. calendar year end); the same logic is used to break them into event records.

The source data (as in the sample data) has a unique assess ID assigned to each record. The reconstructed data keeps the assess ID if the reconstructed event is associated with or derived from the assessment record. So one can link the source data to the reconstructed event data by this assess ID and use the number of event days to multiply whatever is in the source data (such as VAR1 in my sample data).

The code works if we know what turns of events we want. For other analyses, filling by daily date might be more desirable, I guess. Thanks again to all who helped to answer and provided tips. Much appreciated.
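The event-level restructuring described above can be sketched roughly as follows. This is not the data provider's code. The dataset and variable names (`assess`, `events`, `start_date`, `end_date`, `assess_date`, `assess_id`, `event_assess_id`) are hypothetical, the input is assumed to be sorted by id and assess_date with the start and end dates repeated on every row, and counting event days as end minus start is also an assumption.

```sas
/* Hypothetical sample: one row per assessment */
data assess;
  input id assess_id start_date :date9. end_date :date9. assess_date :date9. var1;
  format start_date end_date assess_date date9.;
datalines;
1 101 01JAN2020 31DEC2020 15FEB2020 2
1 102 01JAN2020 31DEC2020 01JUL2020 3
;

data events;
  set assess;
  by id;
  retain prev_date prev_assess_id;
  format event_start event_end date9.;

  if first.id then do;
    /* Gap between start date and the 1st assessment: inserted record. */
    /* Assumption: this record carries no assess ID.                   */
    if start_date < assess_date then do;
      event_assess_id = .;
      event_start     = start_date;
      event_end       = assess_date;
      event_days      = event_end - event_start;
      output;
    end;
  end;
  else do;
    /* Previous assessment runs until the current assessment date */
    event_assess_id = prev_assess_id;
    event_start     = prev_date;
    event_end       = assess_date;
    event_days      = event_end - event_start;
    output;
  end;

  /* Last assessment runs until the end date, if there is a gap */
  if last.id and assess_date < end_date then do;
    event_assess_id = assess_id;
    event_start     = assess_date;
    event_end       = end_date;
    event_days      = event_end - event_start;
    output;
  end;

  /* Remember this assessment for the next row of the same id */
  prev_date      = assess_date;
  prev_assess_id = assess_id;

  keep id event_assess_id event_start event_end event_days;
run;
```

Joining `events` back to `assess` on the assess ID would then give the number of event days to multiply with values such as VAR1.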

Reeza · ‎12-17-2020

1. You cannot use CARDS/DATALINES within macros. Your code needs to generate the appropriate file name, which likely requires the date in a specific format, YYMMDD. You cannot use FORMAT statements on your macro variables, but you can add the format as the second parameter within SYSFUNC() or use PUTN. Something like the following, most likely.

    %let Filedate=%sysfunc(putn(&date, yymmddn8.));

2. Try using UPPER or LOWER instead. Otherwise your data is different, possibly because of invisible white space. Running a PROC FREQ on the column will allow you to verify the data.

3. Append is more efficient than a data step for combining data. Append copies the data over multiple lines at a time because there's no expectation of doing any calculations, whereas a data step may try to change the data somehow, so it needs to process it line by line.

This entire process is inefficient and likely doesn't need a macro. As Paige indicates, you likely want to append your data, filter it dynamically for multiple dates at once, and do the count as the second step. Personally, I'd create a view that did this. Here are some instructions on how to use a %DO loop with dates.

https://documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.5&docsetId=mcrolref&docsetTarget=n01vuhy8h909xgn16p0x6rddpoj9.htm&locale=en
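As a rough sketch of a %DO loop over dates that builds the YYYYMMDD text with PUTN: the macro name `loop_dates` and its `start=`/`end=` parameters are made up for illustration, and the loop only writes the value to the log where real code would read or append the matching file.

```sas
%macro loop_dates(start=, end=);
  %local date filedate;
  /* Convert the DATE9 text to SAS date values and step one day at a time */
  %do date = %sysfunc(inputn(&start, date9.))
         %to %sysfunc(inputn(&end, date9.));
    /* Format the numeric date as YYYYMMDD text for use in a file name */
    %let filedate = %sysfunc(putn(&date, yymmddn8.));
    %put NOTE: would process the file for &filedate;
  %end;
%mend loop_dates;

%loop_dates(start=01JAN2020, end=03JAN2020)
```

For a monthly step, the loop could count months and use INTNX inside %SYSFUNC to derive each date instead of stepping by day.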

ChrisNZ · ‎06-30-2020

Even better if you want the zeros to be populated too: ID01APR = ( ENTRY_DATE <= mdy( 4, 1, FYEAR) <= DISCHARGE_DATE ) ;
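A small self-contained sketch of that expression, using made-up sample rows. The comparison returns 1 when April 1 of FYEAR falls within the stay and 0 otherwise, so the zeros are populated without any IF/ELSE logic.

```sas
/* Hypothetical sample data */
data have;
  input entry_date :date9. discharge_date :date9. fyear;
  format entry_date discharge_date date9.;
datalines;
15MAR2019 10APR2019 2019
05APR2019 20APR2019 2019
;

data want;
  set have;
  /* Boolean expression: 1 for the first row, 0 for the second */
  id01apr = (entry_date <= mdy(4, 1, fyear) <= discharge_date);
run;

proc print data=want;
run;
```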

Solph · ‎06-24-2020

mklangley, yours is similar to the one I googled and found on the net (see below), but it's good that you created var concatenating fiscal year and month.

    data want;
      set have;
      date = entry_date;
      do until(date > discharge_date);
        month = month(date);
        output;
        date = intnx("MONTH", date, 1, "BEGINNING");
        *Alignment SAME is clock 30 days, BEGINNING counts if present 1st day, END then last date;
      end;
      format date Date9.;
    run;
    proc print; run;

Solph · ‎12-05-2019

Thanks so, so much for the code. It works in some ways and not in others.

1. Column Note is just there to manually annotate the records, for illustration purposes, so it can't be used to filter. (And FYI, some text entries were wrong; they are now fixed in the code below.)
2. When I removed the condition using note (if note='' then), the code creates a few more rows that shouldn't be there. I'm wondering if there is a way to handle that? Specifically rows _N_ in (1,3, 5, 8, 10, 12, 14).
3. Most of all, ideally I'd like to merge all rows in consecutive order (if they meet the criteria you specified), not just the next row. That means sysID 1,2,3 into one; sysID 6,7 into one; sysID 9,10 into one. Your original code would merge 1 and 2 into one row, and then 2 and 3 into another row. Ideally I'd like to have all three in one.

(I also modified the code to combine values without a space, so it's "AAA,BBB", not "AAA, BBB", to help parsing, based on Reeza's code to parse, as in https://communities.sas.com/t5/General-SAS-Programming/parsing-a-character-string-into-new-variables/td-p/129189.)

FYI, I added additional code to parse the text values into separate variables. If you run the whole code, the last proc print; run; is the data I'd like to have. Or see the pictures: first is HAVE, then Want if merging pair rows, then Want if merging all pair rows (True Want). Hope I can get further help. It's greatly appreciated.

    data have;
      length sysid id epid 8. hospid tohosp_id $3. indate outdate x1 x2 8. note $30.;
      input sysid id epid hospid $ toHosp_ID $ INdate OUTdate x1 x2 NOTE $char30.;
      format note $30.;
    datalines;
    1 1 11 AAA BBB 2008 2011 1 2 conseq next one by date
    2 1 12 BBB CCC 2011 2012 4 5 conseq next one by date
    3 1 13 CCC EEE 2012 2014 7 8
    4 1 14 EEE 999 2016 2019 2 4
    5 2 21 AAA CCC 2013 2015 3 5
    6 2 22 CCC AAA 2017 2018 1 1 conseq next one by date
    7 2 23 AAA CCC 2018 2018 2 2
    8 2 24 CCC 999 2019 2019 1 2
    9 3 31 305 CCC 2015 2017 5 6 conseq next one by date
    10 3 32 CCC EEE 2017 2019 8 9
    11 3 33 FFF 999 2019 2019 1 2
    ;
    run;
    proc print; run;

    data want;
      set have;
      _lagepid=lag(epid);
      _laghospid=lag(hospid);
      _lagtohospid=lag(tohosp_id);
      _lagindate=lag(indate);
      _lagoutdate=lag(outdate);
      _lagx1=lag(x1);
      _lagx2=lag(x2);
      if hospid=lag(tohosp_id) and indate=lag(outdate) then do;
        combined_epid=catx(",", _lagepid, epid);
        combined_hospid=catx(",", _laghospid, hospid);
        combined_tohospid=catx(",", _lagtohospid, tohosp_id);
        combined_indate=catx(",", _lagindate, indate);
        combined_outdate=catx(",", _lagoutdate, outdate);
        combined_x1=catx(",", _lagx1, x1);
        combined_x2=catx(",", _lagx2, x2);
        output;
      end;
      else;
      do;
        combined_epid=epid;
        combined_hospid=hospid;
        combined_tohospid=tohosp_id;
        combined_indate=indate;
        combined_outdate=outdate;
        combined_x1=x1;
        combined_x2=x2;
        output;
      end;
      drop _: hospid--x2;
    run;
    proc print; run;

    *fix1: combine row that are in consecutive PAIR order;
    *fix2: combine rows that are in consecutive order;
    data fix1;
      set want;
      drop epID sysID Note;
      if _N_ in (1,3, 5, 8, 10, 12, 14) then delete; *Removed unwanted row;
    run;

    data fix2;
      set fix1;
      if _N_ =2 then delete; *it should be merged with _N_1;
      if combined_epid='11,12' then do; *reset values;
        combined_epid='11,12,13';
        combined_hospid='AAA,BBB,CCC';
        combined_tohospid='BBB,CCC,EEE';
        combined_indate='2008,2011,2012';
        combined_outdate='2011,2012,2014';
        combined_x1='1,4,7';
        combined_x2='2,5,8';
      end;
    run;
    proc print data=fix1; run;
    proc print data=fix2; run;

    *Parse variables;
    %let fixdata=fix1;
    %let fixdata=fix2;
    data TRUEwant;
      set &fixdata;
      format hosp1-hosp3 $3.;
      format tohosp1-tohosp3 $3.;
      array parsed_epID (*) epID1-epID3;
      array parsed_hosp (*) hosp1-hosp3;
      array parsed_tohosp (*) tohosp1-tohosp3;
      array parsed_x1 (*) x1_1-x1_3;
      array parsed_x2 (*) x2_1-x2_3;
      i=1;
      do while(scan(combined_tohospid, i, ",") ne "");
        parsed_epID(i) =scan(combined_epid, i, ",");
        parsed_hosp(i) =scan(combined_hospid, i, ",");
        parsed_tohosp(i) =scan(combined_tohospid, i, ",");
        parsed_x1(i) =scan(combined_x1, i, ",");
        parsed_x2(i) =scan(combined_x2, i, ",");
        i+1;
      end;
      drop i epID sysID Note;
    run;
    proc print; run;
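One possible way to merge whole runs of consecutive rows, rather than pairs, is a one-row look-ahead with retained accumulators. The sketch below is only an illustration under assumptions: it uses a cut-down copy of the sample data, the names `want_chain`, `in_chain` and `next_*` are invented here, only three of the combined columns are built, and it additionally requires linked rows to share the same id, which the code above does not check.

```sas
/* Cut-down sample: only the columns needed for the linkage */
data have;
  input sysid id epid hospid $ tohosp_id $ indate outdate;
datalines;
1 1 11 AAA BBB 2008 2011
2 1 12 BBB CCC 2011 2012
3 1 13 CCC EEE 2012 2014
4 1 14 EEE 999 2016 2019
5 2 21 AAA CCC 2013 2015
;

data want_chain;
  length combined_epid combined_hospid combined_tohospid $200;
  retain combined_epid combined_hospid combined_tohospid;
  retain in_chain 0;

  set have end=eof;
  /* Look ahead one row; skipped on the last row so the step ends normally */
  if not eof then
    set have(firstobs=2 keep=id hospid indate
             rename=(id=next_id hospid=next_hospid indate=next_indate));

  /* Start a fresh chain unless the previous row linked to this one */
  if not in_chain then call missing(of combined_:);

  combined_epid     = catx(",", combined_epid, epid);
  combined_hospid   = catx(",", combined_hospid, hospid);
  combined_tohospid = catx(",", combined_tohospid, tohosp_id);

  /* Does the next row continue this chain? */
  in_chain = (not eof and next_id = id
              and next_hospid = tohosp_id and next_indate = outdate);

  /* Write one row per chain, when the chain ends */
  if not in_chain then output;

  keep id combined_:;
run;

proc print data=want_chain;
run;
```

With this sample, sysID 1, 2 and 3 end up in a single row, while sysID 4 and 5 stay as separate rows. The remaining combined columns (dates, x1, x2) could be accumulated the same way.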

Solph · ‎09-25-2019

I thought I tried scan(var, -1,"_") and it didn't work. But apparently I used it wrong. Your code works. Thanks a lot. And thanks all for the help.

Urban_Science · ‎08-21-2019

No problem! I made a couple of updates along the way. My latest update a few minutes ago was to include ID in the keep statement; otherwise, the column wouldn't exist for the join. @ballardw noticed the typo in the let statements, which I missed because I was too focused on what was going on in the macro. Team effort! Glad it works!

ChrisNZ · ‎01-28-2019

@ballardw I'd use call execute as well for his task.

Solph · ‎01-22-2019

Thanks a lot FreelanceReinhard and Reeza. They totally addressed my needs. So the code below worked! Thanks so much.

    data have;
      input id 1 fy2004 $ 3-4 fy2005 $ 6-7 fy2006 $ 9-10 fy2007 $ 12-13
            _Mostfreq $ 17-18 _Count 20 _Total_count 22;
    datalines;
    1 00 13 13 13   13 3 4
    2 14 14 14      14 3 3
    3 12 12 12 05   12 3 4
    4 01 01 02      01 2 3
    5 00 00 12 12   00 2 4
    ;

    data want;
      set have;
      length MostFreq $2;
      array list fy:;
      array _t[0:99] _temporary_; *Set the max possible value, in this example 99;
      call missing(of _t[*]);
      do i=1 to dim(list);
        if list[i] ne '' then _t[input(list[i],2.)]+1;
      end;
      Count=max(of _t[*]);
      Total_Count=sum(of _t[*]);
      *MostFreq=put(whichn(Count, of _t[*]),z2.);
      MostFreq=put(whichn(Count, of _t[*])-1,z2.);
      drop i;
    run;

