About Vince28_Statcan

Vince28_Statcan · ‎08-07-2013

The best trick is to ask your hierarchical data provider to send you the XML files directly and do a custom mapping yourself, given that most hierarchical data is stored as XML at one point or another. It is very easy to self-learn the basics of XML mapping.

Vince28_Statcan · ‎08-07-2013

I don't think the data set has a record for each level of parent for any given child. You would have to somehow preprocess the data to create a record with a level1 child and his level 3/4/5/6 parents to appropriately use a transpose/by combination. At least so it felt with the data sample

Vince28_Statcan · ‎08-07-2013

Not necessarily the simplest approach, but here's how I'd go about it.

data want;
  /* never executes; at compile time it gives nextchild the same type and length as child */
  if 0 then set have(keep=child rename=(child=nextchild));
  if _n_=1 then do;
    declare hash hl2(dataset: "have(where=(level=2) rename=(child=nextchild))");
    hl2.defineKey('parent');
    hl2.defineData('nextchild');
    hl2.defineDone();
    declare hash hl3(dataset: "have(where=(level=3) rename=(child=nextchild))");
    hl3.defineKey('parent');
    hl3.defineData('nextchild');
    hl3.defineDone();
    declare hash hl4(dataset: "have(where=(level=4) rename=(child=nextchild))");
    hl4.defineKey('parent');
    hl4.defineData('nextchild');
    hl4.defineDone();
    declare hash hl5(dataset: "have(where=(level=5) rename=(child=nextchild))");
    hl5.defineKey('parent');
    hl5.defineData('nextchild');
    hl5.defineDone();
    declare hash hl6(dataset: "have(where=(level=6) rename=(child=nextchild))");
    hl6.defineKey('parent');
    hl6.defineData('nextchild');
    hl6.defineDone();
  end;
  set have(where=(level=1));
  /* walk up one level at a time; stop at the first level with no match */
  rc=hl2.find(key: child);
  if rc=0 then do;
    parent1=nextchild;
    rc=hl3.find(key: nextchild);
    if rc=0 then do;
      parent2=nextchild;
      rc=hl4.find(key: nextchild);
      if rc=0 then do;
        parent3=nextchild;
        rc=hl5.find(key: nextchild);
        if rc=0 then do;
          parent4=nextchild;
          rc=hl6.find(key: nextchild);
          if rc=0 then parent5=nextchild;
        end;
      end;
    end;
  end;
  drop nextchild parent level rc;
run;

Based on your data example, I assumed there were 6 levels, but since level 1 doesn't have a parent, there should only be parent1-parent5 variables to fill, not parent1-parent6. If there are 7 levels, or if level 1 can have both a parent and a child, it can be worked around. The issue with the above approach is that if any key is present multiple times (for example, if xxx has parents yyy and zzz), the hash table as defined won't support it and will retain only one of the two (I can't remember whether it keeps the first key or replaces it). It can be worked around using the multidata option, but then the looping becomes a bit hectic.

Point being, conceptually this does a merge, renaming variables appropriately, of the following subsets of your original data:

have(where=(level=1))
have(where=(level=2))
...
have(where=(level=6))

Using this successive approach with the above subsets and renaming the variables should also give you the appropriate results without messing with multidata hash tables, like below:

proc sql;
  select t1.child, t2.parent1, t3.parent2, t4.parent3, t5.parent4, t6.parent5
  from have(where=(level=1) drop=parent) as t1
  left join have(where=(level=2) rename=(parent=child child=parent1)) as t2
    on t1.child=t2.child
  left join have(where=(level=3) rename=(parent=parent1 child=parent2)) as t3
    on t2.parent1=t3.parent1
  left join have(where=(level=4) rename=(parent=parent2 child=parent3)) as t4
    on t3.parent2=t4.parent2
  left join have(where=(level=5) rename=(parent=parent3 child=parent4)) as t5
    on t4.parent3=t5.parent3
  left join have(where=(level=6) rename=(parent=parent4 child=parent5)) as t6
    on t5.parent4=t6.parent4
  ;
quit;

Note this is all untested; it's more of a logical guideline on how you can tackle the problem.
Vince
*Edit: indeed, left join seemed more appropriate for the OP's objectives, so I edited the SQL example.

Vince28_Statcan · ‎08-06-2013

Sigh, sorry, updated my reply again. defineDone is a method of the hash table and SAS requires the () syntax at the end: h.defineDone(); I always omit it by mistake when I don't have data to test my syntax. Edit: in case this helps further diagnose the code in your original post: since you are not keeping a new variable for your MBR_SYS_ID each time you find a major study case, your do pt=_N_+1 ... end; statement loops over all study records sorted below your current mbr_sys_id. So if some studies overlap, each major study will also cause other studies' procedures that occurred within the appropriate timeframe to be output, including even the record for a new study if it has an associated procedure applied that day.

Vince28_Statcan · ‎08-06-2013

My bad, it's (dataset: and not (datasets:. Updating original post.

Vince28_Statcan · ‎08-06-2013

I stand corrected. I've edited my example above, although I recommend not using it. It is a not-so-trivial way around the "pile" effect of the lag function. It works though!
Thanks
Vince

Here's the approach I would probably use, similar to others above, although it skips the conditional if statement in favor of the coalesce function.

data want;
  set have;
  total=value+coalesce(total,0);
  retain total;
run;
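For comparison, the same running total can be sketched with the SAS sum statement, which retains its accumulator automatically. The sample rows below are made up for illustration; only the variable names value and total come from the post.

```sas
/* Hypothetical sample data; only the variable name value comes from the post */
data have;
  input value;
  datalines;
3
5
.
2
;
run;

data want;
  set have;
  /* sum statement: total is retained automatically, starts at 0,
     and a missing value leaves the running total unchanged */
  total + value;
run;

proc print data=want;
run;
```

One difference worth knowing: with the coalesce version, a missing value makes total missing on that row and the accumulation restarts from the next row, whereas the sum statement simply carries the previous total forward (3, 8, 8, 10 for the rows above).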

Vince28_Statcan · ‎08-06-2013

There are multiple alternatives. The simplest mathematical approach is to sort by the abs function and keep just the top 4. I'd use a proc sql order by to sort, so as to avoid having to create a new variable with the absolute values.

proc sql;
  /* written to a new table so the query does not read and overwrite have in the same step */
  create table sorted as
  select * from have
  order by abs(yourcolumnname);
quit;

data want;
  set sorted(obs=4);
run;

Vince28_Statcan · ‎08-06-2013

data want;
  if _N_=1 then do;
    declare hash h(dataset: "maj_restudy(where=(prov_tin_maj_restudy=1) rename=(fst_srvc_dt=major_study_start_date))");
    h.defineKey('mbr_sys_id');
    h.defineData('major_study_start_date');
    h.defineDone();
  end;
  set maj_restudy;
  /* look up the major study start date for this member */
  rc=h.find();
  if rc=0 then do;
    if (1 LE fst_srvc_dt - major_study_start_date LE 90)
       and (nat_perf=1 or nat_cath=1 or nat_pet=1 or nat_ct=1) then do;
      if nat_perf=1 then subtable="MAJPERF";
      else if nat_cath=1 then subtable="MAJCATH";
      else if nat_pet=1 then subtable="MAJPET";
      else if nat_ct=1 then subtable="MAJCT";
      output;
    end;
  end;
  drop rc major_study_start_date;
run;

This is an alternative based on your data example. However, if there are multiple major studies in any given MBR_SYS_ID, it won't work as is, since the hash table defined this way only supports 1 occurrence of the key. You may need to play with formats a little: I assumed your dates were numeric values with a date format and not strings, and that your indicator columns were numeric. Basically, it uses the lookup power of hash tables to obtain the start date of the unique major study for each MBR_SYS_ID (if there is one) and puts it in the data vector to allow computation. This would give you multiple rows for a given MBR_SYS_ID if there had been more than one of the 4 columns with 1 for the same MBR_SYS_ID within the following 90 days.
Vince

P.S. When you say "but it did not work", it is best to either give an example of the bad output or a copy of your error log for us to help. For instance, I can't tell from your post whether it was a logic or a syntax error, and reading through someone else's thoughts/code looking for both is harder, especially when I don't know your data structure, variable types and formats. A quick glance tells me that

if we=. then we=fst_srvc_dt;
if wb=' ' then wb=fst_srvc_dt;
if we=wb then output;

is basically outputting every single row that didn't have a major study, in addition to what should have achieved your desired result in the big outer do; end; block.

Vince28_Statcan · ‎08-06-2013

Hi Ashok, I'm not going to lie, I'm not very familiar with proc report. My guess, however, based on the example in the second link, is that if you were to set one column to 50% and the total width to 100% in your proc report statement, the remaining column would be automatically adjusted to whatever is left over, likely "almost 50%, but with fractions of a % taken for borders and whatnot". I suggest you try removing the width option from either define statement, and if that doesn't fix it, I can try to dig more into this issue. Vince

Vince28_Statcan · ‎08-06-2013

Alternatively, you can take a look at the lag<n>(var) functions. They let you access the nth previous row value of var, effectively allowing you to run rolling totals (last 7 days, last month, etc.) but also to achieve your desired result without the need for an additional column. The syntax is slightly less natural, though, since you need to handle the case _N_ LE n, e.g.

data want;
  set have;
  value=value+coalesce(lag1(value),0)+coalesce(lag1(value),0);
run;

This effectively uses your previous rolling total in variable Value (as replaced at each data step iteration), adds it to the current row's Value and then replaces it. Note that the do; end; is not necessary as this is a single-line statement, but I put it in for readability. Also note that lag1(value) is equivalent to lag(value) but, again, it's best for readability.
Vincent
*Edited after Astounding's comment, making clever use of the lag1 pile effect to achieve the desired result. I honestly strongly recommend against it, though, as it's easy to get lost in piles when there are easier ways around, like creating a new variable and dropping the old one.
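As an illustration of the rolling-total use of lag<n> mentioned in this post, here is a minimal sketch of a 3-row rolling sum. The sample data and the names prev1, prev2 and roll3 are hypothetical; the window of 3 is chosen only to keep the example short.

```sas
/* Hypothetical sample data */
data have;
  input value;
  datalines;
1
2
3
4
5
;
run;

data want;
  set have;
  /* call the lag functions unconditionally on every row so their queues stay in step */
  prev1 = lag1(value);
  prev2 = lag2(value);
  /* sum() ignores missing arguments, which covers the first rows where _N_ LE 2 */
  roll3 = sum(value, prev1, prev2);
  drop prev1 prev2;
run;
```

Storing the lagged values in separate variables keeps each lag queue fed with the raw input, which avoids the pile effect discussed in the thread.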

Vince28_Statcan · ‎08-06-2013

The width= option in the define statement does not apply to output other than traditional monospace output; see Base SAS(R) 9.3 Procedures Guide, Second Edition. The same guide has an example of how to use style options such as

define age / style(column)=[cellwidth=1in];
define sex / style(column)=[width=10%];

to specify width for ODS destinations other than monospace. Hope this helps! Vincent
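To show where such define statements sit in a full step, here is a minimal sketch. It assumes sashelp.class as stand-in data and report.rtf as a placeholder file name; neither comes from the original thread.

```sas
/* Assumptions: sashelp.class as stand-in data, report.rtf as a placeholder file name */
ods rtf file="report.rtf";

proc report data=sashelp.class nowd;
  column name age sex;
  define name / display;
  /* cellwidth= in the column style sets the width for non-monospace ODS destinations */
  define age  / display style(column)=[cellwidth=1in];
  define sex  / display style(column)=[cellwidth=1in];
run;

ods rtf close;
```

Using the same cellwidth values in every proc report step that writes to the same file is one way to keep columns aligned across tables.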

Vince28_Statcan · ‎08-03-2013

I think $varyingw. is strictly an informat (see SAS(R) 9.4 Formats and Informats: Reference); if you look for it in the format list, it is not there. It does make some sense too. Depending on your ODS destination, you *should* be able to avoid the issue of white space padding by using trim(measure_name). I am not very familiar with proc report, so I may be wrong about the line statement supporting trim() like the put statement does, since there are a few features of put not supported by line. Vince

Vince28_Statcan · ‎08-03-2013

proc format;
  value $CMC
    'pcifu182'='1-182 DAY PCI REDO RATE'
    'pcifu30'='1-30 DAY PCI REDO RATE'
    /* more formats to cover all distinct procedure */
  ;
run;

proc transpose data=have out=want;
  by strata procedure;
  id location;
  var value_display;
run;

ods rtf file='c:\somefile.rtf';
proc print data=want;
  by strata;
  format procedure $CMC.;
run;
ods rtf close;

The syntax is only an indication of how to go about it; I can't test this from home at all. If you have an existing dataset with the full procedure name / procedure acronym, you could/should use that instead of the proc format, just to avoid hard coding the proc format for all your different procedures.

If you want to change the names of the variables created by transpose (IP OP OFC), you can use a label statement (label var1='label1' var2='label2';) in the proc print step, together with the label option on the proc print statement. Or, if you ever plan to do other manipulation of the want dataset, you can rename them at output time with a dataset option on the proc transpose: data=have out=want(rename=(IP=INPATIENT OP=OUTPATIENT OFC=OFFICECLINIC)). For your desired output I'd likely use the label statement in the proc print, simply because it allows you to use sentences rather than names that must respect SAS variable naming rules.

The big chunk really is how to use proc transpose to get a dataset that respects your output objectives. Proc report wouldn't offer you much more than proc print unless you intended to play a lot with fonts/colors/size etc.

*Edit: corrected for format. Moved it to a statement in the procedure instead of a dataset option.
Vincent

Vince28_Statcan · ‎08-03-2013

If your reference table doesn't have millions of records, it can all be done in a single SQL query as follows:

proc sql;
  create table want as
  select a.city1, a.city2,
    (case
       when a.city2 in (select distinct city_a from reftable) then a.city2
       when (a.city2 not in (select distinct city_a from reftable)
             and a.city1 in (select distinct city_a from reftable)) then a.city1
       else "ROC"
     end) as city_A
  from have as a;
quit;

The above would be the code with the best readability, but if (select distinct city_a from reftable) returns hundreds of thousands of records, then the "in" operator lags significantly behind other processing strategies. Typically, when I want to do some funky merge between a large DS and a smaller one, as you've described above, I tend to use hash tables. Here's an alternative solution, again in a single step, with significantly faster processing but using the less natural hash object syntax:

data want(drop=city_a rename=(city_c=city_a));
  if _N_=1 then do;
    length city_a $40; /* placeholder length, preferably use the same as the length of your city_a variable in your reference table */
    declare hash h(dataset: 'reftable');
    h.defineKey('city_a');
    /* h.defineData(...) is only necessary if you need to retain additional columns from the reference table in the merge */
    h.defineDone();
  end;
  set have;
  if h.find(key: city2)=0 then city_c=city2;
  else if h.find(key: city1)=0 then city_c=city1;
  else city_c="ROC";
run;

Basically, a hash table is a lookup table. The find method on the hash table returns a return code: it is 0 if the key was found, some code NE 0 otherwise. The hash table is held entirely in memory, which means great efficiency gains for medium-large datasets. However, it is capped by the RAM allocated to SAS on your computer. With a single key variable and no data, you should have no problem getting a hash table of 1M records though. Vince

Vince28_Statcan · ‎08-02-2013

I think you need to elaborate a little more on your data and intended results. When I look at your data example, it reads as though your datafile1 has 2 records per case, one with old info (as matched with datafile2) and one with updated info, and your aim is more or less to update datafile2? If that's the case and you have additional information about your files' structure, you might be able to use clever lag functions and the like to achieve your desired results. For instance, if datafile1 is generated by an automated system and always produces exactly 2 records in a row for the same case, fairly easy processing can be done with _temporary_ arrays, hash tables and a custom-built index on datafile1, using something like ceil(_N_/2) to create the index. Anyway, point being, if your data has a specific structure, there are probably far more reliable ways to process it than eventually having to rely on string distance metrics to match addresses, as the update to your "key" id is making processing very painful. Nonetheless, if this is something that might be repeated over time and cases often change ID, you should request a 3rd dataset with a date or datetime stamp and 2 more variables, old_id and new_id, to document any such change in your primary keys. Vince
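The ceil(_N_/2) index mentioned in this post can be sketched as follows. The rows and the variable names id, status and pair_id are hypothetical stand-ins, and the sketch assumes datafile1 really does hold exactly two consecutive records per case.

```sas
/* Hypothetical stand-in for datafile1: two consecutive records per case */
data datafile1;
  input id $ status $;
  datalines;
A1 old
B7 new
A2 old
C3 new
;
run;

data indexed;
  set datafile1;
  /* rows 1-2 get pair_id 1, rows 3-4 get pair_id 2, and so on */
  pair_id = ceil(_n_/2);
run;
```

With pair_id in place, the old and new records of a case can be grouped with by pair_id even though their id values differ.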

Date Last Visited	‎07-02-2019 05:06 PM

Re: How to import this SDMX-ML data from Statistics Canada in SAS?

Re: Using the XML Mapper Utility

Re: Analysis by row

Re: SAS converting character variables to numeric while exporting to C...

Re: SAS converting character variables to numeric while exporting to C...

Re: If then statement to case statement

Re: using %sysfunc(cat() )

Re: proc contents

Re: Sas merge help

Re: Sas merge help

Re: put statement - format used contained in a variable

Re: Comparing one dataset with another without merging (with the help ...

Re: Comparing one dataset with another without merging (with the help ...

Re: Is it possible to run Excel VBA code using SAS

Re: FORMAT function

Re: Attempt to %GLOBAL a name (NAME) which exists in a local environme...

Re: Unable to export data to local folders (PROC EXPORT in SAS EG)

Re: Make first letter capital only

Re: Removing duplicate pairs i.e keeping only unique values that weren...

Re: Macro error

Re: Need help on a hierarchical data

Re: Need help on a hierarchical data

Re: Need help on a hierarchical data

Re: do statement or transpose?

Re: do statement or transpose?

Re: How to create cumulative sums

Re: finding the nearest value to zero

Re: do statement or transpose?

Re: How to get uniform width across tables in a report which are print...

Re: How to create cumulative sums

Re: How to get uniform width across tables in a report which are print...

Re: Proc Report: var in compute/line statement does not print to repor...

Re: proc report to rtf problem

Re: Merge by different fields on a priority

Re: Selecting for One Observation Among Two of the Same