SAS Support Communities

quickbluefish

Not that it's shorter in this case, but you could also solve this using subqueries in a join, like this:

proc sql;
  select a.countrycode,
         a.indicatorname,
         a.estyear1/100 as estpct1 format=percent8.2,
         a.estyear3/100 as estpct3 format=percent8.2,
         a.estyear1
    from (select *
            from sq.globalfindex
            where indicatorname="Borrowed for health or medical purposes (% age 15+)") A
    inner join
         (select countrycode
            from sq.globalmetadata
            where upcase(region)=upcase("Europe & Central Asia")
              and upcase(incomegroup)=upcase("High income")) B
      on a.countrycode=b.countrycode
    order by a.estyear1 desc;
quit;

quickbluefish

Here is some fake data that more or less matches what you have in your screenshots. Run this and use PROC PRINT to look at the datasets if you like:

data seqs;
  length seq $20 sequences $6 s1-s8 3;
  array s {*} s1-s8;
  do i=1 to dim(s);
    s[i]=(rand('uniform')<0.3);
  end;
  drop i;
  /* runs before INPUT, so after the last record is read this holds the number of sequences */
  call symputx("nseqs",_N_-1);
  infile cards dsd truncover firstobs=1 dlm='|';
  input seq sequences;
  cards;
1,2,3,4,5,6,7,8|seq1
1,2,3,4|seq2
1,3,6,8|seq3
3,4,7|seq4
;
run;

data values;
  length score1-score8 3;
  array s {*} score1-score8;
  do r=1 to 20;
    do i=1 to dim(s);
      s[i]=(rand('uniform')<0.6);
    end;
    output;
  end;
  drop r i;
run;

I can't say I really understand your request, especially whether you're trying to sum things vertically or horizontally. Based on what you said, my assumption is that you want to sum horizontally:

data want;
  set seqs (in=A) values;
  array T {&nseqs} $20 _temporary_;   /* holds the position lists from SEQS */
  array sc {*} score1-score8;
  array sq {*} seq1-seq&nseqs;
  if A then T[_N_]=seq;
  else do;
    call missing(of sq[*]);
    do i=1 to dim(sq);
      /* add up the scores at each position listed in sequence i */
      do s=1 to countw(T[i],',');
        sq[i]+sc[input(scan(T[i],s,','),8.)];
      end;
    end;
    output;
  end;
  keep score1-score8 seq1-seq&nseqs;
run;

proc print data=want;
run;

quickbluefish

The first part is just a slight rework of your input dataset; it was producing all sorts of errors when read as it was:

data WORK.SUBTYPE_SAMPLE;
  infile cards dsd truncover firstobs=1 dlm=',';
  length ID $12 type $12 reference_date service_start service_end 4;
  informat reference_date service_start service_end date9.;
  format reference_date service_start service_end date9.;
  input ID type reference_date service_start service_end;
  cards;
1,A,04JAN2016,10JAN2016,21JAN2016
1,B,04JAN2016,09JUL2018,09NOV2019
1,Unspecified,04JAN2016,06JAN2016,10FEB2016
2,B,08JUN2019,08DEC2019,19DEC2019
2,Unspecified,08JUN2019,22OCT2019,09AUG2019
3,Unspecified,02FEB2017,02APR2017,15APR2017
4,A,01JAN2020,03MAR2020,24MAR2020
4,A,01JAN2020,05MAY2018,10MAY2018
4,Unnspecified,01JAN2020,02JAN2020,03JAN2020
5,A,09SEP2016,11NOV2016,15NOV2016
5,B,09SEP2016,09SEP2016,10NOV2016
6,A,03MAR2016,30AUG2016,02NOV2016
6,A,03MAR2016,14OCT2016,19OCT2016
6,A,03MAR2016,26MAR2016,19DEC2016
6,Unspecified,03MAR2016,20OCT2016,21OCT2016
6,Unspecified,03MAR2016,12DEC2016,28DEC2016
6,B,03MAR2016,28JUN2016,15AUG2016
7,B,10OCT2022,11OCT2022,14NOV2022
8,Unspecified,01JAN2019,05MAY2019,06MAY2019
8,Unspecified,01JAN2019,07MAY2019,08MAY2019
;
run;

proc sort data=subtype_sample;
  by id;
run;

data want;
  set subtype_sample;
  by ID;
  length true_type $12 closest 4 anyAB 3;
  retain true_type closest anyAB;
  if first.ID then do;
    closest=10000;
    true_type='';
    anyAB=0;
  end;
  /* days from the reference date to the nearer end of the service period */
  dist=min(abs(service_start-reference_date), abs(service_end-reference_date));
  if type in ('A', 'B') then do;
    anyAB=1;
    if dist<closest then do;
      true_type=type;
      closest=dist;
    end;
  end;
  else if anyAB=0 then true_type='Unspecified';
  if last.ID then output;
  keep ID true_type closest;
run;

proc print data=want;
run;

quickbluefish

Here's another option if you want to save them into separate macro variables:

proc contents noprint data=sashelp.cars out=namy (keep=name label);
run;

data _null_;
  set namy end=last;
  nm=quote(strip(name));
  lbl=quote("put text here");
  c=',';
  if last then c='';
  msg=compbl('{ name= ' || nm || ' label=' || lbl || '}' || c);
  put msg;
  /* creates MSG1, MSG2, ... (one per variable) */
  call symputx(cats("msg", _N_), msg);
run;

* for example, the 3rd name/label combination ;
%put MSG3: &msg3;
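If you'd rather skip the DATA step, PROC SQL can create the numbered macro variables directly with an open-ended INTO range (:msg1-). A sketch, assuming SAS 9.3 or later for the open-ended range; unlike the DATA step version, it doesn't add the separating commas:

```sas
/* Rebuild NAMY so this block runs on its own */
proc contents noprint data=sashelp.cars out=namy (keep=name label);
run;

proc sql noprint;
  /* CATX joins the pieces with single blanks */
  select catx(' ', '{ name=', quote(strip(name)), 'label=',
              quote('put text here'), '}')
    into :msg1-              /* one macro variable per row: MSG1, MSG2, ... */
    from namy;
quit;

/* SQLOBS holds the number of rows selected, i.e. how many MSGn exist */
%put NUMBER OF MSG VARIABLES: &sqlobs;
%put MSG3: &msg3;
```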

quickbluefish

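Agree with others here. I would just add that with COUNTW and SCAN (and similar functions), it's a good idea to specify your delimiter as the optional last argument. Otherwise SAS falls back on a default list of delimiters that includes many punctuation characters (commas, periods, hyphens, slashes and others), not just the space. And if the delimiter is a space (as it is here), it's never a bad idea to clean up potential runs of multiple blanks with %CMPRES:

%let YRLIST=2012 2014 2016;
%let YRLIST=%CMPRES(&YRLIST);
%let nYRS=%sysfunc(countW(&YRLIST,' '));
....
%let YR=%SCAN(&YRLIST, &i, ' ');
* you can also use %STR( ) instead of ' ' above ;

In macro code, %STR( ) is actually the more precise choice: the macro processor treats quotation marks as ordinary text, so ' ' makes the single quote a delimiter in addition to the space (harmless here, since the list contains no quotes).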
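Putting the pieces together, a minimal sketch of the full loop. Iterative %DO loops have to live inside a macro definition, so the code is wrapped in one; LOOP_YEARS is a hypothetical name:

```sas
%macro loop_years;
  %local YRLIST nYRS i YR;
  /* the extra blanks are collapsed to single blanks by %CMPRES */
  %let YRLIST=%CMPRES(2012   2014  2016);
  /* %STR( ) passes a single blank as the delimiter */
  %let nYRS=%sysfunc(countw(&YRLIST, %str( )));
  %do i=1 %to &nYRS;
    %let YR=%scan(&YRLIST, &i, %str( ));
    %put Iteration &i of &nYRS: YR=&YR;
  %end;
%mend loop_years;

%loop_years
```

This should write three lines to the log, the last one being "Iteration 3 of 3: YR=2016".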

quickbluefish

It's a little hard to follow what you're doing. For one thing, you can comment out multiple lines like this to make things easier to read:

/* Here is a comment
   that is 3 lines
   long. */

Merging/joining on strings (especially when they're just names that were entered freehand) is not ideal, as you've discovered, but sometimes that's all you have. You could try some sort of "fuzzy" merge like this: https://blogs.sas.com/content/sgf/2021/09/21/fuzzy-matching/

As for the join, I think you will have an easier time assessing it with a single LEFT join and then doing some counts on the resulting data:

PROC SQL;
  create table want as
    select b.permno, a.bobnamesNew, b.company_name_header, a.bobnamesOriginal
      from work.Temp A
      LEFT JOIN
           (select distinct permno, company_name_header from names.CRSPnames) B
        on a.bobnamesNew=b.company_name_header;
  title "# distinct bobNames without a match in CRSP";
  select count(distinct bobnamesNew)
    from WANT
    where missing(permno);
  title;
QUIT;

Again, you're going to either have to try a fuzzy merge or, more likely, manually correct the bobnames. If the differences are just things like case, whitespace, special characters, etc., then you could probably make this more automated, but you'd have to provide some examples here in order for people to help.
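If the mismatches turn out to be mostly case, spacing and punctuation, one way to automate part of it is to join on a normalized key. A minimal sketch, reusing the table and column names from the query above; WANT_NORM is a hypothetical output table name. Normalizing can also create false matches between genuinely different names, so it's worth eyeballing the results:

```sas
proc sql;
  create table want_norm as
    select b.permno, a.bobnamesNew, b.company_name_header, a.bobnamesOriginal
      from work.Temp A
      left join
           (select distinct permno, company_name_header
              from names.CRSPnames) B
        /* COMPRESS with 'kad' keeps only letters and digits; UPCASE removes case differences */
        on upcase(compress(a.bobnamesNew, , 'kad'))
         = upcase(compress(b.company_name_header, , 'kad'));

  title "# distinct bobNames still without a match after normalizing";
  select count(distinct bobnamesNew)
    from want_norm
    where missing(permno);
  title;
quit;
```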

quickbluefish

This is a pretty tricky problem. It might help if we understood what the purpose of the NEW_FLAG variable is?

quickbluefish

This sounds like homework? It's not really a SAS question so much as a basic stats question, but the standard deviation (SD) is the square root of the variance, and the standard error (SE) of the mean is the SD divided by the square root of the sample size. Your 95% confidence interval (CI) for the mean is, in this case, symmetric around the mean, with the lower limit being:

mean - 1.96 * SE

...and the upper limit being:

mean + 1.96 * SE

If your current variance is, say, 16, and your sample size is 18, then your SE would be 4/sqrt(18). So... if you increased your sample size by 10, what is your SE? And how does that affect the upper and lower bounds of the CI?

Note that the population variance has a slightly different formula than the sample variance: it divides by N rather than n - 1. ChatGPT is your friend. Better yet, make up a tiny fake dataset and calculate the variance, SD, SE and CI by hand - really.
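The same formulas written out, taking 18 as the sample size from the example (the n + 10 part is left for you):

```latex
s = \sqrt{s^2}, \qquad
\mathrm{SE} = \frac{s}{\sqrt{n}}, \qquad
95\%\ \mathrm{CI} = \bar{x} \pm 1.96 \times \mathrm{SE}

s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 \quad \text{(sample)}, \qquad
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2 \quad \text{(population)}

\text{With } s^2 = 16,\ n = 18:\quad
\mathrm{SE} = \frac{4}{\sqrt{18}} \approx 0.943, \qquad
1.96 \times 0.943 \approx 1.85
```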

quickbluefish

Great. The only reason it's not resolving the macro variable is that you have the whole string (starting with dir) inside single quotes. SAS only resolves macro variable references inside double quotes (or with no quotes at all). You'd have to do something a little trickier to get that particular thing to work with a macro variable, but it's probably easier to just hardcode your user name unless this really needs to be dynamic. It might work just like this (assuming no spaces in your file path):

filename have pipe "dir /b /s C/&user/myfolder\*.png";

One more thing to check: the Windows DIR command treats a forward slash as the start of a switch, so the folder separators in the path should be backslashes.
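To see the quoting rule in action, here's a minimal sketch; the user name and folder are hypothetical placeholders. If the value really does need to be dynamic, the automatic macro variable SYSUSERID holds the user ID of the current SAS session, which may or may not match your folder name:

```sas
/* Hypothetical user name for illustration */
%let user=jsmith;

data _null_;
  length single double $100;
  single='dir /b /s C:\&user\myfolder\*.png';  /* single quotes: &user stays as-is */
  double="dir /b /s C:\&user\myfolder\*.png";  /* double quotes: &user resolves   */
  put single= / double=;
run;
```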

quickbluefish

Did you try changing *.txt to *.png?

quickbluefish

How about this? The first data step just generates some fake data. This approach allows gaps in months. I am not sure what your ID variable is supposed to be, though:

data have;
  date='01Jan2023'd;
  format date date9.;
  do i=1 to 50;
    date+rand('integer',5,45);
    co2=rand('integer',1,20);
    output;
  end;
  drop i;
run;

proc sort data=have;
  by date;
run;

/* first and last month (as SAS dates) in the data */
proc sql noprint;
  select min(intnx('month',date,0)), max(intnx('month',date,0))
    into :firstmonth trimmed, :lastmonth trimmed
    from have;
quit;

data _null_;
  call symputx('nmonths',intck('month',&firstmonth,&lastmonth)+1);
run;
%put NMONTHS: &nmonths;

data want;
  set have end=last;
  /* one slot per month; slots -1 and 0 sit before the first month, so the lags for the first two months come out missing */
  array T {-1:&nmonths} _temporary_;
  /* running CO2 total for the month this record falls in */
  T[intck('month',&firstmonth,date)+1]+co2;
  if last then do;
    do i=1 to &nmonths;
      yrmonth=put(intnx('month',&firstmonth,i-1),yymmn6.);
      month_m0=T[i];
      month_m1=T[i-1];
      month_m2=T[i-2];
      output;
    end;
  end;
  keep yrmonth month_:;
run;

proc print data=want;
run;
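To sanity-check the month_m0 column, you can total CO2 by month directly. A sketch, assuming the HAVE dataset from the fake-data step; MONTHLY_CHECK is a hypothetical name. PROC SUMMARY groups CLASS variables by their formatted values, so the YYMMN6. format collapses the dates into months (months with no records simply won't appear here, whereas they show up as missing in WANT):

```sas
proc summary data=have nway;
  class date;
  format date yymmn6.;   /* group by year-month rather than by exact date */
  var co2;
  output out=monthly_check (drop=_type_ _freq_) sum=co2_total;
run;

proc print data=monthly_check;
run;
```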

quickbluefish

By 'predictive', I just mean it's positively associated with hypertension, not that rurality causes hypertension.

quickbluefish

I think your life will be easier if you use the DESCENDING option in the PROC LOGISTIC statement. For a 0/1 outcome, that makes PROC LOGISTIC model the probability that the outcome is 1, so 0 becomes the referent ("no") category of the response. DESCENDING doesn't change how predictors are coded, though: in the code below, rural is a 0/1 numeric variable with no CLASS statement, so its odds ratio compares rural=1 with rural=0. Right now, you've got 0 as the referent group for your dependent (left-hand side) variable (because of the event= syntax) but *1* as the referent group for the rural variable, so interpretation is pretty non-intuitive at the moment. Instead, do this:

proc logistic data=work.research_lr DESCENDING;
  model hot_spot = rural;
run;

Doing the above should result in an odds ratio that's the reciprocal of what you currently have: 1/0.627 = 1.595.

An OR of 1.595 (assuming a confidence interval that does not include 1) would mean that rurality is predictive of hot spot hypertension. More specifically, the odds of hot spot hypertension are about 1.6 times higher for rural people than for non-rural people.
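If you'd rather keep rural on a CLASS statement, you can make both referents explicit instead of relying on defaults. A sketch using the same dataset and variable names; REF='0' and EVENT='1' refer to the formatted values of the 0/1 variables, and PARAM=REF requests reference-cell coding:

```sas
proc logistic data=work.research_lr;
  class rural (ref='0') / param=ref;   /* odds ratio compares rural=1 against rural=0 */
  model hot_spot (event='1') = rural;  /* model the probability that hot_spot=1 */
run;
```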

quickbluefish

Somehow I've never seen the ASPECT option in SGPLOT / SGPANEL - that is really good to know.

quickbluefish

Agree this is odd, especially the labels that show up on the bottom when the lines are close together. At least for the ones that place the label on top, I think the algorithm is just trying to put it in the least ambiguous position possible, even if it doesn't look great. It's more obvious when you have a lot of lines, some of which (like in your plot) only extend part way along the x-axis.

You could play around with something like this instead, labeling with the TEXT statement instead of the CURVELABEL options:

data test;
  do grp=1 to 10;
    ymean=rand('erlang',2)*5;
    nwks=25;
    if rand('uniform')<0.4 then nwks=rand('integer',5,25);
    x=.;
    y=.;
    do wk=1 to nwks;
      yval=ymean+rand('normal')*10;
      /* only the last point of each curve gets X/Y; these anchor the label */
      if wk=nwks then do;
        x=wk;
        y=yval;
      end;
      output;
    end;
  end;
run;

proc sgplot data=test noautolegend;
  series x=wk y=yval / group=grp;
  text x=x y=y text=grp / group=grp textattrs=(size=12pt) position=right backfill backlight;
  scatter x=x y=y / group=grp markerattrs=(size=12pt);
run;

It looks a little silly as-is, but if you fiddle with the options, I bet you can get something good.
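Before switching to TEXT, it may also be worth trying CURVELABELPOS=END on the SERIES statement, which puts each curve's label at the end of the curve rather than letting SAS pick the position. A sketch on a small made-up dataset (TEST2 is a hypothetical name); whether it looks better depends on how many curves end at similar y-values:

```sas
data test2;
  call streaminit(123);              /* reproducible random numbers */
  do grp=1 to 5;
    nwks=rand('integer',10,25);      /* curves of different lengths */
    do wk=1 to nwks;
      yval=grp*5+rand('normal')*3;
      output;
    end;
  end;
run;

proc sgplot data=test2 noautolegend;
  /* label each group's curve at its last point */
  series x=wk y=yval / group=grp curvelabel curvelabelpos=end;
run;
```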