About MikeZdeb

MikeZdeb · ‎04-27-2012

hi ... this makes some fake data with variables SCHOOL, STUDENT, YEAR, and GRADE, then uses SQL to find students within schools with an A in all three years (RANTBL ... 30% chance that a student gets an A in any given year) ...

    data have;
    do year = 2008 to 2010;
    do school = 1 to 5;
    do student = 1 to 100;
       grade = rantbl(999,0.3,0.5);
       output;
    end;
    end;
    end;
    run;

    proc sql;
    create table gradea as
    select school, student
    from have (where=(grade eq 1))
    group by school, student
    having count(*) eq 3
    order by school, student;
    quit;

    school=1   student   32 33 35 87 92
    school=2   student   52
    school=3   student   16 62
    school=4   student   4 19 58 75

MikeZdeb · ‎04-26-2012

hi ... I used your CSV file and hopefully I understand the counts that you want ...

    * make a data set;
    proc import datafile='z:\sampledata.csv' out=xxx;
    delimiter=';';
    run;

    * get rid of all those formats/informats added by IMPORT;
    proc datasets lib=work nolist;
    modify xxx;
    format _all_;
    informat _all_;
    quit;

    * find START in January (change2-change4) ... if START in January, find STOP;
    data jan;
    set xxx;
    array change(2:16) change2-change16;
    * did they start in January;
    start = whichc('OPENED to TRIAL_4W',change2,change3,change4) + 1;
    if start gt 1;
    * when did they stop;
    stop = whichc('CLOSED from TRIAL_4W',of change(*)) + 1;
    run;

output attached ... within each January week (START 2, 3, 4) ... if STOP = 1, no CLOSED was found in the CHANGE variables

MikeZdeb · ‎04-26-2012

hi .. I had the same reaction and wondered how there could be an SGF 2012 paper on inexact matching that did not even reference the functions you mentioned (so someone like MSPak could read a new paper and not even be made aware of the functions)

MikeZdeb · ‎04-26-2012

hi ... just tried it in V9.2 ... two pages versus one in V9.3 ... you are correct !!!

MikeZdeb · ‎04-26-2012

hi ... before you start any of the above, since step #1 relies on matching by literals, have you looked at the names in both files and determined if there are things you should do before you even start ... for example ...

#1 in holding_company, I see ...

    O.S.K. HOLDINGS BERHAD
    OSK HOLDINGS BERHAD

are they the same company and should you get rid of those periods

#2 there's a mix of lower and upper case letters ... should you convert to all uppercase

#3 most (90+%) of all the name variables you cite have "BHD" or "BERHAD" as part of the name ... if you are going to look for similarity in names you don't want the "BHD" or "BERHAD" part of the name contributing anything to a score given to a name comparison

#4 sometimes a location is in parentheses, (MALAYSIA), and sometimes it's not, MALAYSIA

just using PROC FREQ on the various name variables would give you some idea as to how to fix up the names before you even try to match names

for example, clean up the names and make some new variables to hold those names ...

    data new_maluw;
    set z.maluw;
    * add a record number for later use;
    mnrec+1;
    * convert to uppercase, only keep numbers/letters/spaces, convert multiple spaces to one space;
    nm = compbl(compress(upcase(name),,'kdas'));
    * get rid of BHD and BERHAD;
    nm = tranwrd(nm,' BHD','');
    nm = tranwrd(nm,' BERHAD','');
    run;

    data new_uw_match;
    set z.uw_match;
    unrec+1;
    nmh = compbl(compress(upcase(holding_company),,'kdas'));
    nmh = tranwrd(nmh,' BHD','');
    nmh = tranwrd(nmh,' BERHAD','');
    nmu = compbl(compress(upcase(underwriters_names),,'kdas'));
    nmu = tranwrd(nmu,' BHD','');
    nmu = tranwrd(nmu,' BERHAD','');
    run;

then run PROC FREQ again on the new variables (nm, nmh, and nmu) and see if there are any other things you should do before you start to match the nm in one file to nmh and nmu in another

once you have done the above, here's a suggestion for a start ... I haven't used COMPGED much (maybe other folk know about a "good score" level)

I usually do this stuff in stages, evaluating the success of each step (e.g. the name match) before I move on to the next ...

    * use SQL to match the files by a comparison of names, use the COMPGED function to compare names;
    * you don't have to use all the data since you have pointers (mnrec and unrec);
    * nm_nmh and nm_nmu are matching scores;
    proc sql;
    create table both as
    select mnrec, unrec,
           compged(nm, nmh) as nm_nmh,
           compged(nm, nmu) as nm_nmu
    from new_maluw, new_uw_match
    having nm_nmh lt 50 or nm_nmu lt 50;
    quit;

    * reconstruct the data using the pointers;
    * maybe you only add the dates and other vars you need for more work at this point;
    data both;
    set both;
    p1=mnrec;
    p2=unrec;
    set new_maluw (keep=nm closdate) point=p1;
    set new_uw_match (keep=nmh nmu ipo_date) point=p2;
    run;

etc ...
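A quick way to see what the cleanup step does is to run it on the two spellings quoted in #1. This is only a sketch: the data set name CHECK and the variables RAW and CLEAN are made up for illustration, while the COMPRESS / COMPBL / TRANWRD calls are the same ones used above. Both rows should end up with the same CLEAN value, so an exact match (or a COMPGED score of 0) becomes possible for that pair.

```sas
* hypothetical two-row test of the name cleanup (CHECK, RAW, CLEAN are illustrative names);
data check;
length raw clean $40;
input raw $40.;
* uppercase, keep only digits/letters/spaces, collapse multiple spaces;
clean = compbl(compress(upcase(raw),,'kdas'));
* drop the BHD / BERHAD suffix;
clean = tranwrd(clean,' BHD','');
clean = tranwrd(clean,' BERHAD','');
* write both versions to the LOG for comparison;
put raw= clean=;
datalines;
O.S.K. HOLDINGS BERHAD
OSK HOLDINGS BERHAD
;
run;
```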

MikeZdeb · ‎04-26-2012

hi ... the following produced 1 page (but so did startpage=no ... using V9.3 in Windows XP) ...

    ods results off;
    ods listing close;
    ods pdf file='z:\test.pdf'
            text = "This is a plot of weight and height using SASHELP.CLASS."
            startpage=never notoc;

    goptions reset=all ftext='calibri' htext=2 gunit=pct;

    * move plot away from text;
    title ls=2;
    * move plot away from bottom edge;
    footnote1 ls=2;

    symbol1 f='wingdings' v='6c'x c=blue;

    proc gplot data=sashelp.class;
    plot weight * height / noframe;
    run;
    quit;

    ods pdf close;
    ods results;
    ods listing;

MikeZdeb · ‎04-25-2012

hi ... glad this is winding down !!!

I know what the problem is that you had with my last posting ... all those spaces in your folder name ...

    filename csvfiles pipe "dir /b C:\Data Extract - WSDS\DSWin Pull\DSPull - Ask\*.csv";

kill the PIPE method for getting a list of files in a FOLDER, the LOG ...

BUT a different use of quotes ...

    filename csvfiles pipe 'dir /b "C:\Data Extract - WSDS\DSWin Pull\DSPull - Ask\*.csv" ';

should work ... also, the next filename statement you have is ...

    * location of csv files;
    filename csv 'C:\Ask\';

and it should be ...

    * location of csv files;
    filename csv "C:\Data Extract - WSDS\DSWin Pull\DSPull - Ask\" ;

the SAS code will not work unless the folder (the location of your CSV files) matches in the two filename statements

so, since I'm curious as to whether it solves the problem, it'd be nice (if you have a chance) if you gave it a try

it would also be of interest to know how the performance (elapsed and CPU times) compares to Patrick's solution (which you say is working just fine)

glad that something worked for you !!!

ps I'm always amazed when I see folder names that are full of spaces since it's difficult to distinguish one space from two ... reminds me of my students who store files in places like "C:\Documents and Settings\Joe Student\My Documents\EPI 514\data" rather than "c:\epi514\data"
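One way to make sure the two FILENAME statements always point at the same folder is to keep the path in a single macro variable. This is a sketch, not part of the original code: the macro variable name FOLDER is an assumption, and the doubled double quotes are used here in place of the single-quote wrapper so that &FOLDER still resolves.

```sas
* hypothetical: keep the folder in one macro variable so both FILENAME statements agree;
%let folder = C:\Data Extract - WSDS\DSWin Pull\DSPull - Ask;

* inside a double-quoted string, two double quotes become one literal double quote;
* so DIR receives the path wrapped in quotes and the spaces no longer split it;
filename csvfiles pipe "dir /b ""&folder\*.csv""";

* location of csv files;
filename csv "&folder\";
```

Changing the folder then only means editing the %LET statement.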

MikeZdeb · ‎04-25-2012

hi ... yes you can get the same statistics with MEANS as you do with UNIVARIATE

however, if you use UNIVARIATE and ODS OUTPUT to get the statistics into a SAS data set rather than the PROC option (OUTPUT OUT= ...) in either MEANS or UNIVARIATE, the data set orientation is different, for example ...

    ods listing close;
    ods output moments=ustat1 (keep=varname sex label1 nvalue1
                               where=(label1 in : ('N' 'Mean' 'Std')));
    proc univariate data=sashelp.class;
    var height weight;
    class sex;
    run;
    ods output close;
    ods listing;

gives you ...

           Var
    Obs    Name      Sex    Label1              nValue1
      1    Height    F      N                  9.000000
      2    Height    F      Mean              60.588889
      3    Height    F      Std Deviation      5.018328
      4    Height    M      N                 10.000000
      5    Height    M      Mean              63.910000
      6    Height    M      Std Deviation      4.937937
      7    Weight    F      N                  9.000000
      8    Weight    F      Mean              90.111111
      9    Weight    F      Std Deviation     19.383914
     10    Weight    M      N                 10.000000
     11    Weight    M      Mean             108.950000
     12    Weight    M      Std Deviation     22.727186

while the following ...

    proc univariate data=sashelp.class noprint;
    var height weight;
    class sex;
    output out=ustat2 n=n_ht n_wt mean=mean_ht mean_wt std=std_ht std_wt;
    run;

gives you ...

    Obs    Sex    n_ht    n_wt    mean_ht    mean_wt    std_ht     std_wt
     1     F        9       9     60.5889     90.111    5.01833    19.3839
     2     M       10      10     63.9100    108.950    4.93794    22.7272

(similar to what you would get with PROC MEANS output)

the output you want depends on what comes next, that is, what you will do with the data set (to get the output you wanted, I found it easier to use the ODS style output)

one distinct advantage of the ODS output is that you do not have to provide all those variable names when using more than one variable in the VAR statement

MikeZdeb · ‎04-24-2012

hi ... OK, one last bit of help

not much is needed to use the code I posted repetitively and read all your CSV files

if you put them all in one folder, you can use ...

    * create a list of csv files in folder z:\ and a count of such files as macro variables;
    filename csvfiles pipe "dir /b z:\*.csv";

    data _null_;
    length fnames $10000;
    infile csvfiles end=done;
    do j=1 by 1 until (done);
       input;
       fnames = catx('*',fnames,_infile_);
    end;
    call symputx('nfiles',j);
    call symputx('files',fnames);
    run;

then you can use the macro variables within a macro to read a CSV file, create a data set, append that data set to the final data set you want, and repeat the process for each CSV file found in the specified folder ... the macro (in the attached SAS file) starts / ends with ...

    %macro readcsv;
    proc datasets lib=work nolist;
    delete allmydata;
    quit;
    %do i = 1 %to &nfiles;
    %let fname=%scan(&files,&i,*);
    <more>
    proc append base=allmydata data=x;
    run;
    %end;
    proc datasets lib=work nolist;
    delete x;
    quit;
    %mend;

give it a try with a few CSV files (all in the same folder)

ps I changed the 2nd data step a bit, changing the LENGTHS of the variables CODE and CURRENCY from 200 and 8 to 15 and 4 by using a LENGTH statement and modifying one ARRAY statement ...

    data x;
    length code $15;
    infile csv(&fname) dsd firstobs=4 lrecl=20000 pad;
    input date : mmddyy. (&vars) (: x.);
    array _(*) _: ;
    array curr(0:&n) $4 _temporary_ (&curr);
    <more>
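To see how the %DO loop walks through the asterisk-delimited list, here is a stripped-down sketch that only writes each file name to the LOG. The values given to NFILES and FILES are made-up stand-ins for what the DATA _NULL_ step would create from the folder listing, and the macro name SHOWFILES is illustrative.

```sas
* hypothetical values standing in for what the DATA _NULL_ step would create;
%let nfiles = 3;
%let files  = a.csv*b.csv*c.csv;

%macro showfiles;
%do i = 1 %to &nfiles;
   %* pick the i-th name from the list, using * as the delimiter;
   %let fname = %scan(&files,&i,*);
   %put file &i of &nfiles is &fname;
%end;
%mend;

%showfiles
```

An asterisk is a convenient delimiter because it cannot occur in a Windows file name.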

MikeZdeb · ‎04-24-2012

hi ... no, just from ...

http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a002571877.htm

and from working with DDE and SAS with Excel and Word in the past

in searching around today, I see another good paper ... "Importing Data from Microsoft Word into SAS"

http://www.pharmasug.org/download/papers/CC18.pdf

with a mention of ...

    data _null_;
    file word;
    put '[FileOpen .Name = "' "C:\PharmaSUG2009\Example.doc" '"]';
    put "[EditSelectAll]";
    put "[EditCopy]";
    put '[FileClose]';
    run;

but no discussion of the clipboard access method (rather, it's use SAS to cut from Word, paste into Excel ... that's also an interesting application, just not what's needed here)

ps I haven't seen Art's paper, but if I had gotten the idea from Art, I would have said ... hey, I got this idea from this neat paper by Art Tabachneck and others ... "Copy and Paste Almost Anything" ...

http://support.sas.com/resources/papers/proceedings12/238-2012.pdf

MikeZdeb · ‎04-24-2012

hi ... I don't think that AUTONAME has been enabled in UNIVARIATE as it has in SUMMARY and MEANS

so, you have to spell out all the new variable names as in Ksharp's 2nd example ...

    proc univariate data=new;
    var phase: ;
    class treat visit;
    output out=ustat2 n=n1 n2 n3 mean=mean1 mean2 mean3 std=std1 std2 std3;
    run;

or as posted earlier ...

    proc univariate data=new noprint;
    var phase: ;
    class treat visit;
    output out=ustat2 n=phase1_n phase2_n phase3_n
                      mean=phase1_mean phase2_mean phase3_mean
                      std=phase1_std phase2_std phase3_std;
    run;
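For comparison, this is roughly what AUTONAME looks like in PROC MEANS, where no output variable names have to be typed. It is a sketch that uses SASHELP.CLASS instead of the poster's data, and the output data set name MSTAT is made up. The generated names combine the analysis variable and the statistic, for example Height_Mean.

```sas
* sketch: AUTONAME in PROC MEANS builds the output variable names for you;
proc means data=sashelp.class noprint nway;
class sex;
var height weight;
output out=mstat n= mean= std= / autoname;
run;

* list the variable names that were generated;
proc contents data=mstat short;
run;
```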

MikeZdeb · ‎04-23-2012

hi ... there are different "rules" for the output you get using ODS and the output you get using PROC options

from the on-line doc for UNIVARIATE ...

"You must provide a VAR statement when you use an OUTPUT statement. To store the same statistic for several analysis variables in the OUT= data set, you specify a list of names in the OUTPUT statement. PROC UNIVARIATE makes a one-to-one correspondence between the order of the analysis variables in the VAR statement and the list of names that follow a statistic keyword."

so when using just UNIVARIATE statements to get output, change this ...

    proc univariate data=new;
    var phase: ;
    class treat visit;
    output out=ustat2 n=n mean=mean std=std;
    run;

to this ...

    proc univariate data=new noprint;
    var phase: ;
    class treat visit;
    output out=ustat2 n=phase1_n phase2_n phase3_n
                      mean=phase1_mean phase2_mean phase3_mean
                      std=phase1_std phase2_std phase3_std;
    run;

the above behavior is similar to the difference between using this ...

    proc freq data=sashelp.class;
    tables _all_ / missing noprint out=tables;
    run;

and this ...

    ods listing close;
    ods output onewayfreqs=tables;
    proc freq data=sashelp.class;
    tables _all_ / missing;
    run;
    ods output close;
    ods listing;

the FREQ proc option only puts a table for the last variable in data set SASHELP.CLASS into data set TABLES, while ODS OUTPUT has tables for all variables

MikeZdeb · ‎04-23-2012

hi ... OK, providing real data helped ... this worked with that data (after some changes, e.g. providing LRECL) ...

    data _null_;
    infile 'z:\ws1.csv' dsd firstobs=2 _infile_=x lrecl=20000 pad;
    input;
    x = compress(translate(x,'','"'));
    x = tranwrd(x, 'Code' , '');
    x = tranwrd(x, ',' , '_' );
    x = tranwrd(x, '(PA)' ,'' );
    call symputx('vars',x);
    input;
    call symputx('n',countw(compress(x),',')-1);
    file 'z:\curr.txt' lrecl=10000;
    put x;
    call symputx('curr',compress(x));
    stop;
    run;

    * translate the #N/A to missing with an informat;
    proc format;
    invalue x '#n/a' , 'N/A' = .;
    run;

    * use the macro variables and the informat;
    data x;
    infile 'z:\ws1.csv' dsd firstobs=4 lrecl=20000 pad;
    input date : mmddyy. (&vars) (: x.);
    array _(*) _: ;
    array curr(0:&n) $ _temporary_ (&curr);
    do j=1 to dim(_);
       currency = curr(j);
       amount = _(j);
       code = compress(vname(_(j)),,'kad');
       output;
    end;
    format date mmddyy10.;
    keep date code currency amount;
    run;

    * check the data ... random sample;
    proc print data=x;
    where ranuni(999) le .0001;
    run;

    * first 10, last 10;
    data _null_;
    do j=1 to 10, lastrec-9 to lastrec;
       set x nobs=lastrec point=j;
       put j 7. +2 date mmddyy10. +2 currency $2. +2 amount 8.2 +2 code;
    end;
    stop;
    run;

the LOG (using the data you provided) ...

    574  * use the macro variables and the informat;
    575  data x;
    576  infile 'z:\ws1.csv' dsd firstobs=4 lrecl=20000 pad;
    577  input date : mmddyy. (&vars) (: x.);
    578  array _(*) _: ;
    579  array curr(0:&n) $ _temporary_ (&curr);
    580  do j=1 to dim(_);
    581     currency = curr(j);
    582     amount = _(j);
    583     code = compress(vname(_(j)),,'kad');
    584     output;
    585  end;
    586  format date mmddyy10.;
    587  keep date code currency amount;
    588  run;

    NOTE: The infile 'z:\ws1.csv' is:
          Filename=z:\ws1.csv,
          RECFM=V,LRECL=20000,File Size (bytes)=20022456,
          Last Modified=23Apr2012:19:16:00,
          Create Time=23Apr2012:21:34:53

    NOTE: 5740 records were read from the infile 'z:\ws1.csv'.
          The minimum record length was 3295.
          The maximum record length was 3872.
    NOTE: The data set WORK.X has 3771180 observations and 4 variables.
    NOTE: DATA statement used (Total process time):
          real time           26.04 seconds
          cpu time            16.45 seconds

first 10 and last 10 (I checked your CSV file and those are the last 10 values ... HEY !!! looks like it actually worked) ...

          1  01/01/1990  U$         .  130042
          2  01/01/1990  U$         .  130057
          3  01/01/1990  U$         .  130062
          4  01/01/1990  U$         .  130079
          5  01/01/1990  U$         .  130086
          6  01/01/1990  U$         .  130088
          7  01/01/1990  MP         .  130092
          8  01/01/1990  U$         .  130104
          9  01/01/1990  U$         .  130113
         10  01/01/1990  MP         .  130115
    3771171  12/30/2011  U$     34.97  134057
    3771172  12/30/2011  U$         .  134058
    3771173  12/30/2011  C$      0.04  134069
    3771174  12/30/2011  K$      0.77  13406D
    3771175  12/30/2011  U$     13.87  134072
    3771176  12/30/2011  U$         .  134076
    3771177  12/30/2011  RL     10.78  13407C
    3771178  12/30/2011  U$         .  134083
    3771179  12/30/2011  U$     25.33  134093
    3771180  12/30/2011  CH      7.17  13409Q

MikeZdeb · ‎04-23-2012

hi ... here's a "no math" idea ...

    data _null_;
    do x = 123456789, -123456789, 0.003, -0.003, 0;
       m = input(scan(put(x,e15.),1,'E'),15.);
       e = input(scan(put(x,e15.),2,'E'),15.);
       put x= m= e=;
    end;
    run;
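For contrast with the "no math" version, a math-based counterpart might look like the sketch below. It is not the posted method: it takes the exponent from LOG10 and divides to get the mantissa, with zero handled as a special case because LOG10(0) is undefined. Floating-point rounding can leave the mantissa slightly off in the last digits.

```sas
* sketch of a LOG10-based alternative (not the posted method), zero handled separately;
data _null_;
do x = 123456789, -123456789, 0.003, -0.003, 0;
   if x = 0 then do;
      m = 0;
      e = 0;
   end;
   else do;
      * exponent = power of ten of the leading digit;
      e = floor(log10(abs(x)));
      * mantissa keeps the sign of x;
      m = x / 10**e;
   end;
   put x= m= e=;
end;
run;
```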

MikeZdeb · ‎04-23-2012

hi ... I changed the CSV to look as you specified, with quotes around all entries in the first three lines

I modified the first data step, the one that creates the macro variables used in the second data step (actually made it easier to create macro variable &CURR)

give it a try (use with PROC FORMAT and the second data step from the previous posting) with the new CSV file and with your other data (cut/paste from here since the single quotes with no space between them might look like double quotes if you try to retype this)

    data _null_;
    infile 'z:\data.csv' dsd firstobs=2 _infile_=x;
    input;
    x = compress(translate(x,'','"'));
    x = tranwrd(x, 'Code' , '');
    x = tranwrd(x, ',' , '_' );
    x = tranwrd(x, '(PA)' ,'' );
    call symputx('vars',x);
    input;
    call symputx('n',countw(x)-1);
    call symputx('curr',x);
    stop;
    run;

Online Status	Offline
Date Last Visited	‎06-28-2018 10:46 AM

Re: Create variables as per separators in string

Re: How to create sub-totals ?

Re: Running a function for each unique value

Re: Aggregating dataset by date, including missing dates

Re: Aggregating dataset by date, including missing dates

Re: Aggregating dataset by date, including missing dates

Re: Aggregating dataset by date, including missing dates

Re: How to write generic libname to multiple databases?

Re: delete/retein observations by frequency (n) of variables

Re: delete/retein observations by frequency (n) of variables

Re: is there simpler way to convert char flag to numeric number

Re: INPUT function requires a character argument

Re: Renaming all variables in a table

Re: Report on missing values of each coloumn

Re: Remove the last character of my string

Re: Proc GMap: Caption and custom legend

Re: Saving PROC GPLOT graph to folder

Re: re: Catx Function Error

How to calculate geodesic distance in SAS

Re: proc freq and where statements

Re: Datastepping - beyond my skills..

Re: Combine Datasets using Inexact Character Variables in SAS

Re: ods pdf startpage=no; and graph

Re: Combine Datasets using Inexact Character Variables in SAS

Re: ods pdf startpage=no; and graph

Re: Stack database columns

Re: proc transpose 4 variables

Re: Stack database columns

Re: How Can i retrive data from RTF to sas Dataset?

Re: proc transpose 4 variables

Re: proc transpose 4 variables

Re: Stack database columns

Re: Scientific notation

Re: Stack database columns