About MART1

MART1 · 10-08-2021

Hello, I'm trying to create a SELECT statement for several tables that have different variables. The aim is to check whether each table contains duplicates, but I'm struggling to get the right code. (This community kindly helped me find a solution when I was looking for duplicates across all columns, but now I only need to check some columns, i.e. the keys.) As I cannot use GROUP BY *, I need to list all the key variables in the GROUP BY clause, but I can't find a way to do so. Below is my attempt, which of course does not work, because SELECT &Keys and GROUP BY &KEYS do not list all the variables:

/* Create the table containing the keys for each report */
data METADATA;
  input report $ Keys $;
  datalines;
CARS MAKE
CARS MODEL
BASEBALL NAME
BASEBALL TEAM
GAS FUEL
CLASS NAME
CLASS SEX
;
run;

/* Create the summary (final) table */
PROC SQL;
  CREATE TABLE WORK.DUPL
    ( Report char(8),
      Tot_Dups Numeric(5) );
QUIT;

/* Macro to loop through all tables */
%macro makereport(Report=, Keys=);
  PROC SQL;
    CREATE TABLE POPULATION_AUX AS
    SELECT "&Report" AS REPORT,
           COUNT(A.CT) AS Tot_Dups
    FROM ( SELECT &Keys, COUNT(*) AS CT
           FROM SASHELP.&Report
           GROUP BY &KEYS
           HAVING CT>1 ) A;
  QUIT;

  PROC APPEND BASE=WORK.DUPL DATA=WORK.POPULATION_AUX FORCE;
  RUN;
%mend makereport;

/* Create the macro calls */
data _null_;
  set METADATA;
  * Build the string that calls the macro;
  str = catt('%makereport(REPORT=', Report, ',Keys=', Keys, ');');
  * Execute the macro;
  call execute(str);
run;

Thanks in advance
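One way around the one-key-per-call problem is to collapse METADATA to one row per report before generating any code, so that all keys of a report arrive together as a comma-separated list. The sketch below assumes the METADATA layout from the post (one row per report/key pair, one SASHELP table per report) but keeps only CARS and CLASS for brevity; the table KEYLIST and the variables KEY_LIST and CODE are hypothetical names. It also skips the macro and has CALL EXECUTE queue the PROC SQL text directly, which avoids the problem of commas inside macro parameters.

```sas
/* Sketch: one row per report/key pair, as in the post (CARS and CLASS only) */
data work.metadata;
  length report $8 keys $32;
  input report $ keys $;
  datalines;
CARS MAKE
CARS MODEL
CLASS NAME
CLASS SEX
;
run;

/* Collapse the keys of each report into one comma-separated list */
proc sort data=work.metadata;
  by report;
run;

data work.keylist;
  set work.metadata;
  by report;
  length key_list $500;
  retain key_list;
  if first.report then key_list = '';
  key_list = catx(',', key_list, keys);
  if last.report then output;
  keep report key_list;
run;

/* Summary table: one row per report */
proc sql;
  create table work.dupl (Report char(8), Tot_Dups num);
quit;

/* Queue one PROC SQL per report; Tot_Dups counts the key combinations
   that occur more than once. The queued code runs after this step ends. */
data _null_;
  set work.keylist;
  length code $1000;
  code = catx(' ',
    'proc sql; insert into work.dupl',
    'select', quote(strip(report)), ', count(*)',
    'from (select', strip(key_list),
    'from', cats('sashelp.', report),
    'group by', strip(key_list),
    'having count(*) > 1); quit;');
  call execute(code);
run;
```

Because the key list is built in the DATA step and pasted into the SQL text, it never passes through a macro parameter, so the commas need no quoting.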

MART1 · 09-17-2021

That's great, thanks so much @Reeza. Very interesting paper. Just one question (not strictly related to the topic, I appreciate): _ALL_ cannot be used in PROC SQL, for example:

PROC SQL;
  CREATE TABLE DUPS AS
  SELECT COUNT(DISTINCT _ALL_) FROM TEST;
QUIT;

Do you know if there's an _ALL_ equivalent for SQL? (* can be used, but not inside COUNT(DISTINCT ...) or in a GROUP BY.) Thanks
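As far as I know, PROC SQL has no _ALL_ shorthand, but two workarounds cover the cases in the question: SELECT DISTINCT * is accepted inside an inline view, so the distinct rows can be counted from there, and a GROUP BY list can be generated from DICTIONARY.COLUMNS into a macro variable. A sketch using the TEST data from the earlier post; the macro variables N_DISTINCT and COL_LIST and the table DUP_GROUPS are illustrative names.

```sas
data work.test;
  input Name $ Age Car $;
  datalines;
Mike 40 Volvo
Nick 35 Nissan
Susan 51 BMW
Bill 60 Volvo
Tom 35 Ford
Nick 35 Nissan
Nadia 49 Nissan
;
run;

proc sql noprint;
  /* COUNT(DISTINCT *) is not allowed, but SELECT DISTINCT * is */
  select count(*) into :n_distinct trimmed
  from (select distinct * from work.test);

  /* Build "Name,Age,Car" from the table's metadata */
  select name into :col_list separated by ','
  from dictionary.columns
  where libname = 'WORK' and memname = 'TEST'
  order by varnum;
quit;

/* Separate step, so &col_list is resolved after it has been created */
proc sql;
  create table work.dup_groups as
  select &col_list, count(*) as n_copies
  from work.test
  group by &col_list
  having count(*) > 1;
quit;

%put NOTE: &=n_distinct &=col_list;
```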

MART1 · 09-17-2021

Hello, I have a number of datasets with different variables, and I need to return the duplicate records for each dataset (if any exist). I'd like to do this without having to specify all the variables, so I can wrap it in a macro that goes through all the datasets. In my example below, Nick's row is entered twice in the dataset TEST. I can use PROC SORT with NOUNIQUEKEYS to identify that, but in the BY statement I need to specify all the variables (Name Age Car). Is there a way to use something like * instead of listing all the variables?

data TEST;
  input Name $ Age Car $;
  datalines;
Mike 40 Volvo
Nick 35 Nissan
Susan 51 BMW
Bill 60 Volvo
Tom 35 Ford
Nick 35 Nissan
Nadia 49 Nissan
;
run;

proc sort data=WORK.TEST nouniquekeys out=duplicates;
  by Name Age Car;
run;

Many thanks
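PROC SORT accepts the _ALL_ keyword in the BY statement, which sorts by every variable in the order they appear in the data set, so combining it with NOUNIQUEKEYS keeps every row that has at least one exact copy. A minimal sketch, assuming the TEST data above; the macro FIND_DUPS and the _DUPS output suffix are hypothetical names.

```sas
data work.test;
  input Name $ Age Car $;
  datalines;
Mike 40 Volvo
Nick 35 Nissan
Susan 51 BMW
Bill 60 Volvo
Tom 35 Ford
Nick 35 Nissan
Nadia 49 Nissan
;
run;

/* Keep only rows that occur more than once, comparing all variables */
%macro find_dups(ds=);
  proc sort data=&ds out=&ds._dups nouniquekeys;
    by _all_;
  run;
%mend find_dups;

%find_dups(ds=work.test)
```

With ds=work.test the output is WORK.TEST_DUPS, holding both Nick rows. Calling %find_dups once per data set (for example from a list taken from DICTIONARY.TABLES) covers the macro use case in the post.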

MART1 · 03-03-2021

Thanks @bballard. I did start with PROC MEANS; the only issue is that I need more stats than the ones shown in the dummy example (percentage, count distinct, ...), so I figured PROC SQL would give me more options. Thanks

MART1 · 03-03-2021

Thanks @yabwon. Like Reeza's, it's what I needed. Thanks for your help.

MART1 · 03-03-2021

@Reeza, exactly what I needed; it works a treat, thank you very much! PS: I did start using PROC REPORT, but I need to add more stats than the dummy example I showed (percentage, count distinct, etc.), so I figured this was a better solution. Thank you very much, and thanks for your tutorial on GitHub.

MART1 · 03-02-2021

Thanks @yabwon. Unfortunately, the organisation I work for does not allow installing new packages. Do you know if the same loop can be done using "standard" loops in SAS? I can't get the right syntax to make it work. Thanks
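Without any package, a plain macro %DO loop can walk through a space-separated list of variable names, and PROC SQL can read that list from DICTIONARY.COLUMNS. A sketch assuming a WORK.SOURCE with an ID key plus numeric analysis variables (small made-up values here instead of SASHELP.BURROWS); the macro LOOP_VARS and the macro variables VAR_LIST and THIS_VAR are hypothetical, and MAKEREPORT is a trimmed-down version of the one in the original question.

```sas
/* Made-up stand-in for WORK.SOURCE: an ID key plus numeric variables */
data work.source;
  input ID X Y Z;
  datalines;
100 1 2 3
101 4 5 6
102 . 8 9
;
run;

proc sql;
  create table work.output (Variable char(32), Count num, Sum num);
quit;

/* Trimmed-down version of the macro from the question */
%macro makereport(Var=);
  proc sql;
    create table work.temp as
    select "&Var" as Variable length=32,
           count(&Var) as Count,
           sum(&Var) as Sum
    from work.source;
  quit;

  proc append base=work.output data=work.temp force;
  run;
%mend makereport;

/* Space-separated list of numeric variables, excluding the key */
proc sql noprint;
  select name into :var_list separated by ' '
  from dictionary.columns
  where libname = 'WORK' and memname = 'SOURCE'
    and type = 'num' and upcase(name) ne 'ID'
  order by varnum;
quit;

/* Call %makereport once per word in the list */
%macro loop_vars(vars=);
  %local i this_var;
  %do i = 1 %to %sysfunc(countw(&vars, %str( )));
    %let this_var = %scan(&vars, &i, %str( ));
    %makereport(Var=&this_var)
  %end;
%mend loop_vars;

%loop_vars(vars=&var_list)
```

Variables added to or removed from SOURCE are picked up automatically, because the list is rebuilt from the metadata on every run.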

MART1 · 03-02-2021

Thanks @mklangley. I did try PROC TRANSPOSE; however, my table has about 150 variables and a few million observations, so I always get a "Not enough storage is available" error message. Thanks

MART1 · 03-02-2021

Hello, I'm using a macro (%macro ... %mend) to repeat the same PROC SQL for several variables in a dataset. This works well, but I'd like to go through the variables automatically rather than having to list them all. In the dummy example below I only have 3 variables (X, Y, Z), but the real dataset has many more, and some can be added or removed. Looking online I seem to get close to what I need, but I can't find the right code. Many thanks

/* Create a dummy dataset */
PROC SQL;
  CREATE TABLE WORK.SOURCE AS
  SELECT ID, X, Y, Z
  FROM SASHELP.BURROWS
  WHERE ID BETWEEN 100 AND 150;
QUIT;

/* Create an empty table that will be populated with the results from the macro */
PROC SQL;
  CREATE TABLE WORK.OUTPUT
    ( Variable Char(32),
      Count Numeric(12),
      Sum Numeric(12) );
QUIT;

%macro makereport(Var=);
  PROC SQL;
    CREATE TABLE TEMP AS
    SELECT "&Var" AS VARIABLE,
           COUNT(&Var) AS COUNT,
           SUM(&Var) AS SUM
    FROM WORK.SOURCE;
  QUIT;

  /* Append the result of each pass to the new table */
  PROC APPEND BASE=WORK.OUTPUT DATA=WORK.TEMP FORCE;
  RUN;
%mend makereport;

/* Here I must list all the variables, but I'd like to avoid this */
%makereport(Var=X)
%makereport(Var=Y)
%makereport(Var=Z)
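Another option that avoids a macro loop altogether is to let PROC SQL write the per-variable queries itself: DICTIONARY.COLUMNS supplies the numeric variable names, CATS builds one SELECT per variable, and SEPARATED BY ' union all ' joins them into a single query stored in a macro variable. This is a sketch under assumptions: WORK.SOURCE has an ID key plus numeric variables (made-up values below), and the macro variable QUERIES is a hypothetical name. With around 150 variables the generated text stays well below the 65,534-character limit of a macro variable, although each branch of the UNION ALL still reads the table once.

```sas
/* Made-up stand-in for WORK.SOURCE: an ID key plus numeric variables */
data work.source;
  input ID X Y Z;
  datalines;
100 1 2 3
101 4 5 6
102 . 8 9
;
run;

/* One "select ... from work.source" per numeric non-key variable */
proc sql noprint;
  select cats('select "', name, '" as Variable length=32, count(', name,
              ') as Count, sum(', name, ') as Sum from work.source')
    into :queries separated by ' union all '
  from dictionary.columns
  where libname = 'WORK' and memname = 'SOURCE'
    and type = 'num' and upcase(name) ne 'ID'
  order by varnum;
quit;

/* Run all the generated selects as a single query */
proc sql;
  create table work.output as
  &queries;
quit;
```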

MART1 · 01-12-2021

Hi @SASKiwi, I'd like to include the ID (which is the primary key in my datasets), to see which IDs have different variable values between the datasets. Without the ID, the output shows the Obs number (example below), but the users of the reports would rather see the IDs. Thanks

MART1 · 01-08-2021

Hi @data_null__, I can't find a way to select the outputs from PROC COMPARE and pass them to PROC PRINT (unfortunately I've never used PROC PRINT before). Below is an example with dummy data:

data WORK.TABLE_A;
  input DAY VAL1 VAL2;
  datalines;
20190106 2 452
20190107 5 658
20190108 2 743
20190109 8 44
20190202 9 698
20190202 9 698
;
run;

data WORK.TABLE_B;
  input DAY VAL1 VAL2;
  datalines;
20190106 2 58
20190107 5 658
20190108 2 12
20190109 8 44
20190202 9 698
20190202 9 698
;
run;

title "Test";
PROC COMPARE BASE=TABLE_A COMPARE=TABLE_B briefsummary;
  ID DAY;
RUN;

Using PROC PRINT, I'd like to display the 2nd and 3rd boxes, not the one crossed out. Could you please give me an example of how to do this? Many thanks
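Rather than printing selected parts of the PROC COMPARE report, one option is to let PROC COMPARE write the differing rows to a data set with OUT= and print that data set. A sketch based on the dummy data above, with the repeated 20190202 row dropped so that every DAY is unique (duplicate ID values trigger warnings from PROC COMPARE); the output data set DIFFS is a hypothetical name. OUTNOEQUAL skips rows where all values match, OUTBASE and OUTCOMP write the base and compare values, and NOPRINT suppresses the usual report. Because of the ID statement, DAY is kept in the output, so the printout shows IDs rather than observation numbers.

```sas
data work.table_a;
  input DAY VAL1 VAL2;
  datalines;
20190106 2 452
20190107 5 658
20190108 2 743
20190109 8 44
20190202 9 698
;
run;

data work.table_b;
  input DAY VAL1 VAL2;
  datalines;
20190106 2 58
20190107 5 658
20190108 2 12
20190109 8 44
20190202 9 698
;
run;

/* Write only the rows that differ, with base and compare values, and no report */
proc compare base=work.table_a compare=work.table_b
             out=work.diffs outnoequal outbase outcomp noprint;
  id DAY;
run;

title "Rows that differ between TABLE_A and TABLE_B";
proc print data=work.diffs noobs;
  var DAY _TYPE_ VAL1 VAL2;
run;
title;
```

_TYPE_ shows whether a row holds the BASE or the COMPARE values. With OUTNOEQUAL, values judged equal within a partly differing row may be written as missing, so VAL1 can appear blank here.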

MART1 · 01-08-2021

Hi @data_null__, thanks for your suggestion. Actually I would need a bit more information than the dups (I know, that's contrary to what I originally said!). Basically I'd like to use a few options from PROC COMPARE (e.g. listing all variables where there is a difference), but without the initial WARNINGs and messages (it's just cosmetic, but it may confuse the users of the report). Thanks (I'm trying to create a high-level report with )

MART1 · 01-07-2021

Thanks @Kurt_Bremser. All I needed was to know whether there are duplicates, which the BRIEFSUMMARY option tells me. I don't need to know at which observations the duplicates occur, but from what you say this information cannot be excluded from the output. Thanks

MART1 · 01-07-2021

Hello, I've started using the (very powerful) PROC COMPARE. I'd like to generate a summary output, and BRIEFSUMMARY does it nicely; however, I would like to remove the WARNING and NOTE that appear at the very beginning of the output (in the yellow box on the screenshot). Is this possible? Many thanks
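If a yes/no answer is enough, one possible approach is to run PROC COMPARE with NOPRINT, so no report (and no NOTE/WARNING box) is produced, and build your own one-line summary from the automatic macro variable SYSINFO. PROC COMPARE sets SYSINFO to a bit-coded return code in which 4096 means at least one compared value was unequal; warnings may still appear in the log. A sketch with made-up data; the data sets BASE, COMP and COMPARE_SUMMARY and the macro variable COMPARE_RC are hypothetical names.

```sas
/* Made-up data: VAL differs for ID 2 */
data work.base;
  input ID VAL;
  datalines;
1 10
2 20
3 30
;
run;

data work.comp;
  input ID VAL;
  datalines;
1 10
2 25
3 30
;
run;

/* NOPRINT: no printed report, so no NOTE/WARNING box in the output */
proc compare base=work.base compare=work.comp noprint;
  id ID;
run;

/* Copy SYSINFO right away, before another step can change it */
%let compare_rc = &sysinfo;

data work.compare_summary;
  length Result $40;
  rc = &compare_rc;
  if rc = 0 then Result = 'Data sets match';
  else if band(rc, 4096) then Result = 'At least one value differs';
  else Result = 'Other differences found';
run;

proc print data=work.compare_summary noobs;
run;
```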

MART1 · 09-03-2020

Thank you so much @Cynthia_sas. In my "real" data the App/Conn combinations change every day, so I'll definitely have to do it dynamically. I've looked everywhere (I think!), but I can't find an example of how to use CALL EXECUTE with PROC SGPLOT or PROC REPORT; all the papers talk about using it in a DATA step. Do you know where I could find some ideas? (I've never used macros, so everything is pretty new to me.)

Also, I have run your example, but it puts everything in one column. The only thing I changed is the ODS "sandwich", where I used LAYOUT instead of HTML (the server where SAS lives and the folder structure I'm in are in different environments, so I export to HTML using the Export function). I don't see why, but could this be the reason? Here is the code I copied for you:

%macro makereport(wantapp=, wantconn=);
  ods layout gridded columns=2 Height=8cm;
  ods region;
  title "TEST";
  proc sgplot data=CHAR_TABLE_SORT;
    xaxis type=discrete display=(nolabel) fitpolicy=thin;
    yaxis grid;
    series x=Date y=Records / group=Conn;
    where app="&wantapp" and conn="&wantconn";
  run;
  title;

  ods region;
  proc report data=WORK.char_table_sort nowd;
    where app="&wantapp" and conn="&wantconn";
    column DATE App Conn cv Mean Median Records;
    define DATE / group;
    define App / group;
    define Conn / group;
    define CV / group;
    define Mean / group;
    define Median / group;
    define Records / sum;
  run;
  ods layout end;
%mend makereport;

title;
footnote;
ods layout start;
%makereport(wantapp=APP_A, wantconn=CONN1)
%makereport(wantapp=APP_A, wantconn=CONN2)
%makereport(wantapp=APP_B, wantconn=CONN1)
ods layout end;

Many thanks
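CALL EXECUTE is a DATA step routine, but the code it queues can be anything, including calls to a macro that contains PROC SGPLOT and PROC REPORT; the queued code runs after the DATA step ends. A sketch that builds one %makereport call per App/Conn combination found in the data, so new combinations are picked up without editing the program. The small CHAR_TABLE_SORT below is made up, the macro is a simplified version of the one in the post, the table COMBOS is a hypothetical name, and the ODS LAYOUT statements are left out.

```sas
/* Made-up stand-in for CHAR_TABLE_SORT */
data work.char_table_sort;
  input Date :date9. App $ Conn $ Records;
  format Date date9.;
  datalines;
01SEP2020 APP_A CONN1 10
02SEP2020 APP_A CONN1 12
01SEP2020 APP_A CONN2 7
02SEP2020 APP_A CONN2 9
01SEP2020 APP_B CONN1 4
02SEP2020 APP_B CONN1 6
;
run;

/* Simplified version of the macro from the post */
%macro makereport(wantapp=, wantconn=);
  title "&wantapp / &wantconn";
  proc sgplot data=work.char_table_sort;
    where App = "&wantapp" and Conn = "&wantconn";
    series x=Date y=Records;
  run;

  proc report data=work.char_table_sort nowd;
    where App = "&wantapp" and Conn = "&wantconn";
    column Date App Conn Records;
    define Date / group;
    define App / group;
    define Conn / group;
    define Records / sum;
  run;
  title;
%mend makereport;

/* One row per App/Conn combination present in the data */
proc sql;
  create table work.combos as
  select distinct App, Conn
  from work.char_table_sort
  order by App, Conn;
quit;

/* %NRSTR delays each macro call until after this DATA step has finished */
data _null_;
  set work.combos;
  call execute(cats('%nrstr(%makereport(wantapp=', App, ', wantconn=', Conn, '))'));
run;
```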

Date Last Visited: 05-02-2024 08:00 AM

Re: mask commas in a macro variable

mask commas in a macro variable

Re: macro to match multiple records across two datasets

macro to match multiple records across two datasets

Re: Custom Report exported to Excel in .srx does not displays tables s...

Re: Identify duplicate records in a dataset without specifying all var...

Re: Return Date in lower case

Re: Proc Report: compute if then

Re: If first. then group by; how to restart count

Select variables using macro

Re: Identify duplicate records in a dataset without specifying all var...

Identify duplicate records in a dataset without specifying all variabl...

Re: looping through variables using %mend macro function

looping through variables using %mend macro function

Re: PROC COMPARE Summary Output

PROC COMPARE Summary Output

Re: compute after page. line ' ' to change dynamically