Hi everyone, I am trying to convert a postgresql in proc sas.
I need these main functions available in postgresql least, greatest and regexp_replace.
How do they translate in SAS?
Thank you,
Marco
Its a good idea to review the manual on SAS's implementation of ANSI SQL.
Least is most likely to be min()
Greatest is most likely to be max()
regexp_replace could be a number of different things, for example tranwrd() function. Perl regex is also available in SAS:
https://support.sas.com/rnd/base/datastep/perl_regexp/regexp-tip-sheet.pdf
You might also find it simpler coding if you learn some Base SAS and use that rather than SQL, for instance min/max can be done in proc means, summary, or datastep.
Thank you, I used compress for REGEXP_REPLACE, and I have tried MIN and MAX, however MIN and MAX require numeric inputs, what if I have character strings?
Thank you
Why would you store numbers in character variables? Anyways:
... min(input(char_variable,best.)) as min_result ...
Use the input to convert char to number, and put to convert numeric to char.
I have tried the following
min(input(char_variable1,best.),input(char_variable2,best.)) as min_result
However I have errors. The code I am trying to convert is the following
LEAST(REGEXP_REPLACE(REGEXP_REPLACE(identificativo_veicolo_denuncian,'\*',''),' ',''),REGEXP_REPLACE(REGEXP_REPLACE(identificativo_veicolo_contropar,'\*',''),' ','') ) AS targa1, GREATEST(REGEXP_REPLACE(REGEXP_REPLACE(identificativo_veicolo_denuncian,'\*',''),' ',''),REGEXP_REPLACE(REGEXP_REPLACE(identificativo_veicolo_contropar,'\*',''),' ','') ) AS targa2,
Any help would be greatly appreciated.
Thank you
You are using regex statement to process some character data into a numeric type. I need to see some example of what the data looks like before. So what does
identificativo_veicolo_denuncian
Contain, what does it look like?
It is in this from ' *AA123BB' and with the SAS expression compress(identificativo_veicolo_denuncian,'* ') it becomes
'AA123BB'.
Same applies to identificativo_veicolo_contropar. I need the MIN and MAX function to store in two different variables the MIN and MAX (alphabetically) the two strings.
Thank you
Ok, so:
min(input(compress(identificativo_veicolo_denuncian," ","kd"),best.)) as want
So the compress, I remove blanks, and I use the options k = keep, d = digits, to keep only the numbers in the string, then I convert number and min().
Thank you very much,
the thing is that for each row I need the min of the two variables identificativo_veicolo_contropar and identificativo_veicolo_denuncian. in PostgreSQL this could be done with the instruction LEAST(VAR1,VAR2)
How can I do this?
Can I be achieved as?
min(input(compress(identificativo_veicolo_denuncian," ","kd"),best.),input(compress(identificativo_veicolo_contropar," ","kd"),best.)) as targa1,
Thank you
Yes, min() function is a SAS function which can take a list of numeric values and return the minimum. The min() from SQL min's a column of data, so you can use either and the compiled is clever enough to know which one.
Thank you,
My final query looks like this one
proc sql; CREATE TABLE prova AS SELECT DISTINCT data_sinistro, identificativo_veicolo, min(input(compress(identificativo_veicolo_denuncian," ","kd"),best.),input(compress(identificativo_veicolo_contropar," ","kd"),best.)) as targa1, max(input(compress(identificativo_veicolo_denuncian," ","kd"),best.),input(compress(identificativo_veicolo_contropar," ","kd"),best.)) AS targa2, CASE WHEN ruolo_veicolo_richiesto = 'D' THEN 'D' ELSE 'C' END AS ruolo_veicolo_richiesto, 'NO_CARD' AS tipo FROM Ssrc WHERE calculated targa1 NOT IN ('','ZZ999ZZ', 'ZZ999XX', 'ZZ99999', 'ZZ000ZZ', 'XXXXXX', 'AA00000', 'AA000AA', 'AA000BB', 'AA000XX', 'AA11111', 'AA000ZZ', 'AA111BB', 'XX000XX', 'ELIMINAT', 'XX999XX', 'XX111XX', 'XX123XX', 'MI111111', 'PLPENPLP', 'XX000YY', 'RMZ94732', 'AA111AA', 'AA123BB', 'BS400BZ', 'KASKO', 'RCDIV','VARIE','TRASP','PEDONE','MOTVEL','FURTOT','FURTOP','IGNOTO','INCEND','FURTOR','AGG.TO','ESTERA','FURTOL') AND calculated targa2 NOT IN ('','ZZ999ZZ', 'ZZ999XX', 'ZZ99999', 'ZZ000ZZ', 'XXXXXX', 'AA00000', 'AA000AA', 'AA000BB', 'AA000XX', 'AA11111', 'AA000ZZ', 'AA111BB', 'XX000XX', 'ELIMINAT', 'XX999XX', 'XX111XX', 'XX123XX', 'MI111111', 'PLPENPLP', 'XX000YY', 'RMZ94732', 'AA111AA', 'AA123BB', 'BS400BZ', 'KASKO', 'RCDIV','VARIE','TRASP','PEDONE','MOTVEL','FURTOT','FURTOP','IGNOTO','INCEND','FURTOR','AGG.TO','ESTERA','FURTOL') AND data_sinistro is not missing ; quit;
but, unfortunately it returns the error
ERROR: Expression using IN has components that are of different data types. ERROR: Expression using IN has components that are of different data types.
What am I doing wrong?
You are comparing apples to bananas:
min(input(compress(identificativo_veicolo_denuncian," ","kd"),best.),input(compress(identificativo_veicolo_contropar," ","kd"),best.)) as targa1,
This returns a number which is the minimum of the two converted text fields. It will be a number or missing.
WHERE calculated targa1 NOT IN ('','ZZ999ZZ', 'ZZ999XX', 'ZZ99999', 'ZZ000ZZ', 'XXXXXX', 'AA00000', 'AA000AA', 'AA000BB', 'AA000XX', 'AA11111', 'AA000ZZ', 'AA111BB', 'XX000XX', 'ELIMINAT', 'XX999XX', 'XX111XX', 'XX123XX', 'MI111111', 'PLPENPLP', 'XX000YY', 'RMZ94732', 'AA111AA', 'AA123BB', 'BS400BZ', 'KASKO', 'RCDIV','VARIE','TRASP','PEDONE','MOTVEL','FURTOT','FURTOP','IGNOTO','INCEND','FURTOR','AGG.TO','ESTERA','FURTOL')
This is comparing the numeric value of targa1 as defined previously, with a set of character strings, something that can't be done, and doesn't make sense anyway.
Say you two character variables are:
AA123BB and BA345BB
We calculate targa1 as 123, as remove all the character data, then the minimum is 123 which is < 345. So the numeric variable targa1=123. Then we say is 123 in (characters), which is a fail as you don't compare numbers to characters. It is also a fail as 123 is never going to be in that list of text strings even if convert the number to character again.
What you might want (and again, flying completely blind here):
WHERE identificativo_veicolo_denuncian NOT IN ('','ZZ999ZZ', 'ZZ999XX', 'ZZ99999', 'ZZ000ZZ', 'XXXXXX', 'AA00000', 'AA000AA', 'AA000BB', 'AA000XX', 'AA11111', 'AA000ZZ', 'AA111BB', 'XX000XX', 'ELIMINAT', 'XX999XX', 'XX111XX', 'XX123XX', 'MI111111', 'PLPENPLP', 'XX000YY', 'RMZ94732', 'AA111AA', 'AA123BB', 'BS400BZ', 'KASKO', 'RCDIV','VARIE','TRASP','PEDONE','MOTVEL','FURTOT','FURTOP','IGNOTO','INCEND','FURTOR','AGG.TO','ESTERA','FURTOL')
Thank you very much for your help. I have found a solution that seems to be working as
SELECT DISTINCT data_sinistro, identificativo_veicolo, ifc(compress(identificativo_veicolo_denuncian,"* ")<compress(identificativo_veicolo_contropar,"* "),compress(identificativo_veicolo_denuncian,"* "),compress(identificativo_veicolo_contropar,"* ")) as targa1, ifc(compress(identificativo_veicolo_denuncian,"* ")>compress(identificativo_veicolo_contropar,"* "),compress(identificativo_veicolo_denuncian,"* "),compress(identificativo_veicolo_contropar,"* ")) as targa2, CASE WHEN ruolo_veicolo_richiesto = 'D' THEN 'D' ELSE 'C' END AS ruolo_veicolo_richiesto, 'NO_CARD' AS tipo FROM Ssrc WHERE calculated targa1 NOT IN ('','ZZ999ZZ', 'ZZ999XX', 'ZZ99999', 'ZZ000ZZ', 'XXXXXX', 'AA00000', 'AA000AA', 'AA000BB', 'AA000XX', 'AA11111', 'AA000ZZ', 'AA111BB', 'XX000XX', 'ELIMINAT', 'XX999XX', 'XX111XX', 'XX123XX', 'MI111111', 'PLPENPLP', 'XX000YY', 'RMZ94732', 'AA111AA', 'AA123BB', 'BS400BZ', 'KASKO', 'RCDIV','VARIE','TRASP','PEDONE','MOTVEL','FURTOT','FURTOP','IGNOTO','INCEND','FURTOR','AGG.TO','ESTERA','FURTOL') AND calculated targa2 NOT IN ('','ZZ999ZZ', 'ZZ999XX', 'ZZ99999', 'ZZ000ZZ', 'XXXXXX', 'AA00000', 'AA000AA', 'AA000BB', 'AA000XX', 'AA11111', 'AA000ZZ', 'AA111BB', 'XX000XX', 'ELIMINAT', 'XX999XX', 'XX111XX', 'XX123XX', 'MI111111', 'PLPENPLP', 'XX000YY', 'RMZ94732', 'AA111AA', 'AA123BB', 'BS400BZ', 'KASKO', 'RCDIV','VARIE','TRASP','PEDONE','MOTVEL','FURTOT','FURTOP','IGNOTO','INCEND','FURTOR','AGG.TO','ESTERA','FURTOL') AND data_sinistro is not missing);
Now one last step.
In PostgreSQL I would run the following code:
CREATE TABLE prova AS SELECT DISTINCT ON (data_sinistro, identificativo_veicolo, targa1, targa2) data_sinistro, identificativo_veicolo, targa1, targa2, ruolo_veicolo_richiesto, tipo FROM (****)
Where the **** represents the block of code previously written. This codes eliminates duplicates based on this three fields only
(data_sinistro, identificativo_veicolo, targa1, targa2)
Keeping all the variables
data_sinistro, identificativo_veicolo, targa1, targa2, ruolo_veicolo_richiesto, tipo
In the final table.
Is there something similar in SAS?
Thank you very much indeed for your help
No, they don't in PROC SQL. Try it.
@Marco_park wrote:
Thank you, I used compress for REGEXP_REPLACE, and I have tried MIN and MAX, however MIN and MAX require numeric inputs, what if I have character strings?
Thank you
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.