<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: File Review: check valid values in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/File-Review-check-valid-values/m-p/675150#M203376</link>
    <description>&lt;P&gt;Something like this?&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
   /* ' *$' tolerates the trailing blanks that pad character variables */
   select count(*)                                    as NB_RECORDS
         ,count(distinct ID)                          as NB_UNQ_ID
         ,sum(^prxmatch('/^[A-Z0-9]{2} *$/i',ID))     as NB_BAD_ID
         ,sum(missing(input(BDAY,yymmdd8.)))          as NB_BAD_BDAY
         ,sum(^prxmatch('/^[MF] *$/i',GENDER))        as NB_BAD_GENDER
         ,sum(^prxmatch('/^[A-Z ]*$/i',CITY))         as NB_BAD_CITY
         ,sum(^prxmatch('/^\d{5} *$/i',ZIP))          as NB_BAD_ZIP
         ,sum(^prxmatch('/^\d{0,4} *$/i',ZIP4))       as NB_BAD_ZIP4
         ,sum(^prxmatch('/^\d{5} *$/i',COUNTY_CODE )) as NB_BAD_CC
   from TABLE;
quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Sun, 09 Aug 2020 22:38:38 GMT</pubDate>
    <dc:creator>ChrisNZ</dc:creator>
    <dc:date>2020-08-09T22:38:38Z</dc:date>
    <item>
      <title>File Review: check valid values</title>
      <link>https://communities.sas.com/t5/SAS-Programming/File-Review-check-valid-values/m-p/674880#M203259</link>
      <description>&lt;P&gt;Hi SAS Community,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I am importing pipe delimited .txt and .csv files into SAS and want to run a series of checks:&lt;/P&gt;
&lt;P&gt;1. Count the number of records&lt;/P&gt;
&lt;P&gt;2. Count the number of distinct records for ID&lt;/P&gt;
&lt;P&gt;3. Check whether the variables match a set list of expected variables&lt;/P&gt;
&lt;P&gt;4. Check whether the values are as expected and in the correct format&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Is there a way to output 1 dataset with all this information? As it currently stands, I write these results to a text file from multiple PROC FREQ and PROC SQL steps.&lt;/P&gt;
&lt;P&gt;These are my expected variables:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data VARS;
input vars $;
datalines;
ID
BDAY
GENDER
ADDRESS1
ADDRESS2
CITY
STATE
ZIP
ZIP4
COUNTY_CODE
;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;This is my sample data:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data sample;
infile datalines dlm="|" missover dsd;
input ID :$2. BDAY :$10. GENDER :$6. ADDRESS1 :$20. ADDRESS2 :$10. CITY :$10. STATE :$2. ZIP :$5. ZIP4 :$4. COUNTY_CODE :$6.;
datalines;
A1|20200420|M|123 Main St.|Suite 201|Juneau|AK|99802||02112
B2|4/20/2020|Male|124 Main St.||Juneau|AK|99802|Juneau
C3|4/20/2020|M|125 Main St.||Juneau|AK|99802||02112
4-1|20200420|Male|126 Main St.|Suite 101|Juneau|AK|99802||Juneau
;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Conditions:&lt;/P&gt;
&lt;P&gt;ID: should be alphanumeric and 2 characters&lt;/P&gt;
&lt;P&gt;BDAY: should be in YYYYMMDD format&lt;/P&gt;
&lt;P&gt;Gender: should be 1 character&lt;/P&gt;
&lt;P&gt;Address1 and Address2: should be character&lt;/P&gt;
&lt;P&gt;City: should be character&lt;/P&gt;
&lt;P&gt;State: should be 2 characters&lt;/P&gt;
&lt;P&gt;ZIP: should be 5 characters&lt;/P&gt;
&lt;P&gt;ZIP4: should be no more than 4 characters&lt;/P&gt;
&lt;P&gt;County: should be the FIPS code, which is 5 characters&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Results:&lt;/P&gt;
&lt;P&gt;Line 1: accurate and meets all criteria&lt;/P&gt;
&lt;P&gt;Line 2: BDAY is not in the correct format, gender is not 1 character, there is no ZIP4 field, county does not have 5 characters&lt;/P&gt;
&lt;P&gt;Line 3: BDAY is not in the correct format&lt;/P&gt;
&lt;P&gt;Line 4: ID is not alphanumeric (it contains a special character), and gender is not 1 character&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;So far I have:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/*Variables match list? */
		proc sql;
			create table plan_vars as
			select strip(upcase(name)) as vars 
			from sashelp.vcolumn
			where libname='WORK' and memname='SAMPLE';
		quit;


		proc sql;
			create table comparevar_&amp;amp;file. as
			select a.*, b.*,
			case when a.vars = b.varsm then 'Match'
			else 'No' end as var_match
			from VARS as a
			full join plan_vars(rename=(vars=varsm)) as b
			on a.vars=b.varsm;
		quit;

/*output records */
		ods rtf file="&amp;amp;output.\FileReview.rtf";  
		ods noptitle; 
		options nodate nonumber;

		proc sql;
			create table ctrltot_&amp;amp;file. as
			select count(*) as Total_number_of_records,
				count(distinct ID) as Total_number_of_unique_ID,
				count (ID) as tot_ID
			from sample;
		quit;

		proc freq data = sample;
		   	tables _all_;
		   	format _numeric_ _character_ $miss.;
		run;

		proc freq data = ctrltot_&amp;amp;file.;
		tables _all_;
		run;

		proc freq data=comparevar_&amp;amp;file.;
			tables _all_;
			where var_match = 'No';
		run;

		ods rtf close;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 06 Aug 2020 15:51:47 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/File-Review-check-valid-values/m-p/674880#M203259</guid>
      <dc:creator>A_Swoosh</dc:creator>
      <dc:date>2020-08-06T15:51:47Z</dc:date>
    </item>
    <item>
      <title>Re: File Review: check valid values</title>
      <link>https://communities.sas.com/t5/SAS-Programming/File-Review-check-valid-values/m-p/674900#M203264</link>
      <description>&lt;P&gt;&amp;gt; &lt;EM&gt;Line 1: accurate and meets all criteria&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN style="font-family: inherit;"&gt;How?&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN style="font-family: inherit;"&gt;ID=A1 is not 7 characters, GENDER=M is not 2 characters etc&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 06 Aug 2020 05:44:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/File-Review-check-valid-values/m-p/674900#M203264</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2020-08-06T05:44:15Z</dc:date>
    </item>
    <item>
      <title>Re: File Review: check valid values</title>
      <link>https://communities.sas.com/t5/SAS-Programming/File-Review-check-valid-values/m-p/675038#M203327</link>
      <description>&lt;P&gt;Sorry, my apologies. It was a sample and I forgot to change the length. It should be 2 and 1, respectively.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 06 Aug 2020 15:50:59 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/File-Review-check-valid-values/m-p/675038#M203327</guid>
      <dc:creator>A_Swoosh</dc:creator>
      <dc:date>2020-08-06T15:50:59Z</dc:date>
    </item>
    <item>
      <title>Re: File Review: check valid values</title>
      <link>https://communities.sas.com/t5/SAS-Programming/File-Review-check-valid-values/m-p/675150#M203376</link>
      <description>&lt;P&gt;Something like this?&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
   /* ' *$' tolerates the trailing blanks that pad character variables */
   select count(*)                                    as NB_RECORDS
         ,count(distinct ID)                          as NB_UNQ_ID
         ,sum(^prxmatch('/^[A-Z0-9]{2} *$/i',ID))     as NB_BAD_ID
         ,sum(missing(input(BDAY,yymmdd8.)))          as NB_BAD_BDAY
         ,sum(^prxmatch('/^[MF] *$/i',GENDER))        as NB_BAD_GENDER
         ,sum(^prxmatch('/^[A-Z ]*$/i',CITY))         as NB_BAD_CITY
         ,sum(^prxmatch('/^\d{5} *$/i',ZIP))          as NB_BAD_ZIP
         ,sum(^prxmatch('/^\d{0,4} *$/i',ZIP4))       as NB_BAD_ZIP4
         ,sum(^prxmatch('/^\d{5} *$/i',COUNTY_CODE )) as NB_BAD_CC
   from TABLE;
quit;&lt;/CODE&gt;&lt;/PRE&gt;
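&lt;P&gt;To see which record fails which check, rather than only the totals, the same kind of test could be run per record in a DATA step. This is a minimal sketch rather than a tested solution: the data set names SAMPLE_MINI and RECORD_FLAGS, the trimmed five-field layout and the informat widths are assumptions made for illustration, and the BDAY test assumes that exactly 8 digits are required.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Hypothetical test data: five of the fields from the sample in the first post */
data SAMPLE_MINI;
   infile datalines dlm='|' missover dsd;
   input ID :$3. BDAY :$10. GENDER :$6. ZIP :$5. COUNTY_CODE :$6.;
   datalines;
A1|20200420|M|99802|02112
B2|4/20/2020|Male|99802|Juneau
C3|4/20/2020|M|99802|02112
4-1|20200420|Male|99802|Juneau
;
run;

/* One row per input record, one 0/1 flag per check (1 = bad value) */
data RECORD_FLAGS;
   set SAMPLE_MINI;
   /* Exactly two letters or digits, trailing blanks allowed */
   BAD_ID     = ^prxmatch('/^[A-Z0-9]{2} *$/i', ID);
   /* Eight digits that also read as a real date; ?? keeps the log quiet */
   BAD_BDAY   = ^prxmatch('/^\d{8} *$/', BDAY)
                or missing(input(BDAY, ?? yymmdd8.));
   /* A single M or F, either case */
   BAD_GENDER = ^prxmatch('/^[MF] *$/i', GENDER);
   /* Exactly five digits */
   BAD_ZIP    = ^prxmatch('/^\d{5} *$/', ZIP);
   BAD_CC     = ^prxmatch('/^\d{5} *$/', COUNTY_CODE);
   /* Number of failed checks on this record */
   NB_FAILED  = sum(BAD_ID, BAD_BDAY, BAD_GENDER, BAD_ZIP, BAD_CC);
run;

proc print data=RECORD_FLAGS noobs;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Summing the BAD_ flags over RECORD_FLAGS should give the same kind of totals as the query above, so a per-record table and a one-row summary can come from the same set of rules.&lt;/P&gt;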
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sun, 09 Aug 2020 22:38:38 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/File-Review-check-valid-values/m-p/675150#M203376</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2020-08-09T22:38:38Z</dc:date>
    </item>
    <item>
      <title>Re: File Review: check valid values</title>
      <link>https://communities.sas.com/t5/SAS-Programming/File-Review-check-valid-values/m-p/675329#M203452</link>
      <description>&lt;P&gt;Yes, this is exactly what I'm trying to accomplish. Thank you for the help.&lt;BR /&gt;I don't have much experience with Perl Regular Expressions but is there a way to find a series of values that is the length of the variable and if it matches then identify as bad? I want to match repeating but instead of n it's the length of the variable? I also want to identify an ID that is not alphanumeric. For example the ID is a description instead of the A235324.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Case 1:&lt;/P&gt;
&lt;P&gt;ID: 0000000.&lt;/P&gt;
&lt;P&gt;Case 2:&lt;/P&gt;
&lt;P&gt;ID: Physician&lt;/P&gt;
&lt;P&gt;Case 3:&lt;/P&gt;
&lt;P&gt;ID: A235324&lt;/P&gt;</description>
      <pubDate>Sat, 08 Aug 2020 00:08:43 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/File-Review-check-valid-values/m-p/675329#M203452</guid>
      <dc:creator>A_Swoosh</dc:creator>
      <dc:date>2020-08-08T00:08:43Z</dc:date>
    </item>
    <item>
      <title>Re: File Review: check valid values</title>
      <link>https://communities.sas.com/t5/SAS-Programming/File-Review-check-valid-values/m-p/675377#M203480</link>
      <description>How can ID be 00000000 when you define it as $2?</description>
      <pubDate>Sat, 08 Aug 2020 11:25:34 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/File-Review-check-valid-values/m-p/675377#M203480</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2020-08-08T11:25:34Z</dc:date>
    </item>
    <item>
      <title>Re: File Review: check valid values</title>
      <link>https://communities.sas.com/t5/SAS-Programming/File-Review-check-valid-values/m-p/675418#M203513</link>
      <description>&lt;P&gt;I was trying to present another example so I'm clear about the syntax involved with perl expressions since I'm new to those expressions.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Suppose I have another dataset where I'm trying to identify invalid values:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data sample;
infile datalines dlm="|" missover dsd;
input CATEGORY :$10. ID :$10.;
datalines;
Physician|A123242
|0000000
PS220|A123456
;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Case 3 has the proper format for each variable while case 2 has both wrong, and case 1 has CATEGORY wrong.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 08 Aug 2020 18:09:18 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/File-Review-check-valid-values/m-p/675418#M203513</guid>
      <dc:creator>A_Swoosh</dc:creator>
      <dc:date>2020-08-08T18:09:18Z</dc:date>
    </item>
    <item>
      <title>Re: File Review: check valid values</title>
      <link>https://communities.sas.com/t5/SAS-Programming/File-Review-check-valid-values/m-p/675519#M203558</link>
      <description>&lt;P&gt;&amp;gt;&amp;nbsp;&lt;EM&gt;I want to match repeating but instead of n it's the length of the variable&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;prxmatch('/(\d)\1{4}/',STR);&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;matches a digit followed by 4 repeats of itself (i.e. 5 identical digits in a row)&lt;/P&gt;
&lt;P&gt;Another way could be testing:&lt;/P&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;repeat(first(STR),4)=STR&lt;/FONT&gt;&lt;/P&gt;
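&lt;P&gt;Both tests above use a fixed count of 4. As a rough sketch of how they might follow the length of each value instead, the example below applies them to the three ID cases from the question. The names IDS, ID_CHECK and the flag variables are hypothetical, and GOOD_LAYOUT assumes a valid ID is one letter followed by six digits, which the thread does not actually state.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Hypothetical data: the three ID cases from the question */
data IDS;
   input STR :$10.;
   datalines;
0000000
Physician
A235324
;
run;

data ID_CHECK;
   set IDS;
   /* 1 when STR is a single character repeated up to its last non-blank
      position; a one-character or blank value also counts as a match */
   ALL_SAME_PRX = prxmatch('/^(.)\1* *$/', STR);
   /* Same test without a regular expression: REPEAT(x, n) returns n+1 copies */
   ALL_SAME_RPT = (repeat(first(STR), length(STR) - 1) = STR);
   /* Assumed layout: one letter then six digits, as in A235324 */
   GOOD_LAYOUT  = prxmatch('/^[A-Z]\d{6} *$/i', STR);
run;

proc print data=ID_CHECK noobs;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The regular expression version needs no length arithmetic, while the REPEAT version may be easier to read for anyone new to Perl expressions.&lt;/P&gt;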
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For the rest you can look at the ANYxxxx and NOTxxxx functions&lt;/P&gt;</description>
      <pubDate>Sun, 09 Aug 2020 21:56:01 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/File-Review-check-valid-values/m-p/675519#M203558</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2020-08-09T21:56:01Z</dc:date>
    </item>
  </channel>
</rss>

