This section under major construction.
Regular expressions. NFA.java, DFS.java, Digraph.java, and GREP.java.
Running time. Let M be the length of the expression and N the length of the input. The regular expression matching algorithm can build the NFA in O(M) time and simulate it on the input in O(MN) time.
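The construction and simulation behind these bounds can be sketched in Java as follows. This is a self-contained sketch, not the booksite's NFA.java. The class name NfaSketch is hypothetical. Adjacency lists and an explicit stack stand in for Digraph.java and DFS.java. Only parentheses, |, *, the wildcard ., and literal characters are supported, and a group may contain several | alternatives. Building the epsilon-transition graph takes one pass over the M characters of the expression. Each of the N input characters then costs one O(M) scan plus one O(M) depth-first search, which is consistent with the O(MN) bound.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Sketch of NFA-based recognition: build an epsilon-transition graph from the
// expression, then track the set of states reachable after each input character.
class NfaSketch {
    private final char[] re;                // the expression, wrapped in ( and )
    private final int m;                    // states 0..m-1 are characters of re; state m accepts
    private final List<List<Integer>> eps;  // epsilon-transitions, as adjacency lists

    NfaSketch(String regexp) {
        re = ("(" + regexp + ")").toCharArray();
        m = re.length;
        eps = new ArrayList<>();
        for (int i = 0; i <= m; i++) eps.add(new ArrayList<>());
        Deque<Integer> ops = new ArrayDeque<>();  // positions of unmatched ( and |
        for (int i = 0; i < m; i++) {
            int lp = i;  // leftmost state of the subexpression ending at i
            if (re[i] == '(' || re[i] == '|') {
                ops.push(i);
            } else if (re[i] == ')') {
                List<Integer> bars = new ArrayList<>();
                int top = ops.pop();
                while (re[top] == '|') { bars.add(top); top = ops.pop(); }
                lp = top;                      // the matching (
                for (int bar : bars) {
                    eps.get(lp).add(bar + 1);  // ( may jump to the alternative after each |
                    eps.get(bar).add(i);       // reaching | means an alternative matched: jump to )
                }
            }
            if (i < m - 1 && re[i + 1] == '*') {  // closure: skip the operand or repeat it
                eps.get(lp).add(i + 1);
                eps.get(i + 1).add(lp);
            }
            if (re[i] == '(' || re[i] == '*' || re[i] == ')') eps.get(i).add(i + 1);
        }
    }

    // Returns true if the NFA accepts the entire text.
    public boolean recognizes(String txt) {
        boolean[] reach = closure(List.of(0));
        for (int i = 0; i < txt.length(); i++) {
            char c = txt.charAt(i);
            List<Integer> next = new ArrayList<>();
            for (int v = 0; v < m; v++) {
                boolean literal = "()|*".indexOf(re[v]) < 0;  // metacharacters never consume input
                if (reach[v] && literal && (re[v] == c || re[v] == '.')) next.add(v + 1);
            }
            reach = closure(next);
        }
        return reach[m];
    }

    // Marks every state reachable from the given states by epsilon-transitions (iterative DFS).
    private boolean[] closure(List<Integer> sources) {
        boolean[] marked = new boolean[m + 1];
        Deque<Integer> stack = new ArrayDeque<>(sources);
        while (!stack.isEmpty()) {
            int v = stack.pop();
            if (marked[v]) continue;
            marked[v] = true;
            for (int w : eps.get(v)) if (!marked[w]) stack.push(w);
        }
        return marked;
    }

    public static void main(String[] args) {
        NfaSketch nfa = new NfaSketch("(0|1)*00");
        System.out.println(nfa.recognizes("1100"));  // true
        System.out.println(nfa.recognizes("1001"));  // false
    }
}
```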
Library implementations.
Validate.java.
Most library implementations of regular expressions use a backtracking algorithm that can take an exponential amount of time on some inputs. Such inputs can be surprisingly simple. For example, determining whether a string of length N is matched by the regular expression (a|aa)*b can take time exponential in N if the string is chosen carefully. The table below illustrates just how spectacularly the Java 1.4.2 regular expression library can fail.
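You can try a version of this experiment by timing java.util.regex on inputs of growing length. The sketch below uses a hypothetical class name, BacktrackDemo. It matches (a|aa)*b against a run of N a's with no trailing b, so every attempt must fail. Before giving up, a backtracking matcher may explore a number of ways of splitting the a's that grows exponentially in N. Treat any timings as illustrative only. They depend on the JVM, and later JDK releases have added optimizations that may avoid the blowup for this particular pattern. String.repeat requires Java 11 or later.

```java
import java.util.regex.Pattern;

class BacktrackDemo {
    // (a|aa)*b: a run of a's can be split into a's and aa's in exponentially many ways.
    static final Pattern PATTERN = Pattern.compile("(a|aa)*b");

    // Always false, since the input has no b; the interesting part is how long it takes.
    static boolean matchesRunOfAs(int n) {
        return PATTERN.matcher("a".repeat(n)).matches();
    }

    public static void main(String[] args) {
        for (int n = 10; n <= 30; n += 5) {
            long start = System.nanoTime();
            boolean matched = matchesRunOfAs(n);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.printf("N = %2d  matched = %b  time = %d ms%n", n, matched, elapsedMs);
        }
    }
}
```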
The above examples are artificial, but they illustrate an alarming defect in most regular expression libraries. Bad inputs do occur in practice. According to Crosby and Wallach, the following regular expression appears in a version of SpamAssassin, a powerful spam filtering program.
It attempts to match certain email addresses, but it takes exponential time to match some strings in many regular expression libraries, including Sun's Java 1.4.2.
This is especially significant because a spammer could use pathological return email addresses to mount a denial-of-service attack on a mail server running SpamAssassin. This particular pattern is now fixed because Perl 5 regular expressions use an internal cache to short-circuit repeated matches at the same location during backtracking.
These deficiencies are not limited to Java's implementation. For example, GNU regex-0.12 takes exponential time to match strings of the form aaaaaaaaaaaaaac against the regular expression (a*)*b*, and Sun's Java 1.4.2 is equally susceptible to this one. Moreover, Java and Perl regular expressions support back references, and the pattern matching problem for such extended regular expressions is NP-hard, so this exponential blowup appears to be inherent on some inputs.
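Back references are what take these libraries beyond regular expressions in the theoretical sense. As a minimal sketch with hypothetical class and method names, the Java pattern (.+)\1 matches exactly the strings that consist of some nonempty string written twice. No finite automaton can recognize that language. The backslash is doubled inside the Java string literal.

```java
import java.util.regex.Pattern;

class BackReferenceDemo {
    // \1 must repeat exactly the text captured by group 1, so this matches "ww" strings.
    static final Pattern SQUARE = Pattern.compile("(.+)\\1");

    static boolean isSquare(String s) {
        return SQUARE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isSquare("abcabc"));  // true: "abc" twice
        System.out.println(isSquare("abcab"));   // false: odd length, cannot be ww
        System.out.println(isSquare("abab"));    // true: "ab" twice
    }
}
```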
Here's one I actually wrote, trying to find the last word before the string NYSE: regexp = '([\w\s]+).*NYSE';
Reference: Regular Expression Matching Can Be Simple And Fast (but is slow in Java, Perl, PHP, Python, Ruby, ...). Compares the Thompson NFA with the backtracking approach. Contains some performance optimizations for the Thompson NFA, as well as some historical notes and references.
Q + A
Q. Documentation on Java regular expression libraries?
A. Here is Oracle's guide to using regular expressions. It includes many more operations that we will not explore. Also see the String methods matches(), split(), and replaceAll(). These are shorthands for using the Pattern and Matcher classes. Here are some common regular expression patterns.
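The following sketch checks that the three String methods behave like the corresponding Pattern and Matcher calls, as their Javadoc describes. The sample sentence, the regular expressions, and the class name are illustrative.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Each String method below is a shorthand for the Pattern/Matcher call next to it.
class StringShorthands {
    public static void main(String[] args) {
        String s = "to be or not to be";

        // s.matches(re) behaves like Pattern.matches(re, s): the entire string must match.
        System.out.println(s.matches("[a-z ]+") + " " + Pattern.matches("[a-z ]+", s));  // true true

        // s.split(re) behaves like Pattern.compile(re).split(s).
        String[] w1 = s.split(" +");
        String[] w2 = Pattern.compile(" +").split(s);
        System.out.println(Arrays.equals(w1, w2) + " " + w1.length);  // true 6

        // s.replaceAll(re, r) behaves like Pattern.compile(re).matcher(s).replaceAll(r).
        String r1 = s.replaceAll("be", "BE");
        String r2 = Pattern.compile("be").matcher(s).replaceAll("BE");
        System.out.println(r1 + " | " + r1.equals(r2));  // to BE or not to BE | true
    }
}
```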
Q. Industrial-grade regular expressions for email addresses, Java identifiers, integers, decimals, etc.?
A. Here's a library of useful regular expressions that offers industrial-grade patterns for email addresses, URLs, numbers, dates, and times. Try this regular expression tool.
Q. I'm confused: why does (a|b)* match all strings of a's and b's, instead of only strings with all a's or strings with all b's?
A. The * operator replicates the regular expression (and not a fixed string that matches the regular expression). So the above is equivalent to ε | (a|b) | (a|b)(a|b) | (a|b)(a|b)(a|b) | ..., where ε denotes the empty string.
Q. History?
A. In the 1940s, Warren McCulloch and Walter Pitts modeled neurons as finite automata to describe the nervous system. In 1956, Stephen Kleene invented a mathematical abstraction called regular sets to describe these models. See S. C. Kleene, "Representation of events in nerve nets and finite automata," in Automata Studies, pp. 3-42, Princeton University Press, Princeton, New Jersey, 1956.
Q. Any tools for visualizing regular expressions?
A. Try Debuggex.
Exercises
Write a regular expression for each of the following sets of binary strings. Use only the basic operations.
0 or 11 or 101
only 0s
Answers: 0 | 11 | 101, 0*
Write a regular expression for each of the following sets of binary strings. Use only the basic operations.
all binary strings
all binary strings except empty string
begins with 1, ends with 1
ends with 00
contains at least three 1s
Answers: (0|1)*, (0|1)(0|1)*, 1 | 1(0|1)*1, (0|1)*00, (0|1)*1(0|1)*1(0|1)*1(0|1)* or 0*10*10*1(0|1)*.
Write a regular expression to describe inputs over the alphabet a, b, c that are in sorted order. Answer: a*b*c*.
Write a regular expression for each of the following sets of binary strings. Use only the basic operations.
contains at least three consecutive 1s
contains the substring 110
contains the substring 1101100
doesn't contain the substring 110
Answers: (0|1)*111(0|1)*, (0|1)*110(0|1)*, (0|1)*1101100(0|1)*, (0|10)*1*. The last one is by far the trickiest.
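One way to gain some confidence in answers like these is to test them exhaustively on short inputs. The sketch below uses hypothetical class and method names. It enumerates every bit string up to a given length and compares the verdict of java.util.regex with a direct check written as a Java predicate. Java's syntax for concatenation, |, *, and parentheses coincides with the basic operations used in these exercises. Agreement up to a fixed length is evidence, not a proof.

```java
import java.util.function.Predicate;
import java.util.regex.Pattern;

class CheckAnswer {
    // Returns true if regex and spec agree on every bit string of length <= maxLen.
    static boolean agrees(String regex, Predicate<String> spec, int maxLen) {
        Pattern p = Pattern.compile(regex);
        for (int len = 0; len <= maxLen; len++) {
            for (int bits = 0; bits < (1 << len); bits++) {
                // Build the length-len bit string whose binary value is bits.
                StringBuilder sb = new StringBuilder();
                for (int i = len - 1; i >= 0; i--) sb.append((bits >> i) & 1);
                String s = sb.toString();
                if (p.matcher(s).matches() != spec.test(s)) {
                    System.out.println("Disagreement on \"" + s + "\"");
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(agrees("(0|10)*1*", s -> !s.contains("110"), 12));       // true
        System.out.println(agrees("(0|1)*111(0|1)*", s -> s.contains("111"), 12));  // true
    }
}
```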
Write a regular expression for binary strings with at least two 0s but no consecutive 0s.
Write a regular expression for each of the following sets of binary strings. Use only the basic operations.
has at least 3 characters, and the third character is 0
number of 0s is a multiple of 3
starts and ends with the same character
odd length
starts with 0 and has odd length, or starts with 1and has even length
length is at least 1 and at most 3
Answers: (0|1)(0|1)0(0|1)*, 1* | (1*01*01*01*)*, 1(0|1)*1 | 0(0|1)*0 | 0 | 1, (0|1)((0|1)(0|1))*, 0((0|1)(0|1))* | 1(0|1)((0|1)(0|1))*, (0|1) | (0|1)(0|1) | (0|1)(0|1)(0|1).
For each of the following, indicate how many bit strings of length exactly 1000 are matched by the regular expression: 0(0|1)*1, 0*101*, (1|01)*.
Write a regular expression that matches all strings over the alphabet a, b, c that satisfy each of the following conditions:
starts and ends with a
at most one a
at least two a's
an even number of a's
number of a's plus number of b's is even
Find long words whose letters are in alphabetical order, e.g., almost and beefily. Answer: use the regular expression '^a*b*c*d*e*f*g*h*i*j*k*l*m*n*o*p*q*r*s*t*u*v*w*x*y*z*$'.
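In Java, String.matches already requires the entire string to match, so no anchors are needed. Here is a minimal sketch in which a small hard-coded word list stands in for a dictionary file. The list and the class name are illustrative.

```java
import java.util.List;
import java.util.stream.Collectors;

class AlphabeticalWords {
    // One optional run of each letter, in order: the letters must be nondecreasing.
    static final String SORTED = "a*b*c*d*e*f*g*h*i*j*k*l*m*n*o*p*q*r*s*t*u*v*w*x*y*z*";

    static List<String> sortedWords(List<String> words) {
        return words.stream()
                    .filter(w -> w.matches(SORTED))  // matches() anchors at both ends
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = List.of("almost", "beefily", "regular", "accent", "zebra");
        System.out.println(sortedWords(words));  // [almost, beefily, accent]
    }
}
```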
Write a Java regular expression to match phone numbers, with or without area codes. The numbers should be of the form (609) 555-1234 or 555-1234.
Find all English words that end with nym.
Find all English words that contain the trigraph bze. Answer: subzero.
Find all English words that start with g, contain the trigraph pev, and end with e. Answer: grapevine.
Find all English words that contain the trigraph spb and have at least two r's.
Find the longest English word that can be written with the top row of a standard keyboard. Answer: proprietorier.
Find all words that contain the four letters a, s, d, and f, not necessarily in that order. Solution: cat words.txt | grep a | grep s | grep d | grep f.
Given a string of A, C, T, G, and X, find a string in it, where X matches any single character; e.g., CATGG is contained in ACTGGGXXAXGGTTT.
Write a Java regular expression, for use with Validate.java, that validates Social Security numbers of the form 123-45-6789. Hint: use \d to represent any digit. Answer: [0-9]{3}-[0-9]{2}-[0-9]{4}.
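Here is a small sketch of how this pattern might be used from Java. It is written with \d in place of [0-9]. The two are equivalent in Java unless Unicode character classes are enabled. Each backslash must be doubled inside a Java string literal. The class name and sample inputs are illustrative, and this is not the booksite's Validate.java.

```java
class SsnCheck {
    // Three digits, dash, two digits, dash, four digits; the backslash is doubled in Java source.
    static final String SSN = "\\d{3}-\\d{2}-\\d{4}";

    static boolean isSsn(String s) {
        return s.matches(SSN);  // the whole string must match
    }

    public static void main(String[] args) {
        for (String s : new String[] { "123-45-6789", "123456789", "12-345-6789" }) {
            System.out.println(s + " -> " + isSsn(s));
        }
    }
}
```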
Modify the previous exercise to make the - optional, so that 123456789 is considered a legal input.
Write a Java regular expression to match all strings that contain exactly five vowels, with the vowels in alphabetical order. Answer: [^aeiou]*a[^aeiou]*e[^aeiou]*i[^aeiou]*o[^aeiou]*u[^aeiou]*
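Here is a quick check of a pattern for this exercise. The class [^aeiou] matches any character other than a lowercase vowel. As a result, the pattern allows exactly one of each vowel, in order, and nothing else. The sketch assumes lowercase input. The test words and class name are illustrative.

```java
class OrderedVowels {
    // Between the required vowels, [^aeiou]* allows only non-vowel characters.
    static final String FIVE_VOWELS_IN_ORDER =
        "[^aeiou]*a[^aeiou]*e[^aeiou]*i[^aeiou]*o[^aeiou]*u[^aeiou]*";

    static boolean hasVowelsInOrder(String word) {
        return word.matches(FIVE_VOWELS_IN_ORDER);
    }

    public static void main(String[] args) {
        for (String w : new String[] { "facetious", "abstemious", "education", "aeiouu" }) {
            System.out.println(w + " -> " + hasVowelsInOrder(w));
        }
    }
}
```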
Write a Java regular expression to match valid Windows XP file names. Such a file name consists of any sequence of characters other than
Additionally, it cannot begin with a space or period.
Write a Java regular expression to match valid OS X file names. Such a file name consists of any sequence of characters other than a colon. Additionally, it cannot begin with a period.
Given a string s that represents the name of an IP address in dotted quad notation, break it up into its constituent pieces, e.g., 255.125.33.222. Make sure that the four fields are numeric.
Write a Java regular expression to describe all dates of the form Month DD, YYYY where Month consists of any string of upper or lower case letters, the date is 1 or 2 digits, and the year is exactly 4 digits. The comma and spaces are required.
Write a Java regular expression to describe valid IP addresses of the form a.b.c.d where each letter can represent 1, 2, or 3 digits, and the periods are required. Yes: 196.26.155.241.
Write a Java regular expression to match license plates that start with 4 digits and end with two uppercase letters.
Write a regular expression to extract the coding sequence from a DNA string. It starts with the ATG codon and ends with a stop codon (TAA, TAG, or TGA). (reference)
Write a regular expression to check for the sequence rGATCy: that is, does it start with A or G, then GATC, and then T or C?
Write a regular expression to check whether a sequence contains two or more repeats of the GATA tetranucleotide.
Modify Validate.java to make the searches case insensitive. Hint: use the (?i) embedded flag.
Write a Java regular expression to match various spellings of Libyan dictator Moammar Gadhafi's last name using the following template: (i) starts with K, G, or Q, (ii) optionally followed by H, (iii) followed by AD, (iv) optionally followed by D, (v) optionally followed by H, (vi) optionally followed by AF, (vii) optionally followed by F, (viii) ends with I.
Write a Java program that reads in an expression like (K|G|Q)[H]AD[D][H]AF[F]I and prints out all matching strings. Here the notation [x] means 0 or 1 copy of the letter x.
Why doesn't s.replaceAll("A", "B"); replace all occurrences of the letter A with B in the string s?
Answer: Use s = s.replaceAll("A", "B"); instead. The method replaceAll returns the resulting string, but does not change s itself. Strings are immutable.
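The two versions side by side. The method names and the sample string are illustrative.

```java
class ReplaceAllDemo {
    // Mistaken version: the new string returned by replaceAll is thrown away.
    static String ignoresResult(String s) {
        s.replaceAll("A", "B");
        return s;  // still the original string
    }

    // Correct version: keep the new string that replaceAll returns.
    static String keepsResult(String s) {
        s = s.replaceAll("A", "B");
        return s;
    }

    public static void main(String[] args) {
        System.out.println(ignoresResult("BANANA"));  // BANANA
        System.out.println(keepsResult("BANANA"));    // BBNBNB
    }
}
```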
Write a program Clean.java that reads in text from standard input and prints it back out, removing any trailing whitespace on a line and replacing all tabs with 4 spaces.
Hint: use replaceAll() and the regular expression \s for whitespace.
Write a regular expression to match all of the text between the text <a href=" and the next ". Answer: href="(.*?)". The ? makes the .* reluctant instead of greedy. In Java, use Pattern.compile("href=\"(.*?)\"", Pattern.CASE_INSENSITIVE), escaping the quote characters inside the string literal.
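The difference between greedy and reluctant matching shows up as soon as a line contains two links. The sketch below uses an illustrative HTML string and illustrative class and method names. The reluctant pattern stops at the first closing quote, while the greedy one runs to the last quote on the line. Pattern.CASE_INSENSITIVE lets the same pattern match HREF as well as href.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HrefDemo {
    // Collects group 1 of every match of regex in text, ignoring case.
    static List<String> captures(String regex, String text) {
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(text);
        List<String> found = new ArrayList<>();
        while (m.find()) found.add(m.group(1));
        return found;
    }

    public static void main(String[] args) {
        String html = "<a href=\"a.html\">A</a> <A HREF=\"b.html\">B</A>";
        System.out.println(captures("href=\"(.*?)\"", html));  // [a.html, b.html]
        System.out.println(captures("href=\"(.*)\"", html));   // [a.html">A</a> <A HREF="b.html]
    }
}
```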
Use regular expressions to extract all of the text between the tags <title> and </title>. The (?i) is another way to make the match case insensitive. The $2 refers to the second captured subsequence, i.e., the stuff between the title tags.
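Here is a sketch consistent with the description above, though not necessarily the original code. It uses an embedded (?i) flag, three capturing groups, and $2 in the replacement string, so that only the text between the title tags is kept. The sample HTML and the class and method names are hypothetical. The second method takes the more direct route of reading group 2 from a Matcher.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TitleDemo {
    // (?i) makes the match case insensitive; group 2 is the text between the tags.
    static final String TITLE = "(?i)(<title.*?>)(.+?)(</title>)";

    // Replaces each title element by its contents, via $2.
    static String stripTitleTags(String html) {
        return html.replaceAll(TITLE, "$2");
    }

    // Returns the contents of the first title element, or null if there is none.
    static String firstTitle(String html) {
        Matcher m = Pattern.compile(TITLE).matcher(html);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String html = "<html><TITLE>Regular Expressions</TITLE><body>...</body></html>";
        System.out.println(stripTitleTags(html));  // <html>Regular Expressions<body>...</body></html>
        System.out.println(firstTitle(html));      // Regular Expressions
    }
}
```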
Write a regular expression to match all of the text between <TD ...> and </TD> tags. Answer: <TD[^>]*>([^<]*)</TD>
Creative Exercises
FMR-1 triplet repeat region. "The human FMR-1 gene sequence contains a triplet repeat region in which the sequence CGG or AGG is repeated a number of times. The number of triplets is highly variable between individuals, and increased copy number is associated with fragile X syndrome, a genetic disease that causes mental retardation and other symptoms in one out of 2000 children." (Reference: Biological Sequence Analysis by Durbin et al.) The pattern is bracketed by GCG and CTG, so we get the regular expression GCG(CGG|AGG)*CTG.
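Here is a sketch of locating the repeat region with java.util.regex and counting the triplets it contains. The DNA string is made up for illustration and the class name is hypothetical. A real sequence would be read from a file. The non-capturing group (?:CGG|AGG) lets group 1 capture the whole repeat.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TripletRepeat {
    // GCG, then any number of CGG or AGG triplets (captured together as group 1), then CTG.
    static final Pattern REPEAT = Pattern.compile("GCG((?:CGG|AGG)*)CTG");

    // Returns the number of triplets in the first repeat region, or -1 if none is found.
    static int countTriplets(String dna) {
        Matcher m = REPEAT.matcher(dna);
        if (!m.find()) return -1;
        return m.group(1).length() / 3;
    }

    public static void main(String[] args) {
        String dna = "TTAGCGCGGCGGAGGCGGCTGAAT";
        System.out.println(countTriplets(dna));  // 4
    }
}
```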
Ad blocking. Adblock uses regular expressions to block banner ads under the Mozilla and Firebird browsers.
Parsing text files. A more advanced example where we want to extract specific pieces of the matching input. This program typifies the process of parsing scientific input data.
PROSITE to Java regular expression. Write a program to read in a PROSITE pattern and print out the corresponding Java regular expression. PROSITE is the "first and most famous" database of protein families and domains. Its main use is to determine the function of uncharacterized proteins translated from genomic sequences. Biologists use PROSITE pattern syntax rules to search for patterns in biological data. Here is the raw data for CBD_FUNGAL (accession code PS00562). Each line contains various information. Perhaps the most interesting line is the one that begins with PA: it contains the pattern that describes the protein motif. Such patterns are useful because they often correspond to functional or structural features.
Each uppercase letter corresponds to one amino acid residue. The alphabet consists of uppercase letters corresponding to the 20 amino acids. The - character means concatenation. For example, the pattern above begins with CGG (Cys-Gly-Gly). The notation x plays the role of a wildcard: it matches any amino acid. This corresponds to . in our notation. Parentheses are used to specify repeats: x(2) means exactly two amino acids, and x(4,7) means between 4 and 7 amino acids. These correspond to .{2} and .{4,7} in Java notation. Curly braces are used to specify forbidden residues: {CG} means any residue other than C or G. The asterisk has its usual meaning.
Text to speech synthesis. Original motivation for grep. 'For example, how do you cope with the digraph ui, which is pronounced many different ways: fruit, guile, guilty, anguish, intuit, beguine?'
Challenging regular expressions. Write a regular expression for each of the following sets of binary strings. Use only the basic operations.
any string except 11 or 111
every odd symbol is a 1
contains at least two 0s and at most one 1
no consecutive 1s
Binary divisibility. Write a regular expression for each of the following sets of binary strings. Use only the basic operations.
bit string interpreted as binary number is divisible by 3
bit string interpreted as binary number is divisible by 123
Boston accent. Write a program to replace all of the r's with h's to translate a sentence like 'Park the car in Harvard yard' into the Bostonian version 'Pahk the cah in Hahvahd yahd'.
File extension. Write a program that takes the name of a file as a command-line argument and prints out its file type extension. The extension is the sequence of characters following the last period. For example, the file sun.gif has the extension gif. Hint: use split("\\."); recall that . is a regular expression metacharacter, so you need to escape it.
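The need for the escape is easy to see by comparing the two calls on the file name from the exercise. With an unescaped . every character is a delimiter. Because String.split discards trailing empty strings, the result is an empty array. The class and method names are illustrative. The sketch returns the whole name when it contains no period.

```java
class SplitDemo {
    // Unescaped: . is a metacharacter, so every character is a delimiter.
    static int piecesWithUnescapedDot(String name) {
        return name.split(".").length;
    }

    // Escaped: \. matches only a literal period; the last piece is the extension.
    static String extension(String name) {
        String[] pieces = name.split("\\.");
        return pieces[pieces.length - 1];
    }

    public static void main(String[] args) {
        System.out.println(piecesWithUnescapedDot("sun.gif"));  // 0
        System.out.println(extension("sun.gif"));               // gif
    }
}
```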
Reverse subdomains. For web log analysis, it is convenient to organize web traffic based on subdomains like wayne.faculty.cs.princeton.edu. Write a program to read in a domain name and print it out in reverse order like edu.princeton.cs.faculty.wayne.
Bank robbery. You just witnessed a bank robbery and got a partial license plate of the getaway vehicle. It started with ZD, had a 3 somewhere in the middle, and ended with V. Help the police officer write a regular expression for this plate.
Regular expression for permutations. Find the shortest regular expression (using only the basic operations) you can for the set of all permutations on N elements for N = 5 or 10. For example, if N = 3, then the language is abc, acb, bac, bca, cab, cba. Answer: difficult. Solution has length exponential in N.
Parsing quoted strings. Read in a text file and print out all quoted strings. Use a regular expression like "[^"]*", but you need to worry about escaped quotation marks.
Parsing HTML. A <, optionally followed by whitespace, followed by a, followed by whitespace, followed by href, optionally followed by whitespace, followed by =, optionally followed by whitespace, followed by "http://, followed by characters until ", optionally followed by whitespace, then a >.
Subsequence. Given a string s, determine whether it is a subsequence of another string t. For example, abc is a subsequence of achfdbaabgabcaabg. Use a regular expression. Now repeat the process without using regular expressions. Answer: (a) a.*b.*c.*, (b) use a greedy algorithm.
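Both answers can be sketched in a few lines. The class and method names are hypothetical. The regex version builds the pattern a.*b.*c.* from s and searches for it in t, using Pattern.quote so that metacharacters in s are treated literally. The greedy version scans t once, matching each character of s at the earliest possible position. Taking the earliest match never rules out a later one, which is why the greedy approach works here.

```java
import java.util.regex.Pattern;

class Subsequence {
    // (a) Build c1.*c2.*c3.* from s and search for it anywhere in t.
    static boolean byRegex(String s, String t) {
        StringBuilder re = new StringBuilder();
        for (char c : s.toCharArray()) {
            re.append(Pattern.quote(String.valueOf(c))).append(".*");
        }
        // DOTALL lets . also match line terminators in t.
        return Pattern.compile(re.toString(), Pattern.DOTALL).matcher(t).find();
    }

    // (b) Greedy: match each character of s at the earliest possible position in t.
    static boolean greedy(String s, String t) {
        int i = 0;  // number of characters of s matched so far
        for (int j = 0; j < t.length() && i < s.length(); j++) {
            if (t.charAt(j) == s.charAt(i)) i++;
        }
        return i == s.length();
    }

    public static void main(String[] args) {
        String t = "achfdbaabgabcaabg";
        System.out.println(byRegex("abc", t) + " " + greedy("abc", t));    // true true
        System.out.println(byRegex("abch", t) + " " + greedy("abch", t));  // false false
    }
}
```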
Huntington's disease diagnostic. The gene that causes Huntington's disease is located on chromosome 4 and has a variable number of repeats of the CAG trinucleotide. Write a program to determine the number of repeats and print "will not develop HD" if the number of repeats is 26 or fewer, "offspring at risk" if the number is between 27 and 35, "at risk" if the number is between 36 and 39, and "will develop HD" if the number is greater than or equal to 40. This is how Huntington's disease is identified in genetic testing.
Gene finder. A gene is a substring of a genome that starts with the start codon (ATG), ends with a stop codon (TAA, TAG, or TGA), and consists of a sequence of codons (nucleotide triplets) other than the start or stop codons. The gene is the substring in between the start and stop codons.
Repeat finder. Write a program Repeat.java that takes two command-line arguments and finds the maximum number of repeats of the first command-line argument in the file specified by the second command-line argument.
Character filter. Given a string t of bad characters, e.g., t = "!@$*()-_=+", write a function to read in another string s and return the result of removing all of the bad characters.
Wildcard pattern matcher. Without using Java's built-in regular expressions, write a program Wildcard.java to find all words in the dictionary matching a given pattern. The special symbol * matches any zero or more characters. So, for example, the pattern "w*ard" matches the words "ward" and "wildcard". The special symbol . matches any one character. Your program should read the pattern as a command-line parameter and the list of words (separated by whitespace) from standard input.
Wildcard pattern matcher. Repeat the previous exercise, but this time use Java's built-in regular expressions. Warning: in the context of wildcards, * has a different meaning than with regular expressions.
Search and replace. Word processors allow you to search for all occurrences of a given query string and replace each with another replacement string. Write a program SearchAndReplace.java that takes two strings as command-line inputs, reads in data from standard input, replaces all occurrences of the first string with the second string, and sends the results to standard output. Hint: use the method String.replaceAll.
Password validator. Suppose that for security reasons you require all passwords to have at least one of the following characters. Write a regular expression for use with String.matches that returns true if and only if the password contains one of the required characters. Answer: ".*[!@$*].*"
Alphanumeric filter. Write a program Filter.java to read in text from standard input and eliminate all characters that are not whitespace or alphanumeric. Answer: here's the key line.
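One possible key line is offered below as a sketch, not as the original. It deletes every character that is neither whitespace nor an ASCII letter or digit. Here \p{Alnum} is Java's POSIX-style class for [a-zA-Z0-9]. The class and method names are hypothetical. A full Filter.java would apply this to each line read from standard input.

```java
class FilterSketch {
    // Deletes every character that is neither whitespace (\s) nor a letter or digit (\p{Alnum}).
    static String filter(String s) {
        return s.replaceAll("[^\\s\\p{Alnum}]", "");
    }

    public static void main(String[] args) {
        System.out.println(filter("Hello, world! (2019)"));  // Hello world 2019
    }
}
```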
Converting tabs to spaces. Write a program to convert all tabs in a Java source file to 4 spaces.
Parsing delimited text files. A popular way to store a database is in a text file with one record per line, and each field separated by a special character called the delimiter. Write a program Tokenizer.java that reads in two command-line parameters, a delimiter character and the name of the file, and creates an array of tokens.
Parsing delimited text files. Repeat the previous exercise, but use the String library method split().
Checking a file format.
Misspellings. Write a Java program to verify that this list of common misspellings adapted from Wikipedia contains only lines of the form
where the first word is the misspelling and the string in parentheses is a possible replacement.
Size of DFA is exponential in size of RE. Give an RE for the set of all bit strings whose kth-to-last character equals 1. The size of the RE should be linear in k. Now, give a DFA for the same set of bit strings. How many states does it use?
Hint: every DFA for this set of bit strings must have at least 2^k states.
Last modified on August 27, 2016.
Copyright © 2000-2019 Robert Sedgewick and Kevin Wayne. All rights reserved.
This section under major construction.
Regular expressions. NFA.java,DFS.java,Digraph.java, andGREP.java.
Running time. M = length of expression, N = length of input.Regular expression matching algorithm can create NFA in O(M) timeand simulate input in O(MN) time.
Library implementations.
Validate.java.
Most library implementations of regular expressions use a backtrackingalgorithm that can take an exponential amount of time on someinputs. Such inputs can be surprisingly simple.For example, to determine whether a string of length N is matchedby the regular expression (aaa)*b can take an amount of time exponentialin N if the string is chosen carefully. The table below illustratesjust how spectacularly that the Java 1.4.2 regular expression can fail.
The above examplesare artificial, but they illustrate an alarming defectin most regular expression libraries. Bad inputs do occur in practice.According toCrosbyand Wallach,the following regular expression appears in a versionof SpamAssassin, a powerful spam filtering program.
It attempts to match certain email addresses, but it takes an exponential time to match some strings in manyregular expression libraries including Sun's Java 1.4.2.
This is especially significant because a spammer could use a pathological return emailaddresses to denial-of-service attack a mail server that has SpamAssassin running.This particular pattern is now fixed because Perl 5 regular expresssionsuse an internal cache to short-circuit repeated matches at thesame location during backtracking.
These deficiencies are not limited to Java's implementation.For example, GNU regex-0.12 takes exponential time for matchingstrings of the form aaaaaaaaaaaaaac with the regular expression (a*)*b* .Sun's Java 1.4.2 is equally susceptible to this one.Moreover, Java and Perl regular expressions support back references -the regular expression pattern matching problem for such extended regular expressions is NP hard,so this exponential blowup appears to be inherent on some inputs.
Here's one I actually wrote to try to find the last word before the string NYSE : regexp = '([ws]+).*NYSE';
Reference:Regular Expression Matching Can Be Simple And Fast (but is slow in Java, Perl, PHP, Python, Ruby, .).Compares Thompson NFA with backtracking approach. Contains some performance optimizationsfor Thompson NFA. Also some historical notes and references. Q + A
Artstudio pro 1 1 2 for mac free download . Q. Documentation on Java regular expression libraries?
A. Here is Oracle's guide to using regular expressions. Bartender 3 organize your menu bar icons v3 1 7. It includesmany more operationsthat we will not explore.Also see the String methods matches() , split() , and replaceAll() . These are shorthands for using the Pattern and Matcher classes.Here's some commonregular expression patterns.
Q. Industrial grade regular expressions foremail addresses, Java identifiers, integers, decimal, etc.?
User friendly 3d modeling. A. Here's a library of useful regular expressionsthat offers industrial grade patterns foremail addresses, URLs, numbers, dates, and times.Try this regular expression tool.
Q. I'm confused why does (a b)* match all strings of a's and b's,instead of only string with all a's or string with all b's?
A. The * operator replicates the regular expression (and not a fixed string that matches the regular expression). So the aboveis equivalent to (ab) (ab)(ab) (ab)(ab)(ab) .
Q. History?
A. In the 1940s, Warren McCulloch and Walter Pitts modeled neurons as finite automatato describe the nervous system.In 1956, Steve Kleene invented a mathematical abstraction called regular sets to describe these models. Representation of events in nerve nets and finite automata. in Automata Studies, 3-42, Princeton University Press, Princeton, New Jersey 1956.
Q. Any tools for visualizing regular expressions?
A. Try Debuggerx. Exercises
Write a regular expression for each of the following sets of binary strings. Use only the basic operations.
0 or 11 or 101
only 0s
Answers : 0 11 101, 0*
Write a regular expression for each of the following sets of binary strings. Use only the basic operations.
all binary strings
all binary strings except empty string
begins with 1, ends with 1
ends with 00
contains at least three 1s
Answers : (01)*, (01)(01)*, 1 1(01)*1, (01)*00,(01)*1(01)*1(01)*1(01)* or 0*10*10*1(01)*.
Write a regular expression to describe inputs over the alphabeta, b, c that are in sorted order. Answer : a*b*c*.
Write a regular expression for each of the following sets of binary strings. Use only the basic operations.
contains at least three consecutive 1s
contains the substring 110
contains the substring 1101100
doesn't contain the substring 110
Answers : (01)*111(01)*, (01)*110(01)*,(01)*1101100(01)*, (010)*1*.The last one is by far the trickiest.
Write a regular expression for binary stringswith at least two 0s but not consecutive 0s.
Write a regular expression for each of the following sets of binary strings. Use only the basic operations.
has at least 3 characters, and the third character is 0
number of 0s is a multiple of 3
starts and ends with the same character
odd length
starts with 0 and has odd length, or starts with 1and has even length
length is at least 1 and at most 3
Answers :(01)(01)0(01)*, 1* (1*01*01*01*)*, 1(01)*1 0(01)*0 0 1, (01)((01)(01))*, 0((01)(01))* 1(01)((01)(01))*, (01) (01)(01) (01)(01)(01).
For each of the following, indicate how many bit strings oflength exactly 1000 are matched by the regular expression: 0(0 1)*1 , 0*101* , (1 01)* .
Write a regular expression that matches all strings over thealphabet a, b, c that contain:
starts and ends with a
at most one a
at least two a's
an even number of a's
number of a's plus number of b's is even
Find long words whose letters are in alphabetical order, e.g., almost and beefily . Answer : use the regular expression'a*b*c*d*e*f*g*h*i*j*k*l*m*n*o*p*q*r*s*t*u*v*w*x*y*z*$'.
Write a Java regular expression to match phone numbers,with or without area codes. The area codes should be of the form (609) 555-1234 or 555-1234.
Find all English words that end with nym .
Final all English words that contain the trigraph bze . Answer : subzero.
Find all English words that start with g, contain the trigraph pev and end with e. Answer : grapevine.
Find all English words that contain the trigraph spb and have at least two r's.
Find the longest English word that can be written with the top row of a standard keyboard. Answer : proprietorier.
Find all words that contain the four letters a, s, d, and f,not necessarily in that order. Solution : cat words.txt grep a grep s grep d grep f .
Given a string of A, C, T, and G, and X, finda string where X matches any single character, e.g.,CATGG is contained in ACTGGGXXAXGGTTT.
Write a Java regular expression, for use with Validate.java, that validates Social Securitynumbers of the form 123-45-6789. Hint: use d to represent any digit. Answer: [0-9]3-[0-9]2-[0-9]4 .
Modify the previous exercise to make the - optional,so that 123456789 is considered a legal input.
Write a Java regular expression to match all strings that containexactly five vowels and the vowels are in alphabetical order. Answer: [aeiou]*a[aeiou]*e[aeiou]*i[aeiou]*o[aeiou]*u[aeiou]*
Write a Java regular expression to match valid Windows XP file names.Such a file name consists of any sequence of characters otherthan
Additionally, it cannot begin with a space or period.
Write a Java regular expression to match valid OS X file names.Such a file name consists of any sequence of characters other thana colon.Additionally, it cannot begin with a period.
Given a string s that represents the name of an IP address in dotted quad notation, break it up into its constituent pieces, e.g., 255.125.33.222.Make sure that the four fields are numeric.
Write a Java regular expression to describe all dates ofthe form Month DD, YYYY where Month consistsof any string of upper or lower case letters, the date is 1 or 2digits, and the year is exactly 4 digits. The comma and spacesare required.
Write a Java regular expression to describe valid IP addressesof the form a.b.c.d where each letter can represent 1, 2, or 3digits, and the periods are required. Yes: 196.26.155.241.
Write a Java regular expression to match license platesthat start with 4 digits and end with two uppercase letters.
Write a regular expression to extract the coding sequencefrom a DNA string. It starts with the ATG codon and ends with a stop codon (TAA, TAG, or TGA).reference
Write a regular expression to check for the sequencerGATCy: that is, does it start with A or G, then GATC, and thenT or C.
Write a regular expression to check whether a sequence containstwo or more repeats of the the GATA tetranucleotide.
Modify Validate.javato make the searches case insensitive. Hint: use the (?i) embedded flag.
Write a Java regular expression to match various spellingsof Libyan dictator Moammar Gadhafi's last name usingthe folling template: (i) starts with K, G, Q, (ii) optionallyfollowed by H, (iii) followed by AD, (iv) optionally followedby D, (v) optionally followed by H, (vi) optionally followedby AF, (vii) optionally followed by F, (vii) ends with I.
Write a Java program that reads in an expressionlike (KGQ)[H]AD[D][H]AF[F]I and prints out allmatching strings. Here the notation [x] means0 or 1 copy of the letter x .
Why doesn't s.replaceAll('A', 'B'); replace alloccurrences of the letter A with B in the string s?
Answer : Use s = s.replaceAll('A', 'B'); insteadThe method replaceAll returns the resulting string, butdoes not change s itself.Strings are immutable.
Write a program Clean.javathat reads in text from standard input andprints it back out, removing any trailing whitespace on a lineand replacing all tabs with 4 spaces.
Hint: use replaceAll() and the regular expression s for whitespace.
Write a regular expression to match all of the textbetween the text a href =' and the next ' . Answer : href='(.*?)' . The ? makes the .* reluctant instead of greedy.In Java, use Pattern.compile('href='(.*?)', Pattern.CASE_INSENSITIVE) to escape the backslash characters.
Use regular expressions to extract all of the text between the tags title and title .The (?i) is another way to make the match case insensitive.The $2 refers to the second captured subsequence, i.e.,the stuff between the title tags.
Write a regular expression to match all of the textbetween TD . and /TD tags. Answer : TD[]*([]*)/TD Creative Exercises
FMR-1 triplet repeat region. 'The human FMR-1 gene sequence contains a triplet repeat regionin which the sequence CGG or AGG is repeated a number of times. The numberof triplets is highly variable between individuals, and increasedcopy number is associated with fragile X syndrome, a genetic diseasethat causes mental retardation and other symptoms in one out of2000 children.' (Reference: Biological Sequence Analysis byDurbin et al). The pattern is bracket by GCG and CTG,so we get the regular expression GCG (CGG AGG)* CTG.
Ad blocking. Adblock uses regularexpressions to block banner adds under the Mozilla and Firebirdbrowsers.
Parsing text files. A more advanced example where we want to extract specific pieces of thematching input. This program typifies the process of parsing scientific input data.
PROSITE to Java regular expression. Write a program to read in a PROSITE pattern and print outthe corresponding Java regular expression.PROSITE is the 'first and most famous' database of protein families and domains.Its main use it to determine the function of uncharacterized proteins translated from genomic sequences. Biologists usePROSITEpattern syntax rules to search for patterns in biological data.Here is the raw data for CBD FUNGAL(accession code PS00562). Each line contains various information. Perhapsthe most interesting line is the one that begins with PA - it containsthe pattern that describes the protein motif.Such patterns are useful because they often correspond to functional or structural features.
Each uppercase letter corresponds to one amino acid residue. The alphabet consists of uppercase letters corresponding to the 20 amino acids. The - character means concatenation. For example, the pattern above begins with CGG (Cys-Gly-Gly). The notation x plays the role of a wildcard: it matches any amino acid. This corresponds to . in our notation. Parentheses are used to specify repeats: x(2) means exactly two amino acids, and x(4,7) means between 4 and 7 amino acids. These correspond to .{2} and .{4,7} in Java notation. Curly braces are used to specify forbidden residues: {CG} means any residue other than C or G (in Java, [^CG]). The asterisk has its usual meaning.
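The translation rules above can be sketched as a character-by-character rewrite. This is a minimal sketch, assuming a hypothetical class PrositeToRegex; it also assumes the PROSITE convention that a pattern ends with a period and that square-bracket lists of allowed residues carry over unchanged. The terminal anchors < and > are not handled.

```java
import java.util.regex.Pattern;

public class PrositeToRegex {
    // Rewrites one PROSITE character at a time; the switch avoids clashes such as
    // '(' -> '{' being re-read as a PROSITE '{'.
    public static String toJavaRegex(String prosite) {
        StringBuilder regex = new StringBuilder();
        for (char c : prosite.toCharArray()) {
            switch (c) {
                case '-': break;                      // concatenation is implicit in Java
                case '.': break;                      // PROSITE patterns end with '.'
                case 'x': regex.append('.'); break;   // any amino acid
                case '(': regex.append('{'); break;   // x(4,7) -> .{4,7}
                case ')': regex.append('}'); break;
                case '{': regex.append("[^"); break;  // {CG} -> [^CG]
                case '}': regex.append(']'); break;
                default:  regex.append(c);            // residues and [ ] pass through
            }
        }
        return regex.toString();
    }

    public static void main(String[] args) {
        String regex = toJavaRegex("C-x(2)-{CG}-G.");
        System.out.println(regex);                           // C.{2}[^CG]G
        System.out.println(Pattern.matches(regex, "CAAAG")); // true
        System.out.println(Pattern.matches(regex, "CAACG")); // false
    }
}
```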
Text to speech synthesis. Original motivation for grep. 'For example, how do you cope with the digraph ui, which is pronounced many different ways: fruit, guile, guilty, anguish, intuit, beguine?'
Challenging regular expressions. Write a regular expression for each of the following sets of binary strings. Use only the basic operations.
any string except 11 or 111
every odd symbol is a 1
contains at least two 0s and at most one 1
no consecutive 1s
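One way to gain confidence in answers to these exercises is to compare each candidate regular expression against a direct definition on every short bit string. The sketch below does that; the candidate expressions are one possible set of answers (not necessarily the shortest), "odd symbol" is read as the 1st, 3rd, 5th, ... symbol, and the class name BasicRegexCheck is hypothetical. An empty alternative such as (1|) plays the role of the empty string.

```java
import java.util.function.Predicate;

public class BasicRegexCheck {
    // Candidate answers using only concatenation, |, * and parentheses.
    public static final String NOT_11_OR_111 = "|1|0(0|1)*|10(0|1)*|110(0|1)*|111(0|1)(0|1)*";
    public static final String ODD_SYMBOLS_ARE_1 = "(1(0|1))*(1|)";
    public static final String TWO_ZEROS_AT_MOST_ONE_1 = "000*|000*10*|0100*|1000*";
    public static final String NO_CONSECUTIVE_1S = "(0|10)*(1|)";

    // True if regex and spec agree on every bit string of length <= maxLen.
    public static boolean agrees(String regex, Predicate<String> spec, int maxLen) {
        for (int len = 0; len <= maxLen; len++) {
            for (int bits = 0; bits < (1 << len); bits++) {
                StringBuilder sb = new StringBuilder();
                for (int i = len - 1; i >= 0; i--) sb.append(((bits >> i) & 1) == 1 ? '1' : '0');
                String s = sb.toString();
                if (s.matches(regex) != spec.test(s)) return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(agrees(NOT_11_OR_111, s -> !s.equals("11") && !s.equals("111"), 12));
        System.out.println(agrees(ODD_SYMBOLS_ARE_1, s -> {
            for (int i = 0; i < s.length(); i += 2) if (s.charAt(i) != '1') return false;
            return true;
        }, 12));
        System.out.println(agrees(TWO_ZEROS_AT_MOST_ONE_1,
            s -> s.chars().filter(c -> c == '0').count() >= 2
              && s.chars().filter(c -> c == '1').count() <= 1, 12));
        System.out.println(agrees(NO_CONSECUTIVE_1S, s -> !s.contains("11"), 12));
    }
}
```

Exhaustive checking up to a fixed length is evidence, not a proof, but it catches most off-by-one mistakes in hand-built expressions.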
Binary divisibility. Write a regular expression for each of the following sets of binary strings. Use only the basic operations.
bit string interpreted as binary number is divisible by 3
bit string interpreted as binary number is divisible by 123
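For divisibility by 3, one approach is to start from the 3-state DFA whose state is the value read so far modulo 3 (reading bit b takes remainder r to (2r + b) mod 3) and eliminate states, which gives (0|1(01*0)*1)*. The sketch below, with a hypothetical class DivisibleByThree, checks that expression against the arithmetic definition. For 123 the same DFA idea needs 123 states, so it is shown only as a direct simulation rather than as a regular expression.

```java
public class DivisibleByThree {
    // Derived from the remainder-mod-3 DFA by eliminating the remainder-2 state.
    public static final String DIV3 = "(0|1(01*0)*1)*";

    // DFA simulation for any modulus k: one state per remainder.
    public static boolean divisible(String bits, int k) {
        int r = 0;
        for (char c : bits.toCharArray()) r = (2 * r + (c - '0')) % k;
        return r == 0;
    }

    // Compares DIV3 with n % 3 on every nonempty bit string up to maxLen bits.
    public static boolean agrees(int maxLen) {
        for (int len = 1; len <= maxLen; len++) {
            for (int n = 0; n < (1 << len); n++) {
                StringBuilder sb = new StringBuilder();
                for (int i = len - 1; i >= 0; i--) sb.append(((n >> i) & 1) == 1 ? '1' : '0');
                if (sb.toString().matches(DIV3) != (n % 3 == 0)) return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(agrees(14));
        System.out.println(divisible(Integer.toBinaryString(246), 123));
    }
}
```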
Boston accent. Write a program to replace all of the r's with h's to translate a sentence like "Park the car in Harvard yard" into the Bostonian version "Pahk the cah in Hahvahd yahd".
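A one-line sketch, assuming a hypothetical class Boston and that only lowercase r needs replacing (as in the example sentence):

```java
public class Boston {
    // Every lowercase 'r' becomes 'h'; uppercase 'R' is left alone.
    public static String bostonize(String s) {
        return s.replaceAll("r", "h");
    }

    public static void main(String[] args) {
        System.out.println(bostonize("Park the car in Harvard yard")); // Pahk the cah in Hahvahd yahd
    }
}
```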
File extension. Write a program that takes the name of a file as a command-line argument and prints out its file type extension. The extension is the sequence of characters following the last '.'. For example, the file sun.gif has the extension gif. Hint: use split("\\."); recall that . is a regular expression metacharacter, so you need to escape it.
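A sketch of the hint, assuming a hypothetical class Extension and a file name that actually contains a nonempty extension (a name with no dot is returned unchanged):

```java
public class Extension {
    // "\\." is a literal dot; an unescaped "." would match every character.
    public static String extension(String filename) {
        String[] parts = filename.split("\\.");
        return parts[parts.length - 1];
    }

    public static void main(String[] args) {
        System.out.println(extension(args[0]));
    }
}
```

Forgetting the escape is easy to spot: "sun.gif".split(".") returns an empty array, because every character is a delimiter and split discards trailing empty strings.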
Reverse subdomains. For web log analysis, it is convenient to organize web traffic based on subdomains like wayne.faculty.cs.princeton.edu. Write a program to read in a domain name and print it out in reverse order like edu.princeton.cs.faculty.wayne.
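A sketch, assuming a hypothetical class ReverseDomain: split on a literal dot, reverse the labels, and join them again.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ReverseDomain {
    public static String reverse(String domain) {
        // Arrays.asList is fixed-size, but Collections.reverse only swaps elements.
        List<String> labels = Arrays.asList(domain.split("\\."));
        Collections.reverse(labels);
        return String.join(".", labels);
    }

    public static void main(String[] args) {
        System.out.println(reverse(args[0]));
    }
}
```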
Bank robbery. You just witnessed a bank robbery and got a partial license plate of the getaway vehicle. It started with ZD, had a 3 somewhere in the middle, and ended with V. Help the police officer write a regular expression for this plate.
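One possible answer is ZD.*3.*V, used with String.matches so that it must cover the whole plate. A sketch with a hypothetical class Plate; it does not restrict the other characters to letters and digits.

```java
public class Plate {
    // Starts with ZD, contains a 3 after that, ends with V.
    public static final String PLATE = "ZD.*3.*V";

    public static boolean possible(String plate) {
        return plate.matches(PLATE);
    }

    public static void main(String[] args) {
        for (String p : args) System.out.println(p + " " + possible(p));
    }
}
```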
Regular expression for permutations. Find the shortest regular expression (using only the basic operations) you can for the set of all permutations on N elements for N = 5 or 10. For example, if N = 3, then the language is abc, acb, bac, bca, cab, cba. Answer: difficult. Solution has length exponential in N.
Parsing quoted strings. Read in a text file and print out all quoted strings. Use a regular expression like "[^"]*", but you need to worry about escaping the quotation marks.
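A sketch, assuming a hypothetical class Quotes that reads the text from standard input. In a Java string literal each quotation mark of "[^"]*" must be written as \". This version does not treat backslash-escaped quotes inside a string specially.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Quotes {
    // Regex "[^"]*": a quote, any non-quote characters, a closing quote.
    private static final Pattern QUOTED = Pattern.compile("\"[^\"]*\"");

    public static List<String> quoted(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = QUOTED.matcher(text);
        while (m.find()) result.add(m.group());
        return result;
    }

    public static void main(String[] args) throws IOException {
        String text = new String(System.in.readAllBytes(), StandardCharsets.UTF_8);
        for (String q : quoted(text)) System.out.println(q);
    }
}
```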
Parsing HTML. Match a <, optionally followed by whitespace, followed by a, followed by whitespace, followed by href, optionally followed by whitespace, followed by =, optionally followed by whitespace, followed by "http://, followed by characters until ", optionally followed by whitespace, then a >.
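The description translates piece by piece into a Java regular expression. A sketch, assuming a hypothetical class LinkFinder that captures the http:// URL; it is case sensitive and ignores any other attributes in the tag.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkFinder {
    // <  \s*  a  \s+  href  \s*  =  \s*  "http://...  "  \s*  >
    private static final Pattern LINK =
        Pattern.compile("<\\s*a\\s+href\\s*=\\s*\"(http://[^\"]*)\"\\s*>");

    public static List<String> links(String html) {
        List<String> urls = new ArrayList<>();
        Matcher m = LINK.matcher(html);
        while (m.find()) urls.add(m.group(1));
        return urls;
    }

    public static void main(String[] args) throws IOException {
        String html = new String(System.in.readAllBytes(), StandardCharsets.UTF_8);
        for (String url : links(html)) System.out.println(url);
    }
}
```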
Subsequence. Given a string s, determine whether it is a subsequence of another string t. For example, abc is a subsequence of achfdbaabgabcaabg. Use a regular expression. Now repeat the process without using regular expressions. Answer: (a) a.*b.*c.*, (b) use a greedy algorithm.
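Both approaches can be sketched side by side, assuming a hypothetical class Subsequence. The regex version builds .*a.*b.*c.* from s (quoting each character in case it is a metacharacter) and uses DOTALL so that .* also crosses line breaks. The greedy version matches each character of s to its earliest possible position in t.

```java
import java.util.regex.Pattern;

public class Subsequence {
    // (a) s = "abc" becomes .*\Qa\E.*\Qb\E.*\Qc\E.* and must match all of t.
    public static boolean byRegex(String s, String t) {
        StringBuilder regex = new StringBuilder(".*");
        for (char c : s.toCharArray()) {
            regex.append(Pattern.quote(String.valueOf(c))).append(".*");
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL).matcher(t).matches();
    }

    // (b) Greedy scan: advance in s whenever the current characters agree.
    public static boolean greedy(String s, String t) {
        int i = 0;
        for (int j = 0; j < t.length() && i < s.length(); j++) {
            if (s.charAt(i) == t.charAt(j)) i++;
        }
        return i == s.length();
    }

    public static void main(String[] args) {
        System.out.println(byRegex(args[0], args[1]) + " " + greedy(args[0], args[1]));
    }
}
```

The greedy scan takes time linear in the length of t, while the backtracking regex can take much longer on a long s that is not a subsequence.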
Huntington's disease diagnostic. The gene that causes Huntington's disease is located on chromosome 4 and has a variable number of repeats of the CAG trinucleotide. Write a program to determine the number of repeats and print "will not develop HD" if the number of repeats is at most 26, "offspring at risk" if the number is between 27 and 35, "at risk" if the number is between 36 and 39, and "will develop HD" if the number is greater than or equal to 40. This is how Huntington's disease is identified in genetic testing.
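A sketch, assuming a hypothetical class Huntington, DNA on standard input, and the ranges 26 or fewer, 27 to 35, 36 to 39, and 40 or more. The lookahead (?=((?:CAG)+)) is tried at every position, so the longest run is found even for repeat units that overlap themselves; the same maxRepeats helper would also fit the Repeat finder exercise.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Huntington {
    // Longest run of consecutive copies of unit in text, measured in copies.
    public static int maxRepeats(String unit, String text) {
        if (unit.isEmpty()) throw new IllegalArgumentException("empty repeat unit");
        Matcher m = Pattern.compile("(?=((?:" + Pattern.quote(unit) + ")+))").matcher(text);
        int best = 0;
        while (m.find()) best = Math.max(best, m.group(1).length() / unit.length());
        return best;
    }

    public static String diagnose(int repeats) {
        if (repeats <= 26) return "will not develop HD";
        if (repeats <= 35) return "offspring at risk";
        if (repeats <= 39) return "at risk";
        return "will develop HD";
    }

    public static void main(String[] args) throws IOException {
        String dna = new String(System.in.readAllBytes(), StandardCharsets.UTF_8).replaceAll("\\s", "");
        int n = maxRepeats("CAG", dna);
        System.out.println(n + " repeats: " + diagnose(n));
    }
}
```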
Gene finder. A gene is a substring of a genome that starts with the start codon (ATG), ends with a stop codon (TAA, TAG, or TGA), and consists of a sequence of codons (nucleotide triplets) other than the start or stop codons. The gene is the substring in between the start and stop codons.
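A regex-based sketch, assuming a hypothetical class GeneFinder. It goes beyond the basic operations: the negative lookahead (?!ATG|TAA|TAG|TGA) rejects start and stop codons inside the gene. Because Matcher.find() resumes after each match, genes that overlap an earlier match are not reported.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GeneFinder {
    // ATG, then in-frame codons that are neither ATG nor a stop codon, then a stop codon.
    private static final Pattern GENE = Pattern.compile(
        "ATG((?:(?!ATG|TAA|TAG|TGA)[ACGT]{3})*)(?:TAA|TAG|TGA)");

    // Returns the substrings between start and stop codons, scanning left to right.
    public static List<String> genes(String genome) {
        List<String> result = new ArrayList<>();
        Matcher m = GENE.matcher(genome);
        while (m.find()) result.add(m.group(1));
        return result;
    }

    public static void main(String[] args) {
        for (String gene : genes(args[0])) System.out.println(gene);
    }
}
```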
Repeat finder. Write a program Repeat.java that takes two command-line arguments and finds the maximum number of repeats of the first command-line argument in the file specified by the second command-line argument.
Character filter. Given a string t of bad characters, e.g. t = "!@$*()-_=+", write a function to read in another string s and return the result of removing all of the bad characters.
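A sketch, assuming a hypothetical class CharFilter that builds a character class from t. Every non-alphanumeric character is preceded by a backslash, which java.util.regex permits for non-alphabetic characters and which keeps characters such as -, ^, ] and \ literal inside the brackets.

```java
public class CharFilter {
    // Removes from s every character that appears in bad.
    public static String removeBad(String s, String bad) {
        if (bad.isEmpty()) return s;
        StringBuilder cls = new StringBuilder("[");
        for (char c : bad.toCharArray()) {
            if (!Character.isLetterOrDigit(c)) cls.append('\\');
            cls.append(c);
        }
        cls.append(']');
        return s.replaceAll(cls.toString(), "");
    }

    public static void main(String[] args) {
        System.out.println(removeBad(args[0], args[1]));
    }
}
```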
Wildcard pattern matcher. Without using Java's built-in regular expressions, write a program Wildcard.java to find all words in the dictionary matching a given pattern. The special symbol * matches any zero or more characters. So, for example, the pattern "w*ard" matches the words "ward" and "wildcard". The special symbol . matches any one character. Your program should read the pattern as a command-line parameter and the list of words (separated by whitespace) from standard input.
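One regex-free approach is dynamic programming over prefixes of the pattern and the word. A sketch with a hypothetical class WildcardDP (not the Wildcard.java named in the exercise):

```java
import java.util.Scanner;

public class WildcardDP {
    // dp[i][j]: the first i pattern characters match the first j word characters.
    public static boolean matches(String pattern, String word) {
        int m = pattern.length(), n = word.length();
        boolean[][] dp = new boolean[m + 1][n + 1];
        dp[0][0] = true;                               // empty pattern matches empty word
        for (int i = 1; i <= m; i++) {
            char p = pattern.charAt(i - 1);
            for (int j = 0; j <= n; j++) {
                if (p == '*') {
                    // * matches nothing, or absorbs one more character of the word
                    dp[i][j] = dp[i - 1][j] || (j > 0 && dp[i][j - 1]);
                } else {
                    // . matches any one character; anything else must be equal
                    dp[i][j] = j > 0 && dp[i - 1][j - 1]
                               && (p == '.' || p == word.charAt(j - 1));
                }
            }
        }
        return dp[m][n];
    }

    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        while (in.hasNext()) {
            String word = in.next();
            if (matches(args[0], word)) System.out.println(word);
        }
    }
}
```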
Wildcard pattern matcher. Repeat the previous exercise, but this time use Java's built-in regular expressions. Warning: in the context of wildcards, * has a different meaning than with regular expressions.
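A sketch of the translation step, assuming a hypothetical class WildcardRegex: wildcard * becomes .*, wildcard . stays ., and every other character is quoted so that regex metacharacters such as + are matched literally.

```java
import java.util.Scanner;
import java.util.regex.Pattern;

public class WildcardRegex {
    public static String toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') regex.append(".*");        // zero or more characters
            else if (c == '.') regex.append('.');     // exactly one character
            else regex.append(Pattern.quote(String.valueOf(c)));
        }
        return regex.toString();
    }

    public static boolean matches(String wildcard, String word) {
        return Pattern.matches(toRegex(wildcard), word);
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile(toRegex(args[0]));
        Scanner in = new Scanner(System.in);
        while (in.hasNext()) {
            String word = in.next();
            if (p.matcher(word).matches()) System.out.println(word);
        }
    }
}
```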
Search and replace. Word processors allow you to search for all occurrences of a given query string and replace each with another replacement string. Write a program SearchAndReplace.java that takes two strings as command-line inputs, reads in data from standard input, replaces all occurrences of the first string with the second string, and sends the results to standard output. Hint: use the method String.replaceAll.
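String.replaceAll treats its first argument as a regular expression and its second as a replacement template in which $ and \ are special. For literal search and replace, a sketch (hypothetical class LiteralReplace) can quote both with Pattern.quote and Matcher.quoteReplacement:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LiteralReplace {
    public static String replaceLiteral(String text, String from, String to) {
        return text.replaceAll(Pattern.quote(from), Matcher.quoteReplacement(to));
    }

    public static void main(String[] args) throws IOException {
        String text = new String(System.in.readAllBytes(), StandardCharsets.UTF_8);
        System.out.print(replaceLiteral(text, args[0], args[1]));
    }
}
```

Without quoting, "a.b.c".replaceAll(".", "x") yields "xxxxx", since . matches every character. String.replace(CharSequence, CharSequence) is a built-in alternative that is always literal.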
Password validator. Suppose that for security reasons you require all passwords to have at least one of the following characters. Write a regular expression for use with String.matches that returns true if and only if the password contains one of the required characters. Answer: ".*[!@$*].*" (String.matches must match the entire password, so the character class is surrounded by .*).
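A sketch with a hypothetical class PasswordCheck. Because String.matches requires the whole password to match, the character class is wrapped in .* on both sides. The class lists only the characters !, @, $ and * that appear in the answer; the full list of required characters would go inside the brackets.

```java
public class PasswordCheck {
    public static final String HAS_REQUIRED = ".*[!@$*].*";

    public static boolean hasRequired(String password) {
        return password.matches(HAS_REQUIRED);
    }

    public static void main(String[] args) {
        System.out.println(hasRequired(args[0]));
    }
}
```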
Alphanumeric filter. Write a program Filter.java to read in text from standard input and eliminate all characters that are not whitespace or alphanumeric. Answer: here's the key line.
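The key line is not reproduced here; one possibility is a single replaceAll with a negated class. A sketch, assuming a hypothetical class AlnumFilter; note that \p{Alnum} covers only ASCII letters and digits by default, so accented letters and _ are removed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class AlnumFilter {
    // Deletes every character that is neither ASCII alphanumeric nor whitespace.
    public static String filter(String text) {
        return text.replaceAll("[^\\p{Alnum}\\s]", "");
    }

    public static void main(String[] args) throws IOException {
        String text = new String(System.in.readAllBytes(), StandardCharsets.UTF_8);
        System.out.print(filter(text));
    }
}
```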
Converting tabs to spaces. Write a program to convert all tabs in a Java source file to 4 spaces.
Parsing delimited text files. A popular way to store a database is in a text file with one record per line, and each field separated by a special character called the delimiter. Write a program Tokenizer.java that reads in two command-line parameters, a delimiter character and the name of the file, and creates an array of tokens.
Parsing delimited text files. Repeat the previous exercise, but use the String library method split().
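A split-based sketch, assuming a hypothetical class Delimited that tokenizes each line separately. Pattern.quote makes delimiters such as | or . literal, and the limit -1 keeps trailing empty fields that split would otherwise drop.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.regex.Pattern;

public class Delimited {
    public static String[] tokens(String line, char delimiter) {
        return line.split(Pattern.quote(String.valueOf(delimiter)), -1);
    }

    public static void main(String[] args) throws IOException {
        char delimiter = args[0].charAt(0);
        for (String line : Files.readAllLines(Paths.get(args[1]))) {
            System.out.println(Arrays.toString(tokens(line, delimiter)));
        }
    }
}
```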
Checking a file format.
Misspellings. Write a Java program to verify that this list of common misspellings adapted from Wikipedia contains only lines of the form
where the first word is the misspelling and the string in parentheses is a possible replacement.
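A sketch of the checker, assuming a hypothetical class MisspellingCheck and assuming each line looks like a word, whitespace, then a parenthesized replacement such as abandonned (abandoned); the exact form in the actual file may differ, in which case the regular expression needs adjusting.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class MisspellingCheck {
    // Assumed form: nonblank word, whitespace, "(" replacement without parentheses ")".
    private static final String LINE = "\\S+\\s+\\([^()]+\\)";

    public static boolean wellFormed(String line) {
        return line.matches(LINE);
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        for (int i = 0; i < lines.size(); i++) {
            if (!wellFormed(lines.get(i))) System.out.println((i + 1) + ": " + lines.get(i));
        }
    }
}
```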
Size of DFA is exponential in size of RE. Give a RE for the set of all bitstrings whose kth to the last character equals 1. The size of the RE should be linear in k. Now, give a DFA for the same set of bitstrings. How many states does it use?
Hint: every DFA for this set of bitstrings must have at least 2^k states.
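A sketch contrasting the two sizes, assuming a hypothetical class KthFromLast. The RE (0|1)*1 followed by k-1 copies of (0|1) grows linearly in k, while the DFA below remembers the last k bits read, so it has 2^k states (packed into an int). The test compares both with the definition on all short strings.

```java
public class KthFromLast {
    // RE of length linear in k.
    public static String regex(int k) {
        StringBuilder sb = new StringBuilder("(0|1)*1");
        for (int i = 1; i < k; i++) sb.append("(0|1)");
        return sb.toString();
    }

    // DFA state = last k bits read. Before k bits are read, missing bits count as 0,
    // which is safe because a string shorter than k has no kth-to-last character.
    public static boolean dfaAccepts(String bits, int k) {
        int mask = (1 << k) - 1;
        int state = 0;
        for (char c : bits.toCharArray()) state = ((state << 1) | (c - '0')) & mask;
        return ((state >> (k - 1)) & 1) == 1;   // oldest remembered bit
    }

    public static boolean agrees(int k, int maxLen) {
        for (int len = 0; len <= maxLen; len++) {
            for (int n = 0; n < (1 << len); n++) {
                StringBuilder sb = new StringBuilder();
                for (int i = len - 1; i >= 0; i--) sb.append(((n >> i) & 1) == 1 ? '1' : '0');
                String s = sb.toString();
                boolean expected = s.length() >= k && s.charAt(s.length() - k) == '1';
                if (s.matches(regex(k)) != expected || dfaAccepts(s, k) != expected) return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (int k = 1; k <= 4; k++) System.out.println(regex(k) + " " + agrees(k, 10));
    }
}
```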
Last modified on August 27, 2016.
Copyright © 2000-2019 Robert Sedgewick and Kevin Wayne. All rights reserved.