<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Linux etc. &#187; Regular expressions</title>
	<atom:link href="http://promberger.info/linux/category/regular-expressions/feed/" rel="self" type="application/rss+xml" />
	<link>http://promberger.info/linux</link>
	<description>my outsourced memory for your perusal</description>
	<lastBuildDate>Thu, 08 Sep 2011 11:06:06 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>Use sed to remove the first column in a comma separated file (csv)</title>
		<link>http://promberger.info/linux/2010/08/02/use-sed-to-remove-the-first-column-in-a-comma-separated-file-csv/</link>
		<comments>http://promberger.info/linux/2010/08/02/use-sed-to-remove-the-first-column-in-a-comma-separated-file-csv/#comments</comments>
		<pubDate>Mon, 02 Aug 2010 11:32:41 +0000</pubDate>
		<dc:creator>Marianne</dc:creator>
				<category><![CDATA[Regular expressions]]></category>
		<category><![CDATA[Sed]]></category>

		<guid isPermaLink="false">http://promberger.info/linux/?p=357</guid>
		<description><![CDATA[Here&#8217;s how: sed -i 's/[^,]*,//' file.csv Note the [^,]* bit, which matches any run of characters that are not commas. Don&#8217;t use .*, because it is greedy and will match commas, too. (Usually, cut is useful for this sort of thing, but cut cannot readily replace the file in place.)]]></description>
			<content:encoded><![CDATA[<p>Here&#8217;s how:</p>
<pre>sed -i 's/[^,]*,//' file.csv</pre>
<p>Note the <code>[^,]*</code> bit, which matches any run of characters that are not commas. Don&#8217;t use <code>.*</code>, because it is greedy and will match commas, too, so everything up to the last comma on the line would be deleted. (Usually, <code>cut</code> is useful for this sort of thing, but <code>cut</code> cannot readily replace the file in place.)</p>
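<p>A minimal sketch of the difference, run on a made-up sample line instead of a real file (the column names are only an assumption for illustration). The last command shows the <code>cut</code> equivalent, which prints the result instead of editing in place:</p>

```shell
# Hypothetical three-column sample line (not from a real file).
line='id,name,email'

# [^,]*, stops at the first comma, so only the first column is removed.
first_removed=$(printf '%s\n' "$line" | sed 's/[^,]*,//')

# .*, is greedy and runs up to the LAST comma on the line.
greedy=$(printf '%s\n' "$line" | sed 's/.*,//')

# cut keeps fields 2 and onward, but writes to standard output only.
with_cut=$(printf '%s\n' "$line" | cut -d, -f2-)

printf '%s\n' "$first_removed"   # name,email
printf '%s\n' "$greedy"          # email
printf '%s\n' "$with_cut"        # name,email
```

<p>This simple pattern assumes that no field contains a quoted comma.</p>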
]]></content:encoded>
			<wfw:commentRss>http://promberger.info/linux/2010/08/02/use-sed-to-remove-the-first-column-in-a-comma-separated-file-csv/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Test your regular expressions online</title>
		<link>http://promberger.info/linux/2009/06/17/test-your-regular-expressions-online/</link>
		<comments>http://promberger.info/linux/2009/06/17/test-your-regular-expressions-online/#comments</comments>
		<pubDate>Wed, 17 Jun 2009 19:32:52 +0000</pubDate>
		<dc:creator>Marianne</dc:creator>
				<category><![CDATA[Regular expressions]]></category>

		<guid isPermaLink="false">http://promberger.info/linux/?p=220</guid>
		<description><![CDATA[Speaking of regular expressions, I found out you can test them online at regexpal.com. Neat.]]></description>
			<content:encoded><![CDATA[<p>Speaking of regular expressions, I found out you can test them online at <a href="http://www.regexpal.com/">regexpal.com</a>. Neat.</p>
]]></content:encoded>
			<wfw:commentRss>http://promberger.info/linux/2009/06/17/test-your-regular-expressions-online/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Emacs regular expressions: match at least n occurrences of character class</title>
		<link>http://promberger.info/linux/2009/06/16/emacs-regular-expressions-match-exactly-n-occurrences-of-character-class/</link>
		<comments>http://promberger.info/linux/2009/06/16/emacs-regular-expressions-match-exactly-n-occurrences-of-character-class/#comments</comments>
		<pubDate>Tue, 16 Jun 2009 09:40:49 +0000</pubDate>
		<dc:creator>Marianne</dc:creator>
				<category><![CDATA[Emacs]]></category>
		<category><![CDATA[mutt]]></category>
		<category><![CDATA[Regular expressions]]></category>

		<guid isPermaLink="false">http://promberger.info/linux/?p=213</guid>
		<description><![CDATA[Aside: I just discovered the very useful Emacs regex builder tool. Type M-x regexp-builder. I wanted a regular expression to match the pattern of mutt mail edit buffers, to apply mail-mode, but I did not want to match the muttrc and mutt.hooks files I have. Mail edit buffers get a pattern that starts with &#8220;mutt&#8221;, [...]]]></description>
			<content:encoded><![CDATA[<p>Aside: I just discovered the very useful Emacs regex builder tool. Type <code>M-x regexp-builder</code>.</p>
<p>I wanted a regular expression to match the pattern of mutt mail edit buffers, to apply mail-mode, but I did not want to match the <code>muttrc</code> and <code>mutt.hooks</code> files I have.</p>
<p>Mail edit buffers get a pattern that starts with &#8220;mutt&#8221;, followed by a combination of dashes, letters and numbers. Examples:</p>
<pre>mutt-lauren-ad34AD-
muttadR12
muttadrsd</pre>
<p>The pattern <code>mutt[-0-9a-zA-Z]+$</code> matches these just fine, but it would also match <code>muttrc</code>. So I want a regex that requires at least three characters from the character class described in the brackets. Generally, this is done using <code>{3,}</code>: the <code>{m,n}</code> pattern matches at least <i>m</i> and at most <i>n</i> occurrences, and leaving out <i>n</i> removes the upper limit. (You can match exactly <i>n</i> occurrences by using <code>{n}</code>, for example <code>{3}</code>.)</p>
<p>In Emacs, this didn&#8217;t work, and it turns out I had to escape the curly brackets <strong>twice</strong>: <code>mutt[-0-9a-zA-Z]\\{3,\\}$</code>. Emacs regexps write the interval braces as <code>\{</code> and <code>\}</code>, and inside a Lisp string each backslash has to be doubled.</p>
<p>Here&#8217;s the full section in my <code>.emacs</code> file:</p>
<pre>(defun mutt-edit-hook ()
  (setq fill-column 70)
  (setq make-backup-files nil)
  )

(add-to-list 'auto-mode-alist '("mutt[-0-9a-zA-Z]\\{3,\\}$" . mail-mode))
(add-hook 'mail-mode-hook 'turn-on-auto-fill)
(add-hook 'mail-mode-hook 'mutt-edit-hook)
</pre>
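<p>The effect of the <code>{3,}</code> interval can also be checked outside Emacs. This is only a sketch using <code>grep -E</code>, where the braces need no backslashes, on the example names from above plus the two config file names:</p>

```shell
# Buffer names from the examples above, plus the two files to exclude.
names='mutt-lauren-ad34AD-
muttadR12
muttadrsd
muttrc
mutt.hooks'

# With "+", muttrc is matched as well (4 matching lines in total).
plus=$(printf '%s\n' "$names" | grep -c -E 'mutt[-0-9a-zA-Z]+$')

# With "{3,}", at least three class characters must follow "mutt",
# so muttrc (only two: "rc") drops out.
three=$(printf '%s\n' "$names" | grep -E 'mutt[-0-9a-zA-Z]{3,}$')

printf '%s\n' "$plus"
printf '%s\n' "$three"
```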
]]></content:encoded>
			<wfw:commentRss>http://promberger.info/linux/2009/06/16/emacs-regular-expressions-match-exactly-n-occurrences-of-character-class/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Filename pattern matching via bash extended globbing and find regular expressions in shell scripts</title>
		<link>http://promberger.info/linux/2009/01/20/filename-pattern-matching-via-bash-extended-globbing-and-find-regular-expressions-in-shell-scripts/</link>
		<comments>http://promberger.info/linux/2009/01/20/filename-pattern-matching-via-bash-extended-globbing-and-find-regular-expressions-in-shell-scripts/#comments</comments>
		<pubDate>Tue, 20 Jan 2009 12:59:21 +0000</pubDate>
		<dc:creator>Marianne</dc:creator>
				<category><![CDATA[Bash shell]]></category>
		<category><![CDATA[Regular expressions]]></category>

		<guid isPermaLink="false">http://promberger.info/linux/2009/01/20/filename-pattern-matching-via-bash-extended-globbing-and-find-regular-expressions-in-shell-scripts/</guid>
		<description><![CDATA[I was confused that bash would expand some patterns nicely to match filenames when I was on the commandline, but the same pattern would not work from inside a shell script. For example, the following pattern would nicely match all files in the current directory not having an &#8220;Rnw&#8221; or &#8220;tex&#8221; extension on the commandline: [...]]]></description>
			<content:encoded><![CDATA[<p>I was confused that bash would expand some patterns nicely to match filenames when I was on the commandline, but the same pattern would not work from inside a shell script. For example, the following pattern would nicely match all files in the current directory not having an &#8220;Rnw&#8221; or &#8220;tex&#8221; extension on the commandline:</p>
<pre>ls *.!(Rnw|tex)</pre>
<p>Run from a shell script, however, this would give the error:</p>
<pre>line 5: syntax error near unexpected token `('</pre>
<p>It turns out that on Ubuntu, the shell option <code>extglob</code> is set for an interactive shell, but not for the non-interactive shell that is used when you run a script. You can see whether &#8220;extglob&#8221; is on like this, either from a commandline or by putting the line into your script:</p>
<pre>shopt extglob</pre>
<p>If it&#8217;s off and you want it on, put this into your script:</p>
<pre>shopt -s extglob</pre>
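<p>A small sketch of the fix, assuming <code>bash</code> and <code>mktemp</code> are available. The file names and the throwaway script are made up for illustration. The script is written outside the test directory so that it does not show up in the listing:</p>

```shell
# Throwaway directory with made-up file names, and a throwaway script.
dir=$(mktemp -d)
script=$(mktemp)
touch "$dir/a.Rnw" "$dir/b.tex" "$dir/c.txt" "$dir/d.pdf"

# The script enables extglob on its own line BEFORE the pattern is used;
# bash must have the option set by the time it parses the ls line.
printf '%s\n' '#!/bin/bash' 'shopt -s extglob' 'ls *.!(Rnw|tex)' > "$script"

# Run the script non-interactively from inside the test directory.
listing=$(cd "$dir" || exit 1; bash "$script")
printf '%s\n' "$listing"

rm -r "$dir" "$script"
```

<p>Without the <code>shopt -s extglob</code> line, the same script should fail with the syntax error shown above.</p>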
<p>An alternative (that you might not want, I&#8217;m not sure what else it might affect) is to change the opening line of your script to read</p>
<pre>#!/bin/bash -i</pre>
<p>This runs your script in an interactive shell and should thus have the same behavior as the commandline. Generally, a neat trick is to use the <code>-x</code> option to debug shell scripts:</p>
<pre>#!/bin/bash -x</pre>
<p>An alternative way to get files whose names match a certain pattern is of course to use <code>find</code>. Here, you can use limited globbing patterns with the <code>-name</code> option, and combine several patterns using <code>-o</code>, like this:</p>
<pre>find . -name "*.tex" -o -name "*.txt"</pre>
<p>Find also has a <code>-regex</code> option. If you use that, you probably want to add <code>-regextype posix-extended</code> before it, to get behavior closer to the extended regular expressions that you might know, for example from sed. Additionally, you have to watch out that <code>find</code> needs the regular expression to match the entire path as find prints it, including the leading directory, not just the filename. Here is an example of this:</p>
<pre>audrey:~/tmp/testdir$ ls
blah.tex  blah.txt
audrey:~/tmp/testdir$ find . -regex "blah&#92;.tex"
audrey:~/tmp/testdir$ find . -regex "&#92;./blah&#92;.tex"
./blah.tex</pre>
<p>Here is an example for the difference between the default and the extended regex pattern behavior:</p>
<pre>audrey:~/tmp/testdir$ find . -regex "&#92;./blah&#92;.(tex|txt)"</pre>
<p>finds nothing, but:</p>
<pre>audrey:~/tmp/testdir$ find . -regextype posix-extended -regex "&#92;./blah&#92;.(tex|txt)"
./blah.tex
./blah.txt</pre>
<p>You could alternatively get the desired behavior with the default regex by escaping the grouping parentheses and the &#8220;|&#8221; operator:</p>
<pre>audrey:~/tmp/testdir$ find . -regex "&#92;./blah&#92;.&#92;(tex&#92;|txt&#92;)"
./blah.tex
./blah.txt</pre>
<p>But that starts getting really hard to read.</p>
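<p>The comparison can be reproduced in a scratch directory. This sketch assumes GNU <code>find</code> (for <code>-regextype</code>) and uses made-up file names, with an extra <code>blah.pdf</code> that should never match:</p>

```shell
# Scratch directory with made-up files.
dir=$(mktemp -d)
touch "$dir/blah.tex" "$dir/blah.txt" "$dir/blah.pdf"

# Default (Emacs-style) syntax: unescaped ( | ) are literal characters,
# so nothing matches.
default=$(cd "$dir" || exit 1; find . -regex '\./blah\.(tex|txt)' | sort)

# posix-extended syntax: ( | ) group and alternate as in "sed -E".
# Note that -regextype has to come before -regex.
extended=$(cd "$dir" || exit 1; find . -regextype posix-extended -regex '\./blah\.(tex|txt)' | sort)

printf 'default: [%s]\n' "$default"
printf '%s\n' "$extended"

rm -r "$dir"
```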
]]></content:encoded>
			<wfw:commentRss>http://promberger.info/linux/2009/01/20/filename-pattern-matching-via-bash-extended-globbing-and-find-regular-expressions-in-shell-scripts/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Sed example 2</title>
		<link>http://promberger.info/linux/2007/06/25/sed-example-2/</link>
		<comments>http://promberger.info/linux/2007/06/25/sed-example-2/#comments</comments>
		<pubDate>Mon, 25 Jun 2007 15:32:45 +0000</pubDate>
		<dc:creator>Marianne</dc:creator>
				<category><![CDATA[Linux]]></category>
		<category><![CDATA[Regular expressions]]></category>
		<category><![CDATA[Sed]]></category>

		<guid isPermaLink="false">http://www.promberger.de/linux/index.php/2007/06/25/sed-example-2/</guid>
		<description><![CDATA[Replacing whitespace, lifted directly from the handy collection of sed one-liners at Sourceforge: Delete all leading whitespace from line (tabs and spaces): sed 's/^[ &#92;t]*//' Delete whitespace at end of line: sed 's/[ &#92;t]*$//' Delete leading and trailing whitespace: sed 's/^[ &#92;t]*//;s/[ &#92;t]*$//']]></description>
			<content:encoded><![CDATA[<p>Replacing whitespace, lifted directly from the handy <a href="http://sed.sourceforge.net/sed1line.txt">collection of sed one-liners at Sourceforge</a>:</p>
<p>Delete all leading whitespace from line (tabs and spaces):</p>
<pre>sed 's/^[ &#92;t]*//'</pre>
<p>Delete whitespace at end of line:</p>
<pre>sed 's/[ &#92;t]*$//'</pre>
<p>Delete leading and trailing whitespace:</p>
<pre>sed 's/^[ &#92;t]*//;s/[ &#92;t]*$//'</pre>
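<p>A quick check of the combined one-liner on a made-up string. This is a sketch that assumes GNU sed, which understands <code>\t</code> as a tab. The second variant uses the POSIX class <code>[[:blank:]]</code> (space or tab) and is my own substitution, not part of the one-liner collection:</p>

```shell
# Made-up input with spaces and tabs on both sides.
raw=$(printf ' \t hello world \t ')

# The one-liner from above (relies on GNU sed reading \t as a tab).
trimmed=$(printf '%s\n' "$raw" | sed 's/^[ \t]*//;s/[ \t]*$//')

# Variant with a POSIX character class instead of \t.
trimmed_posix=$(printf '%s\n' "$raw" | sed 's/^[[:blank:]]*//;s/[[:blank:]]*$//')

printf '[%s]\n' "$trimmed"
printf '[%s]\n' "$trimmed_posix"
```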
]]></content:encoded>
			<wfw:commentRss>http://promberger.info/linux/2007/06/25/sed-example-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Sed example 1</title>
		<link>http://promberger.info/linux/2007/05/01/sed-example-1/</link>
		<comments>http://promberger.info/linux/2007/05/01/sed-example-1/#comments</comments>
		<pubDate>Tue, 01 May 2007 12:51:09 +0000</pubDate>
		<dc:creator>Marianne</dc:creator>
				<category><![CDATA[Linux]]></category>
		<category><![CDATA[Regular expressions]]></category>
		<category><![CDATA[Sed]]></category>

		<guid isPermaLink="false">http://www.promberger.de/blog/index.php/2007/05/01/sed-example-1/</guid>
		<description><![CDATA[Many things are easiest to learn by looking at examples. I find this to be especially true for sed. Sed stands for &#8220;stream editor&#8221;, and it is very handy to perform bulk editing of text files. To get started, I refer you to this very nice tutorial. I&#8217;m planning to archive some of the small [...]]]></description>
			<content:encoded><![CDATA[<p>Many things are easiest to learn by looking at examples. I find this to be especially true for sed. Sed stands for &#8220;stream editor&#8221;, and it is very handy for bulk editing of text files. To get started, I refer you to this <a href="http://www.grymoire.com/Unix/Sed.html" title="grymoire.com ">very nice tutorial</a>. I&#8217;m planning to archive some of the small editing tasks I did with sed on this blog, both for my own reference and in case anyone wants to look at sed examples. These examples worked when I used them to edit the files I wanted to edit, but of course there is no guarantee that there isn&#8217;t a glitch. Always make a backup.</p>
<p>So here&#8217;s the first example. I have a file containing assignments done by students, 100 of them, all one after another in a single text file. I&#8217;ve graded them and want to e-mail them back. I&#8217;ll use sed and mutt to do this conveniently from the commandline.</p>
<p><span id="more-20"></span><br />
The text file looks like this:</p>
<pre class="blackonwhite">studentname@mail.server.com,Firstname Lastname,
The above line is followed by the student's answer to the assignment,
bla bla bla,
several lines of text. At the end, the grade is out of five points:
&gt; 2/5.
secondstudentname@some.mailserver.com,Firstname2 Lastname2,
The first assignment is followed by the next assignment, which again ends in
a grade, and so on for 100 students.
&gt; 4/5.
</pre>
<p>First, I want to get rid of the student name and the two commas on the line with the e-mail address, so that the e-mail address is on a line by itself. The sed command:</p>
<pre>s|&#92;(@.*&#92;),.*,|&#92;1|</pre>
<p>More specifically, this looks for an at sign (@) followed by any sort of stuff ( &#8220;.*&#8221; ), a comma, again anything, and another comma, and replaces the whole match with the first group, that is, the part enclosed in the two escaped parentheses &#8220;\(&#8221; and &#8220;\)&#8221;. The text before the at sign is not part of the match, so it stays where it is.<br />
Next, I turn each line with an e-mail address into a mutt command to send anything up to the next &#8220;EOF&#8221; to that e-mail address, with a sensible subject line. The sed command:</p>
<pre>s|&#92;(.*@.*&#92;)|mutt &#92;1 -s "psych 153 a3" &lt;&lt; EOF|</pre>
<p>We insert two newlines and &#8220;EOF&#8221; after each grade. The grade is identified as a pattern of &#8220;/5.&#8221;, and the period needs to be escaped (otherwise, it would be a regexp match for any character):</p>
<pre>s|/5&#92;.|/5.&#92;n&#92;nEOF|</pre>
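<p>Before overwriting anything, the three substitutions can be tried on a small sample. This is a sketch that assumes GNU sed (for <code>\n</code> in the replacement). The student address and text are made up, and the commands are passed with <code>-e</code> instead of being read from a file:</p>

```shell
# Made-up one-student sample in the format described above.
sample=$(printf '%s\n' \
  'a@example.com,Ann Example,' \
  'Answer text.' \
  '> 2/5.')

# Apply the three substitutions in order; nothing is edited in place.
result=$(printf '%s\n' "$sample" | sed \
  -e 's|\(@.*\),.*,|\1|' \
  -e 's|\(.*@.*\)|mutt \1 -s "psych 153 a3" << EOF|' \
  -e 's|/5\.|/5.\n\nEOF|')

printf '%s\n' "$result"
```

<p>The output should be a mutt command line, the answer, the grade, an empty line and a closing EOF.</p>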
<p>You can put all the above statements into a file, maybe call it &#8220;sedfile&#8221;, back up your work and then run it with &#8220;in place&#8221; replacement, i.e., the original file gets overwritten:</p>
<pre>cp ass3 ass3.bkp
sed -i -f sedfile ass3</pre>
<p>Finally, make the file executable and run it:</p>
<pre>chmod 700 ass3
./ass3</pre>
]]></content:encoded>
			<wfw:commentRss>http://promberger.info/linux/2007/05/01/sed-example-1/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

