learnbyexample

Festive offers for books on Python, Linux, Regular Expressions, Vim and more!

2025-11-21T00:00:00+00:00

Hello!

Here are some awesome deals for programming books and courses during the 2025 festive season.

My ebooks

Festive offers for my ebooks till 30-November-2025:

All 13 Books Bundle — $16 (normal price $36), learn Regular Expressions, Linux CLI tools, Python, Vim and more!
Understanding Python re(gex)? — FREE (normal price $10)

Other deals

Python related deals
The Pragmatic Bookshelf — 50% off
Deals on Django and Git books/software
Huge list of awesome deals — tools, productivity, books, courses, etc
blackfridaydeals.dev — Hottest Black Friday Deals for Developers

Happy learning :)

Connect Four game with a twist

2025-08-20T00:00:00+00:00

From wikipedia: Connect Four:

Connect Four is a game in which the players choose a color and then take turns dropping colored tokens into a six-row, seven-column vertically suspended grid. The pieces fall straight down, occupying the lowest available space within the column. The objective of the game is to be the first to form a horizontal, vertical, or diagonal line of four of one's own tokens.

As a twist, this TUI implementation also offers two more variations of the game:

form a square, i.e. four cells forming 90 degree angles and equidistant from each other
form a line or square

Installation

This app is available on PyPI as connectsquare. Example installation instructions are shown below, adjust them based on your preferences and OS.

# virtual environment
$ python3 -m venv textual_apps
$ cd textual_apps
$ source bin/activate
$ pip install connectsquare

# launch the app
$ connectsquare

To run the app without having to enter the virtual environment again, add this alias to .bashrc (or equivalent):

# you'll have to change the path
alias connectsquare='/path/to/textual_apps/bin/connectsquare'

As an alternative to manually managing such virtual environments, you can use uv or pipx instead.

As yet another alternative, you can install textual==0.85.2 (see Textual documentation for more details), clone this repository and run the connect_square.py file.

Screenshots

Adjust your terminal's dimensions so that the game widgets appear properly, for example 80x30 (characters x lines). Sample screenshots are shown below:

Guide

Press the n key to start a new game. The existing game, if any, will be abandoned
You can choose between Connect Four, Connect Square (default) and Both types of game
You can choose between Easy (default), Medium and Hard difficulty modes:
- In the Easy mode, the AI will make a random move
- In the Medium mode, the AI will make a random move based on certain weight calculations
- In the Hard mode, the AI will make the best move based on the weight calculations (the algorithm is based only on the current board state and thus it is not impossible for the user to win)
The first move is based on the User first (default) and AI first choices
Only the bottommost empty cell of each column will be considered as a valid move
Press the t key to toggle between light and dark themes
Press the q key to quit the app
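The valid-move rule and the three difficulty modes can be sketched roughly as follows. This is only an illustration of the idea. The `valid_moves` and `pick_column` names, the board layout (row 0 at the top, `None` for an empty cell) and the weights are all made up for this example and are not taken from the app's source.

```python
import random

ROWS, COLS = 6, 7

def valid_moves(board):
    """Map each non-full column to its bottommost empty row."""
    moves = {}
    for col in range(COLS):
        for row in range(ROWS - 1, -1, -1):   # scan from the bottom row up
            if board[row][col] is None:
                moves[col] = row
                break
    return moves

def pick_column(weights, mode, rng=random):
    """weights: dict of valid column -> positive weight (hypothetical values)."""
    cols = list(weights)
    if mode == 'easy':
        return rng.choice(cols)                # uniformly random move
    if mode == 'medium':
        # random move, biased towards columns with higher weights
        return rng.choices(cols, weights=[weights[c] for c in cols], k=1)[0]
    return max(cols, key=weights.get)          # hard: highest weight wins

board = [[None] * COLS for _ in range(ROWS)]
board[5][3] = 'user'                 # one token already in column 3
print(valid_moves(board)[3])         # 4
weights = {0: 1, 3: 5, 6: 2}         # made-up weights for three columns
print(pick_column(weights, 'hard'))  # 3
```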

User moves are denoted by the ⭕️ character and AI moves are denoted by the ✖️ character.

The text panel under the game board displays the current status of the game. If the game ends with one of the players forming a valid line or square, the cells forming the winning move will be highlighted.

Square tic tac toe

If you liked this game, you might also enjoy Square tic tac toe.

If you are interested in learning more about the AI algorithm for the Connect Square game, check out my explanation here for Square tic tac toe — while there are a few differences between the two, the foundation is the same.

Python regular expression cheatsheet and examples

2025-06-09T00:00:00+00:00

This blog post gives an overview and examples of regular expression syntax as implemented by the re built-in module (Python 3.13+). Assume ASCII character set unless otherwise specified. This post is an excerpt from my Understanding Python re(gex)? book.

Visualization created using debuggex for the pattern r'\bpar(en|ro)?t\b'

From docs.python: re:

A regular expression (or RE) specifies a set of strings that matches it; the functions in this module let you check if a particular string matches a given regular expression

Elements that define a regular expression

Anchors	Description
`\A`	restricts the match to the start of string
`\Z`	restricts the match to the end of string
`^`	restricts the match to the start of line
`$`	restricts the match to the end of line
`\n`	newline character is used as the line separator
`re.MULTILINE` or `re.M`	flag to treat input as multiline string
`\b`	restricts the match to the start/end of words
	word characters: letters, digits, underscore
`\B`	matches wherever `\b` doesn't match

^, $ and \ are metacharacters in the above table, as these characters have special meaning. Prefix a \ character to remove the special meaning and match such characters literally. For example, \^ will match a ^ character instead of acting as an anchor.
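Here's a small sketch of the difference between an anchor and its escaped form (the sample strings are made up for illustration):

```python
import re

# unescaped ^ is an anchor, so it matches the empty string at the start
print(re.sub(r'^', 'X', 'a^b'))     # Xa^b
# escaped \^ matches a literal ^ character
print(re.sub(r'\^', 'X', 'a^b'))    # aXb
# an escaped backslash matches a literal backslash
print(re.sub(r'\\', '/', r'a\b'))   # a/b
```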

Feature	Description
`|`	multiple RE combined as conditional OR
	each alternative can have independent anchors
`(pat)`	group patterns, also a capturing group
	`a(b|c)d` is same as `abd|acd`
`(?:pat)`	non-capturing group
`(?P<name>pat)`	named capture group
`.`	Match any character except the newline character `\n`
`[]`	Character class, matches one character among many

Greedy Quantifiers	Description
`*`	Match zero or more times
`+`	Match one or more times
`?`	Match zero or one times
`{m,n}`	Match `m` to `n` times (inclusive)
`{m,}`	Match at least `m` times
`{,n}`	Match up to `n` times (including `0` times)
`{n}`	Match exactly `n` times
`pat1.*pat2`	any number of characters between `pat1` and `pat2`
`pat1.*pat2|pat2.*pat1`	match both `pat1` and `pat2` in any order

Greedy here means that the above quantifiers will match as much as possible that'll also honor the overall RE. Appending a ? to greedy quantifiers makes them non-greedy, i.e. match as minimally as possible. Appending a + to greedy quantifiers makes them possessive, which prevents backtracking. You can also use (?>pat) atomic grouping to safeguard from backtracking. Quantifiers can be applied to literal characters, groups, backreferences and character classes.
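A quick sketch comparing the three kinds of quantifiers on a made-up string. Possessive quantifiers and atomic grouping need Python 3.11 or later, so that part is guarded with a version check.

```python
import re
import sys

s = 'foo123312baz'

# greedy: .* grabs as much as possible, then backtracks to the last '2'
print(re.sub(r'o.*2', 'X', s))     # fXbaz
# non-greedy: .*? stops at the first '2' that lets the RE succeed
print(re.sub(r'o.*?2', 'X', s))    # fX3312baz

if sys.version_info >= (3, 11):
    # possessive: .*+ consumes the rest of the string and never gives
    # anything back, so nothing is left for the final '2' to match
    print(re.search(r'o.*+2', s))      # None
    # atomic grouping has the same effect here
    print(re.search(r'o(?>.*)2', s))   # None
```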

Character class	Description
`[aeiou]`	Match any vowel
`[^aeiou]`	`^` inverts selection, so this matches any character other than these vowels
`[a-f]`	`-` defines a range, so this matches any of abcdef characters
`\d`	Match a digit, same as `[0-9]`
`\D`	Match non-digits, same as `[^0-9]` or `[^\d]`
`\w`	Match word characters, same as `[a-zA-Z0-9_]`
`\W`	Match non-word characters, same as `[^a-zA-Z0-9_]` or `[^\w]`
`\s`	Match whitespace characters, same as `[\ \t\n\r\f\v]`
`\S`	Match non-whitespace characters, same as `[^\ \t\n\r\f\v]` or `[^\s]`

Lookarounds	Description
lookarounds	custom assertions, zero-width like anchors
`(?!pat)`	negative lookahead assertion
`(?<!pat)`	negative lookbehind assertion
`(?=pat)`	positive lookahead assertion
`(?<=pat)`	positive lookbehind assertion
`(?!pat1)(?=pat2)`	multiple assertions can be specified in any order
	as they mark a matching location without consuming characters
`((?!pat).)*`	Negate a grouping, similar to negated character class

Flags	Description
`re.IGNORECASE` or `re.I`	flag to ignore case
`re.DOTALL` or `re.S`	allow `.` metacharacter to match newline characters
`flags=re.S|re.I`	multiple flags can be combined using `|` operator
`re.MULTILINE` or `re.M`	allow `^` and `$` anchors to match line wise
`re.VERBOSE` or `re.X`	allows the use of literal whitespace for alignment purposes
	and to add comments after the `#` character
	escape spaces and `#` if needed as part of actual RE
`re.ASCII` or `re.A`	match only ASCII characters for `\b`, `\w`, `\d`, `\s`
	and their opposites, applicable only for Unicode patterns
`re.LOCALE` or `re.L`	use locale settings for byte patterns and 8-bit locales
`(?#comment)`	another way to add comments (not a flag)
`(?flags:pat)`	inline flags only for this `pat`, overrides `flags` argument
	flags is `i` for `re.I`, `s` for `re.S`, etc, except `L` for `re.L`
`(?-flags:pat)`	negate flags only for this `pat`
`(?flags-flags:pat)`	apply and negate particular flags only for this `pat`
`(?flags)`	apply flags for whole RE, can be used only at start of RE
	anchors if any, should be specified after `(?flags)`
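A few of the flags from the table above, sketched with made-up input strings:

```python
import re

# re.X: whitespace and comments inside the pattern are ignored
date = re.compile(r'''
    (\d{4})   # year
    -
    (\d{2})   # month
    ''', flags=re.X)
print(date.search('released on 2025-06 or so').groups())   # ('2025', '06')

# an inline flag applies only to the group it wraps,
# so the trailing 's' is still matched case sensitively
print(re.findall(r'(?i:cat)s', 'Cats CATS cats'))   # ['Cats', 'cats']

# flags combined with |, here . also matches the newline character
# re.M isn't used, so ^ matches only at the start of the string
print(re.findall(r'^hi.\w+', 'Hi\nTHERE\nhi\nyou', flags=re.I | re.S))
# ['Hi\nTHERE']
```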

Matched portion	Description
`re.Match` object	details like matched portions, location, etc
`m[0]` or `m.group(0)`	entire matched portion of `re.Match` object `m`
`m[n]` or `m.group(n)`	matched portion of the nth capture group
`m.groups()`	tuple of all the capture groups' matched portions
`m.span()`	start and end+1 index of the entire matched portion
	pass a number to get span of that particular capture group
	can also use `m.start()` and `m.end()`
`\N`	backreference, gives matched portion of the Nth capture group
	applies to both search and replacement sections
	possible values: `\1`, `\2` up to `\99` provided no more digits
`\g<N>`	backreference, gives matched portion of the Nth capture group
	possible values: `\g<0>`, `\g<1>`, etc (not limited to 99)
	`\g<0>` refers to the entire matched portion
`(?P<name>pat)`	named capture group
	refer as `'name'` in `re.Match` object
	refer as `(?P=name)` in search section
	refer as `\g<name>` in replacement section
`groupdict`	method applied on a `re.Match` object
	gives named capture group portions as a `dict`

\0 and \100 onwards are considered as octal values, hence cannot be used as backreferences.
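Named capture groups aren't used in the examples later in this post, so here's a short sketch with made-up input and group names:

```python
import re

m = re.search(r'(?P<fruit>\w+):(?P<qty>\d+)', 'apple:42 fig:7')
print(m['fruit'], m['qty'])   # apple 42
print(m.groupdict())          # {'fruit': 'apple', 'qty': '42'}
print(m.span('qty'))          # (6, 8)

# (?P=name) backreference in the search section
print(bool(re.search(r'(?P<ch>\w)(?P=ch)', 'tool')))   # True

# \g<name> backreference in the replacement section
print(re.sub(r'(?P<a>\w+),(?P<b>\w+)', r'\g<b>,\g<a>', 'good,bad'))   # bad,good
```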

re module functions

Function	Description
`re.search`	Check if given pattern is present anywhere in input string
	Output is a `re.Match` object, usable in conditional expressions
	r-strings preferred to define RE
	Use byte pattern for byte input
	Python also maintains a small cache of recent RE
`re.fullmatch`	ensures pattern matches the entire input string
`re.compile`	Compile a pattern for reuse, outputs `re.Pattern` object
`re.sub`	search and replace
`re.sub(r'pat', f, s)`	function `f` with `re.Match` object as the argument
`re.escape`	automatically escape all metacharacters
`re.split`	split a string based on RE
	text matched by the groups will be part of the output
	portion matched by pattern outside group won't be in output
`re.findall`	returns all the matches as a list
	if 1 capture group is used, only its matches are returned
	if more than 1 capture group is used, each element will be a tuple of capture groups
	portion matched by pattern outside group won't be in output
`re.finditer`	iterator with `re.Match` object for each match
`re.subn`	gives tuple of modified string and number of substitutions

The function definitions are given below:

re.search(pattern, string, flags=0)
re.fullmatch(pattern, string, flags=0)
re.compile(pattern, flags=0)
re.sub(pattern, repl, string, count=0, flags=0)
re.escape(pattern)
re.split(pattern, string, maxsplit=0, flags=0)
re.findall(pattern, string, flags=0)
re.finditer(pattern, string, flags=0)
re.subn(pattern, repl, string, count=0, flags=0)
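Some of these functions and arguments don't show up in the examples later in this post, so here's a brief sketch with made-up inputs:

```python
import re

# re.fullmatch succeeds only if the whole string matches
print(bool(re.fullmatch(r'\d+', '2025')))      # True
print(bool(re.fullmatch(r'\d+', '2025-06')))   # False

# re.escape helps to match text containing metacharacters literally
term = 'a+b(c)'
print(re.sub(re.escape(term), 'X', '1 a+b(c) 2'))   # 1 X 2

# count limits the number of replacements, maxsplit the number of splits
print(re.sub(r'\d+', 'N', '1 22 333', count=2))   # N N 333
print(re.split(r',', 'a,b,c,d', maxsplit=1))      # ['a', 'b,c,d']

# re.subn also reports how many substitutions were made
print(re.subn(r'\d+', 'N', '1 22 333'))           # ('N N N', 3)
```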

Regular expression examples

As a good practice, always use raw strings to construct RE, unless other formats are required. This will avoid conflict between special meaning of the backslash character in RE and string literals.

I wrote an interactive TUI app to help you experiment with the examples presented below. See PyRegexPlayground repo for installation instructions and usage guide. See PyRegexExercises repo for a TUI app with 100+ Python regex exercises.

examples for re.search()

>>> sentence = 'This is a sample string'

# need to load the re module before use
>>> import re
# check if 'sentence' contains the pattern described by RE argument
>>> bool(re.search(r'is', sentence))
True

# ignore case while searching for a match
>>> bool(re.search(r'this', sentence, flags=re.I))
True

# example for a pattern not found in the input string
>>> bool(re.search(r'xyz', sentence))
False

# re.search output can be directly used in conditional expressions
>>> if re.search(r'ring', sentence):
...     print('mission success')
... 
mission success

# use raw byte strings for patterns if input is of byte data type
>>> bool(re.search(rb'is', b'This is a sample string'))
True

string and line anchors

# match the start of the input string
>>> bool(re.search(r'\Ahi', 'hi hello\ntop spot'))
True

# match the start of a line
>>> bool(re.search(r'^top', 'hi hello\ntop spot', flags=re.M))
True

# match the end of strings
>>> words = ['surrender', 'up', 'newer', 'do', 'era', 'eel', 'pest']
>>> [w for w in words if re.search(r'er\Z', w)]
['surrender', 'newer']

# check if there's a whole line 'par'
>>> bool(re.search(r'^par$', 'spare\npar\ndare', flags=re.M))
True

examples for re.findall()

# match 'par' with optional 's' at start and optional 'e' at end
>>> re.findall(r'\bs?pare?\b', 'par spar apparent spare part pare')
['par', 'spar', 'spare', 'pare']

# numbers >= 100 with optional leading zeros
# you'd need r'\b0*[1-9]\d{2,}\b' if possessive quantifiers aren't used
>>> re.findall(r'\b0*+\d{3,}\b', '0501 035 154 12 26 98234')
['0501', '154', '98234']

# if multiple capturing groups are used, each element of output
# will be a tuple of strings of all the capture groups
>>> re.findall(r'([^/]+)/([^/,]+),?', '2020/04,1986/Mar')
[('2020', '04'), ('1986', 'Mar')]

# normal capture group will hinder ability to get the whole match
# non-capturing group to the rescue
>>> re.findall(r'\b\w*(?:st|in)\b', 'cost akin more east run')
['cost', 'akin', 'east']

# re.findall is useful for debugging purposes as well
>>> re.findall(r':.*?:', 'green:3.14:teal::brown:oh!:blue')
[':3.14:', '::', ':oh!:']

examples for re.split()

# split based on one or more digit characters
>>> re.split(r'\d+', 'Sample123string42with777numbers')
['Sample', 'string', 'with', 'numbers']

# split based on digit or whitespace characters
>>> re.split(r'[\d\s]+', '**1\f2\n3star\t7 77\r**')
['**', 'star', '**']

# to include the matching delimiter strings as well in the output
>>> re.split(r'(\d+)', 'Sample123string42with777numbers')
['Sample', '123', 'string', '42', 'with', '777', 'numbers']

# multiple capture groups example
# note that the portion matched by b+ isn't present in the output
>>> re.split(r'(a+)b+(c+)', '3.14aabccc42')
['3.14', 'aa', 'ccc', '42']

# use non-capturing group if capturing is not needed
>>> re.split(r'hand(?:y|ful)', '123handed42handy777handful500')
['123handed42', '777', '500']

backreferencing within the search pattern

>>> words = ['effort', 'flee', 'facade', 'oddball', 'rat', 'tool']

# whole words that have at least one consecutive repeated character
>>> [w for w in words if re.search(r'\b\w*(\w)\1\w*\b', w)]
['effort', 'flee', 'oddball', 'tool']

working with matched portions

# re.Match object
>>> re.search(r'so+n', 'too soon a song snatch')
<re.Match object; span=(4, 8), match='soon'>

# retrieving the entire matched portion, note the use of [0]
>>> motivation = 'Doing is often better than thinking of doing.'
>>> re.search(r'of.*ink', motivation)[0]
'often better than think'

# capture group example
>>> purchase = 'coffee:100g tea:250g sugar:75g chocolate:50g'
>>> m = re.search(r':(.*?)g.*?:(.*?)g.*?chocolate:(.*?)g', purchase)
# to get the matched portion of the second capture group
>>> m[2]
'250'

# to get a tuple of all the capture groups
>>> m.groups()
('100', '250', '50')

examples for re.finditer()

# numbers < 350
>>> m_iter = re.finditer(r'\d+', '45 349 651 593 4 204 350')
>>> [m[0] for m in m_iter if int(m[0]) < 350]
['45', '349', '4', '204']

# start and end+1 index of each matching portion
>>> m_iter = re.finditer(r'so+n', 'song too soon snatch')
>>> for m in m_iter:
...     print(m.span())
... 
(0, 3)
(9, 13)

examples for re.sub()

# add something to the start of every line
>>> ip_lines = "catapults\nconcatenate\ncat"
>>> print(re.sub(r'^', '* ', ip_lines, flags=re.M))
* catapults
* concatenate
* cat

# replace 'par' only at the start of a word
>>> re.sub(r'\bpar', 'X', 'par spar apparent spare part')
'X spar apparent spare Xt'

# same as: r'part|parrot|parent'
>>> re.sub(r'par(en|ro)?t', 'X', 'par part parrot parent')
'par X X X'

# remove first two columns where : is delimiter
>>> re.sub(r'\A([^:]+:){2}', '', 'apple:123:banana:cherry')
'banana:cherry'

backreferencing in the replacement section

# remove any number of consecutive duplicate words separated by space
# use \W+ instead of space to cover cases like 'a;a<-;a'
>>> re.sub(r'\b(\w+)( \1)+\b', r'\1', 'aa a a a 42 f_1 f_1 f_13.14')
'aa a 42 f_1 f_13.14'

# add something around the matched strings
>>> re.sub(r'\d+', r'(\g<0>0)', '52 apples and 31 mangoes')
'(520) apples and (310) mangoes'

# swap words that are separated by a comma
>>> re.sub(r'(\w+),(\w+)', r'\2,\1', 'good,bad 42,24')
'bad,good 24,42'

# example with both capturing and non-capturing groups
>>> re.sub(r'(\d+)(?:abc)+(\d+)', r'\2:\1', '1000abcabc42 12abcd21')
'42:1000 12abcd21'

using functions in the replacement section of re.sub()

>>> from math import factorial
>>> numbers = '1 2 3 4 5'
>>> def fact_num(n):
...     return str(factorial(int(n[0])))
... 
>>> re.sub(r'\d+', fact_num, numbers)
'1 2 6 24 120'

# using lambda
>>> re.sub(r'\d+', lambda m: str(factorial(int(m[0]))), numbers)
'1 2 6 24 120'

examples for lookarounds

# change 'cat' only if it is not followed by a digit character
# note that the end of string satisfies the given assertion
# 'catcat' has two matches as the assertion doesn't consume characters
>>> re.sub(r'cat(?!\d)', 'dog', 'hey cats! cat42 cat_5 catcat')
'hey dogs! cat42 dog_5 dogdog'

# change whole word only if it is not preceded by : or -
>>> re.sub(r'(?<![:-])\b\w+', 'X', ':cart <apple -rest ;tea')
':cart <X -rest ;X'

# extract digits only if it is preceded by - and followed by ; or :
>>> re.findall(r'(?<=-)\d+(?=[:;])', '42 apple-5, fig3; x-83, y-20: f12')
['20']

# words containing 'b' and 'e' and 't' in any order
>>> words = ['sequoia', 'questionable', 'exhibit', 'equation']
>>> [w for w in words if re.search(r'(?=.*b)(?=.*e).*t', w)]
['questionable', 'exhibit']

# match if 'do' is not there between 'at' and 'par'
>>> bool(re.search(r'at((?!do).)*par', 'fox,cat,dog,parrot'))
False
# match if 'go' is not there between 'at' and 'par'
>>> bool(re.search(r'at((?!go).)*par', 'fox,cat,dog,parrot'))
True

examples for re.compile()

Regular expressions can be compiled using the re.compile() function, which gives back a re.Pattern object. The top level re module functions are all available as methods for this object. Compiling a regular expression helps if the RE has to be used in multiple places or called upon multiple times inside a loop (speed benefit). By default, Python maintains a small list of recently used RE, so the speed benefit doesn't apply for trivial use cases.

>>> pet = re.compile(r'dog')
>>> type(pet)
<class 're.Pattern'>
>>> bool(pet.search('They bought a dog'))
True
>>> bool(pet.search('A cat crossed their path'))
False

>>> pat = re.compile(r'\([^)]*\)')
>>> pat.sub('', 'a+b(addition) - foo() + c%d(#modulo)')
'a+b - foo + c%d'
>>> pat.sub('', 'Hi there(greeting). Nice day(a(b)')
'Hi there. Nice day'

Understanding Python re(gex)? book

Visit my GitHub repo Understanding Python re(gex)? for details about the book I wrote on Python regular expressions. The book uses plenty of examples to explain the concepts from the basics and introduces more advanced concepts step-by-step. The book also covers the third-party regex module. The cheatsheet and examples presented in this post are based on the contents of this book.

You can get all my ebooks as a single bundle via leanpub or gumroad.

Better bindings for command line history search

2025-05-21T00:00:00+00:00

Do you find it confusing to use Ctrl+r for searching a command from shell history? I have the following mappings in the ~/.inputrc file, so that I can use the ↑ and ↓ arrow keys instead. Note that this assumes you are using Emacs-style key bindings instead of Vi mode.

"\e[A": history-search-backward
"\e[B": history-search-forward

Normally, when you use the ↑ arrow key, you'll get the previous command from the history. You can repeatedly press the key to get more entries. With the above settings active, this behavior will change if you have some characters already typed before pressing the arrow key — you'll only get entries from the history that match these characters from the start of the command. If there are multiple matches, you can use the ↑ and ↓ keys repeatedly to move backwards and forwards through the list.

If you want the matching to happen anywhere in the command (same behavior as Ctrl+r and Ctrl+s), use the following lines instead:

"\e[A": history-substring-search-backward
"\e[B": history-substring-search-forward

The ~/.inputrc file affects any program that uses the readline library (for example, programming language REPLs), not just the shell. To affect only a particular shell, use the bind command instead, for example by adding these lines to ~/.bashrc:

bind '"\e[A": history-search-backward'
bind '"\e[B": history-search-forward'
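To make the difference between the prefix and substring searches concrete, here's a toy model in Python. It only mimics the behavior described in this post. The `search_backward` function and the sample history are made up and have nothing to do with how readline is actually implemented.

```python
def search_backward(history, typed, start, substring=False):
    """Return the index of the closest earlier entry matching `typed`, or None.

    history is ordered oldest to newest; start is the current position
    (use len(history) when nothing has been recalled yet).
    """
    for i in range(start - 1, -1, -1):
        entry = history[i]
        matched = (typed in entry) if substring else entry.startswith(typed)
        if matched:
            return i
    return None

history = ['git status', 'ls -l', 'echo git', 'git log']

# prefix search: 'echo git' is skipped as it doesn't start with 'git'
i = search_backward(history, 'git', len(history))
print(history[i])    # git log
i = search_backward(history, 'git', i)
print(history[i])    # git status

# substring search: 'echo git' is a match too
i = search_backward(history, 'git', len(history), substring=True)
i = search_backward(history, 'git', i, substring=True)
print(history[i])    # echo git
```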

Customizing pandoc to generate beautiful pdf and epub from markdown

2025-05-19T00:00:00+00:00

You have probably either heard of pandoc already or come across it while searching online for ways to convert markdown to pdf. This tutorial will help you use pandoc to generate pdf and epub versions from a GitHub style markdown file.

The main motivation for this blog post is to highlight the customizations I used for self-publishing my ebooks. It wasn't easy to arrive at the setup I ended up with, so I hope this will be useful for those looking to use pandoc for such a purpose. This guide is specifically aimed at technical books that have code snippets.

Poster created using Canva

Installation

If you use a debian based distro like Ubuntu, the below steps are enough for the demos in this tutorial. If you get an error or warning, search that issue online and you'll likely find what else has to be installed.

I first downloaded the deb file from pandoc: releases and installed it, followed by the packages needed for pdf generation.

# latest pandoc version as of 19 May 2025
$ sudo gdebi ~/Downloads/pandoc-3.7.0.1-1-amd64.deb

# note that download size is hundreds of MB
$ sudo apt install texlive-xetex
$ sudo apt install librsvg2-bin
$ sudo apt install texlive-science

For more details and instructions for other operating systems, refer to pandoc: installation.

Minimal example

Once pandoc is working on your system, try generating a sample pdf without any customization.

See learnbyexample.github.io repo for all the input and output files referred in this tutorial.

$ pandoc sample_1.md -f gfm -o sample_1.pdf

Here sample_1.md is the input markdown file and -f specifies that the input format is GitHub style markdown. The -o option specifies the output file, and the output format is inferred from its extension. The default output is probably good enough. But I wished to customize hyperlinks, inline code style, add page breaks between chapters, etc. This blog post will discuss these customizations one by one.

pandoc has its own flavor of markdown with many useful extensions — see pandoc: pandocs-markdown for details. GitHub style markdown is recommended if you wish to use the same source (or with minor changes) in multiple places.

It is advised to use markdown headers in order without skipping — for example, H1 for chapter heading and H2 for chapter sub-section, etc is fine. H1 for chapter heading and H3 for sub-section is not. Using the former can give automatic index navigation on ebook readers.
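If you'd like to check a markdown file for skipped heading levels, something like the rough sketch below could help. The `skipped_headings` function is hypothetical and not part of pandoc. It only understands `#` style headings and backtick code fences, so treat it as a starting point.

```python
import re

FENCE = '`' * 3   # marker that opens and closes a fenced code block

def skipped_headings(markdown_text):
    """Yield (line_number, line) for headings that skip a level."""
    prev_level = 0
    in_fence = False
    for num, line in enumerate(markdown_text.splitlines(), start=1):
        if line.startswith(FENCE):
            in_fence = not in_fence   # ignore '#' comments inside code blocks
            continue
        m = None if in_fence else re.match(r'(#{1,6})\s+\S', line)
        if m:
            level = len(m[1])
            if level > prev_level + 1:
                yield num, line
            prev_level = level

sample = (f'# Chapter\n\n### Sub-section\n\n'
          f'{FENCE}bash\n# a comment\n{FENCE}\n\n## Fine\n')
print(list(skipped_headings(sample)))   # [(3, '### Sub-section')]
```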

On Evince reader, the index navigation for above sample looks like this:

Chapter breaks

As observed from the previous demo, there are no chapter breaks by default. Searching for a solution online, I got this piece of tex code:

\usepackage{sectsty}
\sectionfont{\clearpage}

This can be added using the -H option. From pandoc manual:

-H FILE, --include-in-header=FILE

Include contents of FILE, verbatim, at the end of the header. This can be used, for example, to include special CSS or JavaScript in HTML documents. This option can be used repeatedly to include multiple files in the header. They will be included in the order specified. Implies --standalone.

The pandoc invocation now looks like:

$ pandoc sample_1.md -f gfm -H chapter_break.tex -o sample_1_chapter_break.pdf

You can add further customization to headings, for example:

\sectionfont{\underline\clearpage} to underline chapter names
\sectionfont{\LARGE\clearpage} to allow chapter names to get even bigger

Here are some more links to read about various customizations:

Changing settings via -V option

-V KEY[=VAL], --variable=KEY[:VAL]

Set the template variable KEY to the value VAL when rendering the document in standalone mode. This is generally only useful when the --template option is used to specify a custom template, since pandoc automatically sets the variables used in the default templates. If no VAL is specified, the key will be given the value true.

The -V option allows you to change variable values to customize settings like page size, font, link color, etc. As more settings are changed, it is better to use a simple script to call pandoc instead of typing the whole command on the terminal.

#!/bin/bash

pandoc "$1" \
    -f gfm \
    --include-in-header chapter_break.tex \
    -V linkcolor:blue \
    -V geometry:a4paper \
    -V geometry:margin=2cm \
    -V mainfont="DejaVu Serif" \
    -V monofont="DejaVu Sans Mono" \
    --pdf-engine=xelatex \
    -o "$2"

mainfont is for normal text
monofont is for code snippets
geometry is for page size and margins
linkcolor will set the color for internal links
- this will also colorize other types of links
- set urlcolor if you want to distinguish URLs and so on for other types
to increase the default font size, use -V fontsize=12pt
- See stackoverflow: change font size if you need even bigger size options
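If you prefer Python over bash for such wrapper scripts, the same idea can be sketched as shown below. The `pandoc_cmd` helper, the out.pdf file name and the particular urlcolor and fontsize values are made up for illustration. The snippet only builds and prints the command, it doesn't run pandoc.

```python
import subprocess

def pandoc_cmd(src, dst, variables, header_files=()):
    """Build the argument list for a pandoc call; nothing is executed here."""
    cmd = ['pandoc', src, '-f', 'gfm']
    for path in header_files:
        cmd += ['--include-in-header', path]
    # a list of pairs (not a dict) since keys like geometry can repeat
    for key, value in variables:
        cmd += ['-V', f'{key}:{value}']
    cmd += ['--pdf-engine=xelatex', '-o', dst]
    return cmd

settings = [
    ('linkcolor', 'blue'),
    ('urlcolor', 'teal'),        # example value, to distinguish URLs
    ('geometry', 'a4paper'),
    ('geometry', 'margin=2cm'),
    ('fontsize', '12pt'),
]
cmd = pandoc_cmd('sample_1.md', 'out.pdf', settings, ['chapter_break.tex'])
print(' '.join(cmd))

# uncomment to actually run pandoc (needs pandoc and xelatex installed)
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting issues with values that contain spaces, such as font names.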

Using xelatex as the pdf-engine helps to use any font installed in your system. One reason I chose DejaVu was because it supported Greek and other Unicode characters that were causing errors with other fonts. See tex.stackexchange: Using XeLaTeX instead of pdfLaTeX for some more details.

The pandoc invocation is now through a script:

$ chmod +x md2pdf.sh
$ ./md2pdf.sh sample_1.md sample_1_settings.pdf

Do compare the pdf generated side by side with previous output before proceeding.

On my system, DejaVu Serif did not have italic variation installed, so I had to use sudo apt install ttf-dejavu-extra to get it.

Syntax highlighting

One option to customize syntax highlighting for code snippets is to save one of the pandoc themes and edit it. See stackoverflow: What are the available syntax highlighters? for available themes and more details (as a good practice on stackoverflow, go through all answers and comments — the linked/related sections on sidebar are useful as well).

$ pandoc --print-highlight-style=pygments > pygments.theme

Edit the above file to customize the theme. Use sites like colorhexa to help with color choices, hex values, etc. For this demo, the below settings are changed:

# by default, background is same as normal text
# change it to a shade of gray to easily distinguish code and text
"background-color": "#f8f8f8",

# change italic to false, messes up comments with slashes
# change comment text-color to yet another shade of gray
"Comment": {
    "text-color": "#9c9c9c",
    "background-color": null,
    "bold": false,
    "italic": false,
    "underline": false
},
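If you'd rather script these theme changes than edit the file by hand, here's a rough sketch. It assumes the theme is a JSON object with a top-level background-color key and a text-styles object holding the Comment entry, so check your own pygments.theme file before relying on it. The `customize_theme` helper and the cut-down sample are made up for illustration.

```python
import json

def customize_theme(theme):
    """Return a copy of a theme dict with the changes described above."""
    theme = json.loads(json.dumps(theme))    # cheap deep copy
    theme['background-color'] = '#f8f8f8'
    comment = theme['text-styles']['Comment']
    comment['text-color'] = '#9c9c9c'
    comment['italic'] = False
    return theme

# cut-down stand-in for the contents of pygments.theme (assumed layout)
sample = {
    'background-color': None,
    'text-styles': {
        'Comment': {'text-color': '#60a0b0', 'background-color': None,
                    'bold': False, 'italic': True, 'underline': False},
    },
}
print(json.dumps(customize_theme(sample), indent=4))
```

For a real file, you'd load it with `json.load`, pass the result to the function and write it back with `json.dump`.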

Inline code

Similar to changing background color for code snippets, I found a solution online for inline code snippets as well.

\usepackage{fancyvrb,newverbs,xcolor}

\definecolor{Light}{HTML}{F4F4F4}

\let\oldtexttt\texttt
\renewcommand{\texttt}[1]{
  \colorbox{Light}{\oldtexttt{#1}}
}

Add --highlight-style pygments.theme and --include-in-header inline_code.tex to the script and generate the pdf again.

With pandoc sample_2.md -f gfm -o sample_2.pdf the output would be:

With ./md2pdf_syn.sh sample_2.md sample_2_syn.pdf the output is:

For my Understanding Python re(gex)? book, by chance I found that using ruby instead of python for REPL code snippets syntax highlighting was better. Snapshot from ./md2pdf_syn.sh sample_3.md sample_3.pdf result is shown below. For python directive, string output gets treated as a comment and color for boolean values isn't easy to distinguish from string values. The ruby directive treats string value as expected and boolean values are easier to spot.

Bullet styling

This stackoverflow Q&A helped for bullet styling.

\usepackage{enumitem}
\usepackage{amsfonts}

% level one
\setlist[itemize,1]{label=$\bullet$}
% level two
\setlist[itemize,2]{label=$\circ$}
% level three
\setlist[itemize,3]{label=$\star$}

Comparing pandoc sample_4.md -f gfm -o sample_4.pdf vs ./md2pdf_syn_bullet.sh sample_4.md sample_4_bullet.pdf gives:

PDF properties

This tex.stackexchange Q&A helped to change metadata. See also pspdfkit: What’s Hiding in Your PDF? and discussion on HN.

\usepackage{hyperref}

\hypersetup{
  pdftitle={My awesome book},
  pdfauthor={learnbyexample},
  pdfsubject={pandoc},
  pdfkeywords={pandoc,pdf,xelatex}
}

./md2pdf_syn_bullet_prop.sh sample_4.md sample_4_bullet_prop.pdf gives:

Adding table of contents

There's a handy option --toc to automatically include table of contents at top of the generated pdf. You can control number of levels using --toc-depth option, the default is 3 levels. You can also change the default string Contents to something else using the -V toc-title option.

./md2pdf_syn_bullet_prop_toc.sh sample_1.md sample_1_toc.pdf gives:

Adding cover image

To add something prior to table of contents, cover image for example, you can use a tex file and include it verbatim. Create a tex file (named as cover.tex here) with content as shown below:

\includegraphics{cover.png}
\thispagestyle{empty}

Then, modify the previous script md2pdf_syn_bullet_prop_toc.sh by adding --include-before-body cover.tex and tada — you get the cover image before table of contents. \thispagestyle{empty} helps to avoid page number on the cover page, see also tex.stackexchange: clear page.

The bash script invocation is now ./md2pdf_syn_bullet_prop_toc_cover.sh sample_5.md sample_5.pdf.

You'll need at least one image in input markdown file, otherwise settings won't apply to the cover image and you may end up with a weird output. sample_5.md used in the command above includes an image. And be careful to use escapes if the image path can contain tex metacharacters.

Stylish blockquote

By default, blockquotes (lines starting with > in markdown) are just indented in the pdf output. To make them stand out, tex.stackexchange: change the background color and border of blockquote helped.

Create quote.tex with the contents as shown below. You can change the colors to suit your own preferred style.

\usepackage{tcolorbox}
\newtcolorbox{myquote}{colback=red!5!white, colframe=red!75!black}
\renewenvironment{quote}{\begin{myquote}}{\end{myquote}}

The bash script invocation is now ./md2pdf_syn_bullet_prop_toc_cover_quote.sh sample_5.md sample_5_quote.pdf. The difference between default and styled blockquote is shown below.

Customizing epub

For a long time, I thought epub didn't make sense for programming books. Turned out, I wasn't using the right ebook readers. FBReader was good for novels but not ebooks with code snippets. When I used atril, foliate or calibre ebook-viewer, the results were good.

I didn't know how to use css before trying to generate the epub version. Somehow, I managed to take the default epub.css provided by pandoc and customize it as close as possible to the pdf version. The modified epub.css is available from the learnbyexample.github.io repo. The bash script to generate the epub is shown below and invoked as ./md2epub.sh sample_5.md sample_5.epub. Note that pygments.theme is same as the pdf customization discussed before.

#!/bin/bash

pandoc  "$1" \
        -f gfm \
        --toc \
        --standalone \
        --top-level-division=chapter \
        --highlight-style pygments.theme \
        --css epub.css \
        --metadata=title:"My awesome book" \
        --metadata=author:"learnbyexample" \
        --metadata=lang:"en-US" \
        --metadata=cover-image:"cover.png" \
        -o "$2"

Resource links🔗

More options and workflows for generating ebooks:

pandoc-latex-template — a clean pandoc LaTeX template to convert your markdown files to PDF or LaTeX
Writing a book with pandoc, make, and vim
Quarto — open source scientific and technical publishing system built on Pandoc
quarkdown — a modern Markdown-based typesetting system
typst — a new markup-based typesetting system that is powerful and easy to learn
Jupyter Book — open source project for building beautiful, publication-quality books and documents from computational material
- See also fastdoc — the output of fastdoc is an asciidoc file for each input notebook. You can then use asciidoctor to convert that to HTML, DocBook, epub, mobi, and so forth
Mau — template-based markup language, heavily inspired by AsciiDoc
Asciidoctor
- Asciidoc book template
- pdf generation workflow with Asciidoc
Sphinx
- Self-publishing a book with reStructuredText, Sphinx, Calibre, and vim
Bookdown
Emacs orgmode
Markdeep

Miscellaneous

Everything you need to know about sed substitution

2025-05-07T00:00:00+00:00

The command name sed is derived from stream editor. The most commonly used editing command is substitution, for which various examples are shown in this blog post.

The examples presented here have been tested with GNU sed. Syntax and features might differ for other implementations.

Basic Substitution🔗

The substitute command syntax is s/REGEXP/REPLACEMENT/FLAGS where:

s stands for the substitute command
/ is an idiomatic delimiter character to separate various portions of the command
REGEXP is the regular expression that defines the search portion
REPLACEMENT refers to the replacement string
FLAGS are options to change the default behavior of the command

# for each input line, change only the first ',' to '-'
$ printf '1,2,3,4\na,b,c,d\n' | sed 's/,/-/'
1-2,3,4
a-b,c,d

# you can change all the matches by adding the 'g' flag
$ printf '1,2,3,4\na,b,c,d\n' | sed 's/,/=-=/g'
1=-=2=-=3=-=4
a=-=b=-=c=-=d
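
Flags can also be combined. As a small sketch (the function name and sample input are made up for illustration), the I flag of GNU sed makes the match case-insensitive and can be used together with g:

```shell
# 'I' (GNU extension) ignores case, 'g' replaces every match on the line
ignore_case_demo() {
    sed 's/cat/dog/Ig'
}

printf 'Cat cat CAT\nconcat\n' | ignore_case_demo
# dog dog dog
# condog
```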

Filter and Substitute🔗

You can use line numbers, regular expressions or a combination of them to select lines and then apply a command to these filtered lines. See the Selective editing chapter from my ebook to learn more about the various kinds of addressing.

# make changes only to the first line
$ printf '1,2,3,4\na,b,c,d\n' | sed '1 s/,/::/g'
1::2::3::4
a,b,c,d

# apply substitution only if the input line does NOT contain '2'
$ printf '1,2,3,4\na,b,c,d\n' | sed '/2/! s/,/-/g'
1,2,3,4
a-b-c-d
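
Addresses can also be given as a range. Here's a rough sketch with made-up input and hypothetical function names, one range uses line numbers and the other uses regular expressions:

```shell
# substitute only from the second line to the end of input
from_second_line() {
    sed '2,$ s/,/-/g'
}

# substitute from a line matching 'a' up to a line matching 'x'
between_matches() {
    sed '/a/,/x/ s/,/-/g'
}

printf '1,2,3\na,b,c\nx,y,z\n7,8,9\n' | from_second_line
# 1,2,3
# a-b-c
# x-y-z
# 7-8-9

printf '1,2,3\na,b,c\nx,y,z\n7,8,9\n' | between_matches
# 1,2,3
# a-b-c
# x-y-z
# 7,8,9
```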

Regular Expressions🔗

Only a handful of examples are shown here. See this chapter from my ebook for a more detailed discussion. See my blog post for a cheatsheet.

Anchors:

# lines starting with 'par'
$ printf 'car par\nparty\nspare\n' | sed 's/^par/tas/'
car par
tasty
spare

# words starting with 'par'
$ printf 'car par\nparty\nspare\n' | sed 's/\bpar/1234/'
car 1234
1234ty
spare

Alternation and Grouping:

# same as: sed -E 's/part|parrot|parent/X/g'
# -E option enables ERE (default is BRE)
$ echo 'par part parrot parent' | sed -E 's/par(en|ro)?t/X/g'
par X X X

Character Class and Quantifiers:

# numbers >= 100 with optional leading zeros
$ echo '0501 035 154 12 26 98234' | sed -E 's/\b0*[1-9][0-9]{2,}\b/X/g'
X 035 X 12 26 X

# retain only punctuation characters
$ echo ',pie tie#ink-eat_42' | sed -E 's/[^[:punct:]]+//g'
,#-_

Backreferences:

# remove two or more duplicate words separated by spaces
# \b prevents false matches like 'the theatre', 'sand and stone' etc
$ echo 'aa a a a 42 f_1 f_1 f_13.14' | sed -E 's/\b(\w+)( \1)+\b/\1/g'
aa a 42 f_1 f_13.14

# whole words that have at least one consecutive repeated character
$ echo 'effort flee facade oddball rat tool' | sed -E 's/\w*(\w)\1\w*/X/g'
X X facade X rat X

# match lowercase followed by underscore followed by lowercase
# delete the underscore and convert the 2nd lowercase to uppercase
$ echo '_fig aug_price next_line' | sed -E 's/([a-z])_([a-z])/\1\u\2/g'
_fig augPrice nextLine

Replace Specific Occurrences🔗

You can use a number as a flag to replace only that particular occurrence of the search term. If you combine this with the g flag, all occurrences after that particular match will also be replaced.

$ s='apple:banana:cherry:fig:mango'

# replace only the second occurrence
$ echo "$s" | sed -E 's/[^:]+/"&"/2'
apple:"banana":cherry:fig:mango

# replace all matches except the first occurrence
$ echo "$s" | sed -E 's/:/---/2g'
apple:banana---cherry---fig---mango

With the help of capture groups and backreferences, you can replace a specific occurrence from the end of the input line.

$ s='car,art,pot,map,urn,ray,ear'

# replace the last occurrence
$ echo "$s" | sed -E 's/(.*),/\1[]/'
car,art,pot,map,urn,ray[]ear

# generic version, where {N} refers to last but Nth occurrence
$ echo "$s" | sed -E 's/(.*),((.*,){3})/\1[]\2/'
car,art,pot[]map,urn,ray,ear
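
If you need this often, the generic version could be wrapped in a shell function. This is only a sketch: the function name is hypothetical and the argument is assumed to be a positive integer.

```shell
# replace the comma that is N occurrences before the last one
# $1 is N, input is read from stdin
# double quotes allow $1 to be expanded, \1 and \2 are passed as is to sed
replace_last_but_n() {
    sed -E "s/(.*),((.*,){$1})/\1[]\2/"
}

s='car,art,pot,map,urn,ray,ear'
printf '%s\n' "$s" | replace_last_but_n 1
# car,art,pot,map,urn[]ray,ear
printf '%s\n' "$s" | replace_last_but_n 3
# car,art,pot[]map,urn,ray,ear
```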

Executing External Commands🔗

The e flag helps to insert the output of a shell command within sed.

# replace the entire line with the output of a shell command
$ printf 'apple\nreplace this line\n' | sed 's/^replace.*/date/e'
apple
Wednesday 07 May 2025 10:25:34 AM IST

# after substitution, the command that gets executed is 'seq 3'
$ echo 'xyz 3' | sed 's/xyz/seq/e'
1
2
3

Different Delimiters🔗

The / character is idiomatically used as the REGEXP delimiter. But any character other than \ and the newline character can be used instead.

# instead of this
$ echo '/home/learnbyexample/reports' | sed 's/\/home\/learnbyexample\//~\//'
~/reports

# use a different delimiter
$ echo '/home/learnbyexample/reports' | sed 's#/home/learnbyexample/#~/#'
~/reports
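
A different delimiter is especially handy when the search string comes from a shell variable holding a path. A minimal sketch, assuming the value has neither the | character nor any regexp metacharacters (the function name is hypothetical):

```shell
# strip a leading directory prefix passed as $1
# '|' is the delimiter, so '/' in the prefix needs no escaping
strip_prefix() {
    sed "s|^$1/||"
}

echo '/home/learnbyexample/reports' | strip_prefix '/home/learnbyexample'
# reports
```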

In-place Editing🔗

The -i option is helpful to write back the changes to the original file itself. If you don't provide an argument to this option, a backup of the original file won't be created.

$ cat colors.txt
deep blue
light orange
blue delight

# no output on terminal as the -i option is used
$ sed -i.bkp 's/blue/green/' colors.txt
# output from sed is written back to 'colors.txt'
$ cat colors.txt
deep green
light orange
green delight

# original file is preserved in 'colors.txt.bkp'
$ cat colors.txt.bkp
deep blue
light orange
blue delight

* in the argument to the -i option will be replaced with the input filename. So, -i'bkp.*' for f1.txt will create bkp.f1.txt as the backup. And if you use old/*, the backups will be under the same name but under the directory old (provided that directory already exists).
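
Here's a sketch of both forms, run inside a throwaway directory so that no real files are touched (the file contents are made up, GNU sed is assumed):

```shell
# work in a temporary directory
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'deep blue\nlight orange\n' > f1.txt
mkdir old

# '*' is replaced with the input filename, so the backup is 'bkp.f1.txt'
sed -i'bkp.*' 's/blue/green/' f1.txt

# backup is saved as 'old/f1.txt', the 'old' directory has to exist already
sed -i'old/*' 's/light/dark/' f1.txt

cat f1.txt
# deep green
# dark orange
```

After these two commands, bkp.f1.txt has the original content and old/f1.txt has the content as it was just before the second command.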

Manipulating Newlines🔗

By default, sed reads the input line by line (with \n considered as the line ending). The newline character, if present, is removed and then added back when the pattern space is printed. This implies that you cannot directly manipulate the newline character, unless you use features that result in more than one line in the pattern space.

# append the next line to the pattern space
# and then replace newline character with a colon character
$ seq 7 | sed 'N; s/\n/:/'
1:2
3:4
5:6
7

# if line contains 'at', the next line gets appended to the pattern space
# then the substitution is performed on the two lines in the buffer
$ printf 'gates\nnot\nused\n' | sed '/at/{N; s/s\nnot/d/}'
gated
used
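
The N command can be used more than once. As a small sketch with GNU sed (the function name is hypothetical), two N commands give three lines in the pattern space, so the g flag is needed to change both the newline characters:

```shell
# join every three input lines with a comma
join_three() {
    sed 'N; N; s/\n/,/g'
}

seq 6 | join_three
# 1,2,3
# 4,5,6
```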

Slurping Input🔗

If the input doesn't have NUL characters, then the -z option is handy to process the entire input as a single string. This is effective only for files small enough to fit the available machine memory. It would also depend on the regular expression, as some patterns have an exponential relationship with respect to the data size.

# add ; to the previous line if the current line starts with c
$ printf 'cater\ndog\ncoat\ncutter\nmat\n' | sed -z 's/\nc/;&/g'
cater
dog;
coat;
cutter
mat
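
Since the final newline is also part of the string when -z is used, you may have to handle it separately. A minimal sketch, assuming input without NUL characters (the function name is made up):

```shell
# join all the lines with a comma
# the second substitution turns the trailing comma back into a newline
join_lines() {
    sed -z 's/\n/,/g; s/,$/\n/'
}

printf 'cater\ndog\ncoat\n' | join_lines
# cater,dog,coat
```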

Fixed String Substitution🔗

Typically, you'd need to escape \, & and the delimiter for the string used in the replacement section. For the search section, the characters to be escaped will depend upon whether you are using BRE or ERE.

# replacement string
$ r='a/b&c\d'
$ r=$(printf '%s' "$r" | sed 's#[\&/]#\\&#g')

# ERE version for the search string
$ s='{[(\ta^b/d).*+?^$|]}'
$ s=$(printf '%s' "$s" | sed 's#[{[()^$*?+.\|/]#\\&#g')
$ printf '%s\n' 'f*{[(\ta^b/d).*+?^$|]} - 3' | sed -E 's/'"$s"'/'"$r"'/g'
f*a/b&c\d - 3

# BRE version for the search string
$ s='{[(\ta^b/d).*+?^$|]}'
$ s=$(printf '%s' "$s" | sed 's#[[^$*.\/]#\\&#g')
$ printf '%s\n' 'f*{[(\ta^b/d).*+?^$|]} - 3' | sed 's/'"$s"'/'"$r"'/g'
f*a/b&c\d - 3

See my blog post for multiline fixed string substitution examples.

Programming ebooks🔗

Check out my ebooks on Regular Expressions, Linux CLI tools, Python and Vim. You can get them all as a single bundle via leanpub or gumroad.

CLI text processing with GNU awk book announcement

2025-03-26T00:00:00+00:00

Hello!

I am pleased to announce a new version of my CLI text processing with GNU awk ebook.

Learn the GNU awk command step-by-step from beginner to advanced levels with hundreds of examples and exercises. This book will dive deep into field processing, show examples for filtering features, multiple file processing, how to construct solutions that depend on multiple records, how to compare records and fields between two or more files, how to identify duplicates while maintaining input order and so on. Regular expressions will also be discussed in detail.

Release offers🔗

To celebrate the new release, you can download the PDF/EPUB versions of CLI text processing with GNU awk for FREE till 06-April-2025. You can still pay if you wish ;)

Here are some more amazing offers:

All 13 books bundle is $18 (normal price $36) — Leanpub or Gumroad
Linux CLI Text Processing is $10 (normal price $20) — Leanpub or Gumroad

What's new?🔗

Command version updated to GNU awk 5.3.1
Added details for the --csv option and the \u escape sequence
Corrected typos, updated exercises, descriptions and external links
Updated Acknowledgements section

Videos🔗

Check out my programming tips covering Python, command line tools and Vim:

Interactive TUI app🔗

I also wrote an interactive TUI app based on some of the exercises from the ebook. Reference solutions are also provided.

Table of Contents🔗

Preface
Installation and Documentation
awk introduction
Regular Expressions
Field separators
Record separators
In-place file editing
Using shell variables
Control Structures
Built-in functions
Multiple file input
Processing multiple records
Two file processing
Dealing with duplicates
awk scripts
Gotchas and Tips
Further Reading

Web version🔗

You can read the book online here: https://learnbyexample.github.io/learn_gnuawk/

GitHub repo🔗

Visit https://github.com/learnbyexample/learn_gnuawk for markdown source, example files, exercise solutions, sample chapters and other details related to the book.

See also my blog post on how to customize pandoc for generating beautiful PDF/EPUB versions from GitHub style markdown.

Subscribe to learnbyexample weekly — free newsletter covering programming resources, updates on what I am creating, tools, free ebooks and more, delivered every Friday.

Feedback and Errata🔗

I would highly appreciate it if you'd let me know how you felt about this book. It could be anything from a simple thank you, Gumroad rating, pointing out a typo, mistakes in code snippets, which aspects of the book worked for you (or didn't!) and so on. Reader feedback is essential and especially so for self-published authors.

You can reach me via:

Issue Manager: https://github.com/learnbyexample/learn_gnuawk/issues
E-mail: echo 'bGVhcm5ieWV4YW1wbGUubmV0QGdtYWlsLmNvbQo=' | base64 --decode
Twitter: https://twitter.com/learn_byexample

Happy learning :)

awk idioms explained

2025-02-18T00:00:00+00:00

Do you find awk one-liners cryptic? Stuff like !a[$0]++, 1, $1=$1, NR==FNR and -v RS=? You'll find examples and brief explanations for such idioms in this post.

The examples presented here have been tested with GNU awk. These are likely to work with most other implementations of awk as well.

awk command structure🔗

awk 'cond1{action1} cond2{action2} ... condN{actionN}'

When a conditional expression isn't provided, the action is always executed. When an action isn't provided, the $0 variable (which has the contents of the current record being processed) is printed if the conditional expression evaluates to true.
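
As a quick sketch of this structure (made-up example, hypothetical function name), here are two condition-action pairs. Each pair is evaluated independently for every input record:

```shell
# first pair: true when the number is odd
# second pair: true only for the value 4
classify() {
    awk '$1 % 2 {print $1, "is odd"} $1 == 4 {print "found four"}'
}

seq 5 | classify
# 1 is odd
# 3 is odd
# found four
# 5 is odd
```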

Regexp filtering🔗

# same as: grep 'at' and sed -n '/at/p'
$ printf 'gate\napple\nwhat\nkite\n' | awk '/at/'
gate
what

# same as: grep -v 'e' and sed -n '/e/!p'
$ printf 'gate\napple\nwhat\nkite\n' | awk '!/e/'
what

The generic syntax is string ~ /regexp/ to check if the given string matches the regexp and string !~ /regexp/ to invert the condition.

/regexp/ is a shortcut for $0 ~ /regexp/{print $0}
!/regexp/ is a shortcut for $0 !~ /regexp/{print $0}
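
The generic syntax is useful when you want to match against a field instead of the whole record. A rough sketch with made-up input and hypothetical function names:

```shell
# print the first field if the second field contains 'at'
second_field_match() {
    awk '$2 ~ /at/{print $1}'
}

# print the record if the first field does NOT start with 'b'
first_field_nomatch() {
    awk '$1 !~ /^b/'
}

printf 'apple gate\nbanana kite\ncherry what\n' | second_field_match
# apple
# cherry

printf 'apple gate\nbanana kite\ncherry what\n' | first_field_nomatch
# apple gate
# cherry what
```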

Idiomatic use of 1🔗

Non-zero numeric values and non-empty strings are truthy (zero and empty strings are falsy). Idiomatically, 1 is used as a conditional expression to print the contents of $0.

$ echo 'ring amazing jar' | awk '{sub(/ing/, "ed", $2)} 1'
ring amazed jar

$ seq 2 | awk 'BEGIN{print "---"} 1; END{print "==="}'
---
1
2
===

Special variables🔗

$0 contains the current record being processed
$1 first field
$2 second field and so on
FS input field separator
OFS output field separator
NF number of fields
RS input record separator
ORS output record separator
NR number of records (i.e. line number) for the entire input
FNR number of records per file
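
Here's a small sketch that uses some of these variables together (the input and the function name are made up for illustration):

```shell
# -F sets FS, OFS is used between the comma separated print arguments
# NR is the record number, NF is the field count, $NF is the last field
show_fields() {
    awk -F, -v OFS=' | ' '{print NR, NF, $1, $NF}'
}

printf 'a,b,c\nd,e\n' | show_fields
# 1 | 3 | a | c
# 2 | 2 | d | e
```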

Removing duplicates🔗

awk '!a[$0]++' is one of the most famous awk one-liners. It eliminates line based duplicates while retaining the input order.

$ cat purchases.txt
coffee
tea
washing powder
coffee
tea
coffee milkshake
soap
tea
washing soda

$ awk '{print +a[$0] "\t" $0; a[$0]++}' purchases.txt
0	coffee
0	tea
0	washing powder
1	coffee
1	tea
0	coffee milkshake
0	soap
2	tea
0	washing soda

# only the entries with zero in the first column will be retained
$ awk '!a[$0]++' purchases.txt
coffee
tea
washing powder
coffee milkshake
soap
washing soda

a[$0] creates an uninitialized element in array a with $0 as the key (if the key doesn't exist yet). Thus, !a[$0] will succeed only on the first occurrence of an item (since an uninitialized value is falsy) and the post-increment operator will ensure that further instances of an item will fail the conditional expression.
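
The same idea works with any key, not just $0. A minimal sketch, assuming comma separated input where duplicates are decided by the first field alone (the data and the function name are hypothetical):

```shell
# keep only the first record seen for each value of the first field
dedupe_by_first_field() {
    awk -F, '!seen[$1]++'
}

printf 'tea,10\ncoffee,20\ntea,30\nsoap,5\ncoffee,8\n' | dedupe_by_first_field
# tea,10
# coffee,20
# soap,5
```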

Rebuild $0🔗

Sometimes you just want to change the field separator, or perform some record-level text processing and then print it with a new field separator. In such cases, you'll have to explicitly fake a field operation — otherwise the field separation update won't happen for $0.

$ s='sample123string42with777numbers'

$ echo "$s" | awk -F'[0-9]+' -v OFS=, '{$1=$1} 1'
sample,string,with,numbers

$ echo "$s" | awk -F'[0-9]+' -v OFS=- '{gsub(/[aeiou]/, ""); $1=$1} 1'
smpl-strng-wth-nmbrs
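
To see why the fake field operation is needed, compare the output with and without it. This is just a sketch with made-up input and hypothetical function names:

```shell
# OFS alone has no effect, since $0 isn't rebuilt
no_rebuild() {
    awk -v OFS=- '1'
}

# assigning a field (even to itself) rebuilds $0 using OFS
rebuild() {
    awk -v OFS=- '{$1=$1} 1'
}

echo 'a b   c' | no_rebuild
# a b   c
echo 'a b   c' | rebuild
# a-b-c
```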

Paragraph mode🔗

When RS is set to an empty string, one or more consecutive empty lines are used as the input record separator.

$ cat para.txt
hello world

hi there
how are you

just doing
believe it

banana
papaya
mango

much ado about nothing
he he he
adios amigo

# uninitialized variable 's' will be empty for the first match
# afterwards, 's' will provide the empty line separation
$ awk -v RS= '/do/{print s $0; s="\n"}' para.txt
just doing
believe it

much ado about nothing
he he he
adios amigo
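
Paragraph mode combines well with a newline field separator, so that each line of a paragraph becomes a field. A rough sketch (the input and the function name are made up):

```shell
# NR is the paragraph number, $1 is its first line
# NF is the number of lines in the paragraph
para_summary() {
    awk -v RS= -F'\n' '{print NR ": " $1 " (" NF " lines)"}'
}

printf 'hello world\n\nhi there\nhow are you\n\n\nbanana\npapaya\nmango\n' | para_summary
# 1: hello world (1 lines)
# 2: hi there (2 lines)
# 3: banana (3 lines)
```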

Two file processing🔗

For two files as input, NR==FNR will be true only when the first file is being processed. The next statement will skip the rest of the code for the current record.

$ cat marks.txt
dept    name    marks
ece     raj     53
ece     joel    72
eee     moi     68
cse     surya   81
eee     tia     59
ece     om      92
cse     amy     67

$ cat dept_mark.txt
ece 70
eee 65
cse 80

# match dept and minimum marks specified in dept_mark.txt
$ awk 'NR==FNR{d[$1]=$2; next}
       $1 in d && $3 >= d[$1]' dept_mark.txt marks.txt
ece     joel    72
eee     moi     68
cse     surya   81
ece     om      92

Note that the NR==FNR logic will fail if the first file is empty, since NR wouldn't get a chance to increment. You can set a flag after the first file has been processed to avoid this issue — for example, awk '!f{a[$0]; next} !($0 in a)' file1 f=1 file2. See this unix.stackexchange thread for more workarounds.
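
Here's a sketch that compares the two approaches when the first file is empty. The file names and function names are hypothetical and the files are created in a temporary directory:

```shell
dir=$(mktemp -d)
: > "$dir/empty.txt"
printf 'tea\nsoap\n' > "$dir/items.txt"

# NR==FNR is true for the second file as well, so nothing gets printed
by_nr_fnr() {
    awk 'NR==FNR{a[$0]; next} !($0 in a)' "$1" "$2"
}

# 'f=1' is assigned only when awk reaches that argument,
# i.e. after the first file has been processed (even if it is empty)
by_flag() {
    awk '!f{a[$0]; next} !($0 in a)' "$1" f=1 "$2"
}

by_nr_fnr "$dir/empty.txt" "$dir/items.txt"
# (no output)

by_flag "$dir/empty.txt" "$dir/items.txt"
# tea
# soap
```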

Forcing string and numeric context🔗

Strings are automatically converted to a number when used in an arithmetic expression (for example, "42" + 5). You can use the unary + and - operators to force numeric context. If the string doesn't start with a valid number (ignoring any leading whitespace), it will be treated as 0.

$ seq 3 | awk '{sum += $0} END{print sum}'
6
$ awk '{sum += $0} END{print sum}' /dev/null

$ awk '{sum += $0} END{print +sum}' /dev/null
0

Similarly, you can concatenate a string to a number to force string context.

$ awk 'BEGIN{n1="5.0"; n2=5; if(n1==n2) print "equal"}'
$ awk 'BEGIN{n1="5.0"; n2=5; if(n1==n2".0") print "equal"}'
equal

See gawk manual: How awk Converts Between Strings and Numbers for more details.
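
As a small sketch of the string to number conversion rules (the values and the function name are made up for illustration):

```shell
# a: only the leading numeric portion is used
# b: no leading number, so treated as 0
# c: leading whitespace is ignored
numeric_context() {
    awk 'BEGIN{a = "3abc"; b = "abc"; c = "  12"; print a+0, b+0, c+1}'
}

numeric_context
# 3 0 13
```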

Programming ebooks🔗

Check out my ebooks on Regular Expressions, Linux CLI tools, Python and Vim. You can get them all as a single bundle via leanpub or gumroad.

OS installation woes

2025-02-12T00:00:00+00:00

I prefer stability, so I let my Linux LTS distributions run almost until the end of their support period. Having learned from previous transition periods, I started maintaining notes on the software I install/purge and some of the customization stuff. This has vastly reduced the pain of a fresh OS installation, but the fact remains that there'll always be some persistent and annoying trouble.

The one I encountered this time around is really perplexing and I'm afraid of trying to figure out the root cause. For now, I'm happy with the workaround I ended up with.

I use redshift to set the color temperature of my computer display. It is a fixed temperature setting that doesn't depend on time or place, so my config is simple. However, it wasn't working when I moved from good old Ubuntu to Linux Mint (because Ubuntu is no longer user friendly). As per this discussion on the Linux Mint forum, geoclue2 no longer works. That shouldn't matter for me since I don't need location services. Whatever, that same thread mentioned xsct as a simpler alternative that exactly fits my need (with a bonus of changing screen brightness!).

I installed it and a really really simple command from the terminal was all it needed to work. So, what was the annoying issue? I couldn't get it to autostart on login no matter what I did! At first, I thought perhaps I needed to use the full path of the command, but that obviously didn't solve my troubles. After a fruitless search on the internet, I almost asked a question on the forum. Before that though, I had the bright idea (really basic debugging rule) to first check if my autostart setup was working at all. I chose to autostart a terminal on login via the xfce4-terminal command and it did work!

So, why wasn't xsct working? No idea. But reading the man page of the terminal emulator showed that I can choose to execute a command with the -e option. So, that was my inelegant workaround! The desktop entry is shown below if you are curious. If you know why Exec=xsct 3500 0.9 doesn't work compared to Exec=xfce4-terminal -e 'xsct 3500 0.9', do let me know!

[Desktop Entry]
Encoding=UTF-8
Version=0.9.4
Type=Application
Name=displaytemperature
Comment=
Exec=xfce4-terminal -e 'xsct 3500 0.9'
OnlyShowIn=XFCE;
RunHook=0
StartupNotify=false
Terminal=false
Hidden=false

Update:

The above solution didn't work as I hoped. The screen setting would activate some of the time but most often it didn't. I resorted to creating a keyboard shortcut, which I'd use when the activation failed.

The terminal would always show up briefly on login though, which meant the issue wasn't due to the autostart failing altogether. One day I happened to notice that the screen setting would actually take effect before immediately relapsing to the normal temperature. And thus I arrived at the current solution that hasn't failed yet. I wrote a shell script that first used the sleep command to create a delay of one second before calling xsct. The autostart desktop entry now calls this script — again, using xfce4-terminal -e because for some reason calling the script directly still fails 🤷.
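
A rough sketch of what such a script could look like is shown below. The function name and the optional delay argument are hypothetical, the xsct values are the ones from the desktop entry shown earlier.

```shell
# wait for the given number of seconds (default is 1) before
# setting the color temperature and brightness
set_display_temperature() {
    sleep "${1:-1}"
    xsct 3500 0.9
}

# the script would end with a call like this:
# set_display_temperature
```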

Understanding Python re(gex)? book announcement

2025-01-22T00:00:00+00:00

Hello!

I just published a new version of my Understanding Python re(gex)? ebook.

This book will help you learn Python Regular Expressions step-by-step from beginner to advanced levels with hundreds of examples and exercises. The standard library re as well as the third-party regex module are covered in this book.

Release offers🔗

To celebrate the new release, you can download the PDF/EPUB versions of Understanding Python re(gex)? for FREE till 31-Jan-2025.

Here are some more amazing offers:

All 13 books bundle is $18 (normal price $36) — Leanpub or Gumroad
100 Page Python Intro is FREE (normal price $10) — Leanpub or Gumroad

See also my blog post on how to customize pandoc for generating beautiful PDF/EPUB versions from GitHub style markdown.

What's new?🔗

Python version updated to 3.13
- deprecated features, SyntaxWarning, name change from re.error to re.PatternError
Corrected typos, updated descriptions, timing results and external links
Exercises are now numbered instead of using alphabets

Videos🔗

Check out my programming tips covering Python, command line tools and Vim:

re(gex)? playground🔗

To make it easier to experiment, I wrote an interactive app. See PyRegexPlayground repo for installation instructions and usage guide. A sample screenshot is shown below:

re(gex)? exercises🔗

I wrote another TUI app to help you solve exercises from this book interactively. See PyRegexExercises repo for installation steps and app_guide.md for instructions on using this app. Here's a sample screenshot:

See my blog post Python regex cheatsheet for a quick reference.

Table of Contents🔗

Preface
Why is it needed?
re introduction
Anchors
Alternation and Grouping
Escaping metacharacters
Dot metacharacter and Quantifiers
Interlude: Tools for debugging and visualization
Working with matched portions
Character class
Groupings and backreferences
Interlude: Common tasks
Lookarounds
Flags
Unicode
regex module
Gotchas
Further Reading

Web version🔗

You can read the book online here: https://learnbyexample.github.io/py_regular_expressions/

GitHub repo🔗

Visit https://github.com/learnbyexample/py_regular_expressions for markdown source, example files, exercise solutions, sample chapters and other details related to the book.

Subscribe to learnbyexample weekly — free newsletter covering programming resources, updates on what I am creating, tools, free ebooks and more, delivered every Friday.

Feedback and Errata🔗

I would highly appreciate it if you'd let me know how you felt about this book. It could be anything from a simple thank you, rating/review, pointing out a typo, mistakes in code snippets, which aspects of the book worked for you (or didn't!) and so on. Reader feedback is essential and especially so for self-published authors.

You can reach me via:

Issue Manager: https://github.com/learnbyexample/py_regular_expressions/issues
E-mail: echo 'bGVhcm5ieWV4YW1wbGUubmV0QGdtYWlsLmNvbQo=' | base64 --decode
Twitter: https://twitter.com/learn_byexample

Happy learning :)

Coloring matched portions with GNU grep, sed and awk

2025-01-13T00:00:00+00:00

You might already know how to use the --color option to highlight matched portions with GNU grep. In this post, you'll see how to use ANSI escape sequences to format matched portions with GNU sed and GNU awk.

GNU grep🔗

Consider this sample input file:

$ cat fruits.txt
banana mango cherry pineapple
grape fig apple dragonfruit papaya
watermelon cashew tomato orange
almond lime grapefruit walnut

The output for grep --color -wE '[ago]\w+' fruits.txt is shown below:

See this section from my ebook on GNU grep for more details about this option. ripgrep has more feature-rich support for color formatting, see this section for an example.

Formatting with ANSI escape sequences🔗

Here are some examples to show how you can format text in the terminal using ANSI escape sequences:

Your choice of formatting goes between \033[ and m. You can use 01 for bold, 03 for italics and 04 for underline. 31 is for the color red, 32 for green and 34 for blue. Multiple formats can be specified by separating the parameters with a semicolon. Using 0 turns off the format (otherwise, it will persist in the current terminal session until turned off).
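
A minimal sketch of these sequences with printf (the function names and messages are made up for illustration):

```shell
# \033[ starts the sequence and m ends it
# \033[0m at the end turns off the formatting
bold_red() {
    printf '\033[01;31m%s\033[0m\n' "$1"
}

underline_green() {
    printf '\033[04;32m%s\033[0m\n' "$1"
}

bold_red 'error: file not found'
underline_green 'all tests passed'
```

When the output goes to a terminal, the first message should appear in bold red and the second one as underlined green text.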

If you find a file that has accidentally saved such escape sequences, you can use cat -v to identify them.

$ echo 'one (two) three' | grep --color=always '(two)' | cat -v
one ^[[01;31m^[[K(two)^[[m^[[K three

GNU sed🔗

With GNU sed, you'll need to use \o to specify an octal escape sequence. Here's an example:
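
A sketch of such a substitution, assuming GNU sed. It roughly mimics the earlier grep command by formatting whole words that start with a, g or o (the function name is hypothetical):

```shell
# \o033 is the octal form of the escape character
# & in the replacement section refers to the matched portion
sed_highlight() {
    sed -E 's/\b[ago]\w+\b/\o033[01;31m&\o033[0m/g'
}

echo 'grape fig apple' | sed_highlight | cat -v
# ^[[01;31mgrape^[[0m fig ^[[01;31mapple^[[0m
```

Remove the cat -v part to see the formatted text instead of the escape sequences.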

Here's an example for processing lines bounded by distinct markers:

$ cat blocks.txt
mango
icecream
--start 1--
dragon 1234
unicorn 6789
**end 1**
have a nice day
--start 2--
a b c
apple banana cherry
**end 2**
par,far,mar,tar
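
One possible way to format text only within such blocks is sketched below, assuming GNU sed. It highlights numbers for the lines between the markers and leaves the marker lines untouched. The function name and the choice of highlighting numbers are just for illustration.

```shell
# the address range selects lines from a start marker to an end marker
# the inner condition skips the marker lines themselves
highlight_in_blocks() {
    sed -E '/^--start/,/^\*\*end/ {
        /^(--start|\*\*end)/! s/[0-9]+/\o033[01;32m&\o033[0m/g
    }' "$@"
}

# usage: highlight_in_blocks blocks.txt
```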

GNU awk🔗

With GNU awk, you can embed the ANSI escape sequences in a string similar to the printf example seen earlier.
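
For instance, here's a minimal sketch that formats all the numbers in the input (the function name and sample input are made up):

```shell
# "\033" in an awk string is the escape character
# & in the replacement string refers to the matched portion
awk_highlight() {
    awk '{gsub(/[0-9]+/, "\033[01;34m&\033[0m")} 1'
}

printf 'dragon 1234\nunicorn 6789\n' | awk_highlight | cat -v
# dragon ^[[01;34m1234^[[0m
# unicorn ^[[01;34m6789^[[0m
```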

Here's a field processing example:

$ cat marks.txt 
Dept    Name    Marks
ECE     Raj     53
ECE     Joel    62
EEE     Moi     68
CSE     Surya   81
EEE     Tia     59
ECE     Om      92
CSE     Amy     67

$ cat filter.txt 
ECE 70
EEE 65
CSE 80
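
With these two files, one possible approach is to highlight the marks that meet the minimum value given for the department. This is only a sketch under that assumption, the function name is hypothetical.

```shell
# $1 is the file with department and minimum marks, $2 is the marks file
# sub() works on $0 here, so the spacing between the fields is preserved
highlight_marks() {
    awk 'NR==FNR{min[$1]=$2; next}
         ($1 in min) && $3 >= min[$1]{sub(/[0-9]+$/, "\033[01;32m&\033[0m")}
         1' "$1" "$2"
}

# usage: highlight_marks filter.txt marks.txt
```

For the sample files above, the marks for Moi, Surya and Om would be formatted, while the header and the other records are printed as is.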

Linux CLI ebooks🔗

Check out my ebooks if you are interested in learning more about Linux CLI basics, coreutils, text processing tools like GNU grep, GNU sed, GNU awk, perl and more! You can get them all as a single bundle via leanpub or gumroad.

100 Page Python Intro book announcement

2024-12-19T00:00:00+00:00

Hello!

I am pleased to announce a new version of my 100 Page Python Intro ebook. This book is a short, introductory guide for the Python programming language. This book is well suited:

As a reference material for Python beginner workshops
If you have prior experience with another programming language
If you want a complementary resource after reading a Python basics book, watching a video course, etc

Release offers🔗

To celebrate the new release, you can download PDF/EPUB versions of the ebook for FREE till 02-Jan-2025. You can still pay if you wish ;)

Two of my bundles are on sale as well:

All 13 Books Bundle — $15 (normal price $32), learn Regular Expressions, Linux CLI tools, Python, Vim and more!
- Leanpub or Gumroad
Awesome Regex Bundle — $10 (normal price $20), Python, Ruby, JavaScript, BRE/ERE, PCRE and Vim regular expressions
- Leanpub or Gumroad

What's new?🔗

Python version updated to 3.13.0
Added more exercises and you can now practice some of them using this interactive TUI app
Descriptions and external links were updated/corrected
Updated Acknowledgements section
Code snippets related to info/warning sections will now appear as a single block
New cover image
Images centered for EPUB format

Videos🔗

Check out my programming tips covering Python, command line tools and Vim:

Testimonials🔗

It's very thorough, written with care, and presented in a way that makes sense. Even as an intermediate Python programmer, I found use in this book.

— feedback by Andrew Healey on an early draft of "100 Page Python Intro" mentioned in this Hacker News thread

Interactive TUI app🔗

I also wrote an interactive TUI app based on some of the exercises from the ebook. Reference solutions are also provided.

Table of Contents🔗

Preface
Introduction
Numeric data types
Strings and user input
Defining functions
Control structures
Importing and creating modules
Installing modules and Virtual environments
Exception handling
Debugging
Testing
Tuple and Sequence operations
List
Mutability
Dict
Set
Text processing
Comprehensions and Generator expressions
Dealing with files
Executing external commands
Command line arguments

Web version🔗

You can also read the book online here: https://learnbyexample.github.io/100_page_python_intro/introduction.html.

GitHub repo🔗

Visit https://github.com/learnbyexample/100_page_python_intro for programs, example files, markdown source and other details about the book.

See also my blog post on how to customize pandoc for generating beautiful PDF/EPUB versions from GitHub style markdown.

Feedback🔗

You can reach me via:

Issue Manager: https://github.com/learnbyexample/100_page_python_intro/issues
E-mail: echo 'bGVhcm5ieWV4YW1wbGUubmV0QGdtYWlsLmNvbQo=' | base64 --decode
Twitter: https://twitter.com/learn_byexample

Happy learning :)

Festive offers for books on Python, Linux, Regular Expressions, Vim and more!

2024-11-12T00:00:00+00:00

Hello!

Here are some awesome deals for programming books and courses during the 2024 festive season.

My ebooks🔗

Offers valid till 02-Dec-2024. You can get them on Leanpub:

All 13 Books bundle — $15 (normal price $32), learn Regular Expressions, Linux CLI tools, Python, Vim and more!
Linux CLI Text Processing bundle — $10 (normal price $20), grep, sed, awk, perl and ruby one-liners, GNU coreutils, CLI computing
Learn by example Python bundle — $8 (normal price $15), Python introduction, Regular Expressions and Projects
Understanding Python re(gex)? — FREE (normal price $10)

You can also avail these offers on Gumroad:

All 13 Books Bundle — $15 (normal price $32), learn Regular Expressions, Linux CLI tools, Python, Vim and more!
Linux CLI Text Processing bundle — $10 (normal price $20), grep, sed, awk, perl and ruby one-liners, GNU coreutils, CLI computing
Learn by example Python bundle — $8 (normal price $15), Python introduction, Regular Expressions and Projects
Understanding Python re(gex)? — FREE (normal price $10)

Indie creators🔗

7 Python books bundle — 50% off
Python books — 35% off with BF24 discount code
Ebooks on Django and Git — 50% off, plus purchasing power parity if applicable
- see also author's blog post for links to other Django-related deals
The Python Coding Place Membership — 40% off with black2024 discount code
Python Jumpstart — 50% launch discount
- see also author's blog post for links to other Python deals
Wizard Zines — 50% off with WIZARDPDF discount code
CSS Flex and Grid + Level up with Tailwind CSS — 60% off
Everyday Rails Testing with RSpec — 53% off

Other deals🔗

The Pragmatic Bookshelf — 40% off on all ebooks and audio books
Leanpub Black Friday Sale — offers for programming books, bundles and courses
Huge list of awesome deals — tools, productivity, books, courses, etc
InfoSec Hack Friday — InfoSec related software/tools
blackfridaydeals.dev — Hottest Black Friday Deals for Developers

Happy learning :)

Interactive Python Exercises and Quiz

2024-10-29T00:00:00+00:00

An interactive program that automatically loads questions and checks the solution is wonderful to have while learning a topic. This TUI app has beginner to intermediate level exercises and multiple-choice questions for Python learners.

Installation🔗

This app is available on PyPI as pythonexercises. Example installation instructions are shown below, adjust them based on your preferences and OS.

# virtual environment
$ python3 -m venv textual_apps
$ cd textual_apps
$ source bin/activate
$ pip install pythonexercises

# launch the app
$ pythonexercises

If you are on Windows, using the Windows Terminal is recommended. See this issue for Virtual Environment commands and other details.

To run the app without having to enter the virtual environment again, add this alias to .bashrc (or equivalent):

# you'll have to change the path
alias pythonexercises='/path/to/textual_apps/bin/pythonexercises'

As an alternative to manually managing such virtual environments, you can use https://github.com/pypa/pipx instead:

$ pipx install pythonexercises
$ pythonexercises

As yet another alternative, you can install textual (see Textual documentation for more details), clone this repository and run the python_exercises.py file. You'll need to install textual[syntax] to enable syntax highlighting (see documentation for more details).

Adjust the terminal dimensions for the widgets to appear properly, for example 84x25 (characters x lines).

Guide🔗

See app_guide.md for instructions.

Ebook🔗

The exercise and quiz questions in this app have been adapted from my 100 Page Python Intro ebook.

Feedback🔗

I'd highly appreciate your feedback. Please file an issue if there are bugs, crashes, etc.

Hope you find this TUI app useful. Happy learning :)

Vim Reference Guide book announcement

2024-08-20T00:00:00+00:00

Hello!

I am pleased to announce a new version of my Vim Reference Guide ebook. This is intended as a concise learning resource for beginner to intermediate level Vim users. It has more in common with cheatsheets than a typical textbook. Topics like Regular Expressions and Macros have more detailed explanations and examples due to their complexity. I hope this guide will make it much easier for you to discover Vim features and learning resources than my own blundering experience.

Release offers🔗

To celebrate the new release, you can download PDF/EPUB versions of the ebook for FREE till 31-Aug-2024. You can still pay if you wish ;)

Two of my bundles are on sale as well:

All Books Bundle is $15 (normal price $32) — all my 13 programming ebooks
Linux CLI Text Processing is 50% OFF — grep, sed, awk, perl and ruby one-liners, coreutils, cli computing

What's new?🔗

Updated ebook for Vim version 9.1
Corrected typos
Some of the examples, descriptions and external links were updated
New cover image

Videos🔗

Visit this playlist for video demos on most of the topics from the ebook.

Testimonials🔗

Got several suggestions and feedback when my submission about this book reached the front page of Hacker News.

Great job on this! — rendall

Hi, great work releasing this! Trying to explain vim concisely is always an interesting challenge and I had a great time reading your attempt in this book. I always find it really interesting on how people try to group certain vim functions in a way that makes sense to people that don't use vim. I think you cover that idea pretty well in your 'Vim philosophy and features' section whilst not making it overly abstract and keeping it relatable. — doix

Neat stuff! One piece of feedback is that I would include "+p and "+yy in the copy and paste section. — mrpotato

I learnt regular expression by reading your books, thank you for the great work. — LamJH

A comment from another Hacker News thread:

I stumbled upon your vi post a few days ago, really like the style. Keep it up!

Vim prank🔗

Did you know that Vim has an easy mode? It can be rather hard to use for those already familiar with Vim modes. I wrote a blog post about this mode, which was interesting enough to reach the front page of Hacker News!

Table of Contents🔗

Preface
Introduction
Insert mode
Normal mode
Command-line mode
Visual mode
Regular Expressions
Macro
Customizing Vim
CLI options

Web version🔗

You can also read the book online here: https://learnbyexample.github.io/vim_reference/.

GitHub repo🔗

Visit https://github.com/learnbyexample/vim_reference for markdown source and other details related to the book.

See also my blog post on how to customize pandoc for generating beautiful PDF/EPUB versions from GitHub style markdown.

Subscribe to learnbyexample weekly — free newsletter covering programming resources, updates on what I am creating, tools, free ebooks and more, delivered every Friday.

Feedback and Errata🔗

You can reach me via:

Issue Manager: https://github.com/learnbyexample/vim_reference/issues
E-mail: echo 'bGVhcm5ieWV4YW1wbGUubmV0QGdtYWlsLmNvbQo=' | base64 --decode
Twitter: https://twitter.com/learn_byexample

Happy learning :)

Basic examples for the Linux date command

2024-07-31T00:00:00+00:00

I rarely ever use the date command, but when I need it I almost always struggle to get the right incantation. So, I'm just going to record such examples in this blog post (and some good to know features).

There'll also be learning resources linked at the end of the post.

Really basic examples

The date command by itself shows the current time. But that's rarely what I need, since I could just use the calendar widget at the bottom of my desktop screen. Perhaps useful to copy the string format and modify system time with the -s option?

# use the -u option for UTC (Coordinated Universal Time)
$ date
Wednesday 31 July 2024 03:53:01 PM IST

Instead, I need particular parts in a particular format. For example, to represent the time component in a dynamically constructed filename as part of a shell script.

# same as: date +%F or date -I or date --iso-8601
$ date +%Y-%m-%d
2024-07-31

$ date +%Y/%m/%d
2024/07/31
$ date +%y-%m-%d
24-07-31

# use 'b' and 'B' for month names
$ date +%a
Wed
$ date +%A
Wednesday
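As a rough sketch of the filename use case mentioned above, here's one way to embed the date in a name. The backup_name function and the backup_ prefix are made up for illustration, and the -d option assumes GNU date.

```shell
# hypothetical helper: build a filename from a date string
# $1 is anything GNU date -d understands (defaults to the current time)
backup_name() {
    printf 'backup_%s.tar.gz\n' "$(date -d "${1:-now}" +%Y-%m-%d)"
}

# fixed date so that the result is reproducible
backup_name '2024-07-31'
# prints: backup_2024-07-31.tar.gz

# current date, output depends on when you run it
backup_name
```

The %Y-%m-%d order sorts chronologically with a plain ls, which is usually why I'd pick it for filenames.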

You can use %x to get the locale representation:

$ date +%x
31/07/24

For hours, minutes and seconds:

# same as: date +%T
$ date +%H:%M:%S
16:00:32

# same as: date +%Y-%m-%dT%H:%M:%S%:z
$ date -Iseconds
2024-07-31T16:09:27+05:30

Displaying and converting epoch seconds

# total seconds since the epoch (1970-01-01 00:00:00 UTC)
$ date +%s
1722422393

$ date -d @1722422393 +'%F %T'
2024-07-31 16:09:53
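The output above depends on the local timezone (IST in my case). A small sketch of the same conversion pinned to UTC, plus the reverse direction, assuming GNU date:

```shell
# -u makes the result independent of the local timezone
date -u -d @1722422393 +'%F %T'
# prints: 2024-07-31 10:39:53

# the other way around: UTC timestamp to epoch seconds
date -u -d '2024-07-31 10:39:53' +%s
# prints: 1722422393
```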

You can also provide an input file for conversion using the -f option:

$ cat epochs.txt
@0000000000
@1234567890
@2222222222

# recall that the -u option gives you UTC
$ date -u -f epochs.txt +'%F %T'
1970-01-01 00:00:00
2009-02-13 23:31:30
2040-06-02 03:57:02
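If the values come from another command instead of a file, GNU date accepts - as the filename to read from stdin. A minimal sketch with made-up input values:

```shell
# '-f -' reads one date string per line from stdin
printf '@0\n@86400\n' | date -u -f - +%F
# prints:
# 1970-01-01
# 1970-01-02
```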

Date arithmetic

$ date -I
2024-08-02
$ date -d '+1 month 4 days'
Friday 06 September 2024 01:24:44 PM IST

# same as: date -d '-20 days' +%F
# you can also use '20 days ago'
$ date -I -d '-20 days'
2024-07-13

For my learnbyexample weekly newsletter, I use a script to generate a template issue. I use the arithmetic feature as shown below:

# prev_date variable gets the value from the previous newsletter issue
$ prev_date='2024-02-23'
$ date -d "$prev_date"' +7 days' +%F
2024-03-01
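The same idea extends to a loop. This is only an illustrative sketch (not the actual newsletter script) that lists the next three weekly dates, assuming GNU date; the weeks variable is a name I picked for the example.

```shell
prev_date='2024-02-23'

# multiples of 7 days from the previous issue date
for weeks in 1 2 3; do
    date -u -d "$prev_date +$((7 * weeks)) days" +%F
done
# prints:
# 2024-03-01
# 2024-03-08
# 2024-03-15
```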

Resource links

Linux Command Line Computing book announcement

2024-05-29T00:00:00+00:00

Hello!

I am pleased to announce a new version of my Linux Command Line Computing ebook. This is the longest book I've published so far (204 pages) — it took me more than 7 months to complete the first version and another month for a minor revision.

This ebook aims to teach Linux command line tools and Shell Scripting for beginner to intermediate level users. The main focus is on managing your files and performing text processing tasks. Plenty of examples are provided to make it easier to understand a particular tool and its various features. Exercises at the end of chapters will help you practice what you've learned and solutions are provided for reference. I hope this ebook will make it much easier for you to discover CLI tools, features and learning resources than my own blundering experience.

Release offers🔗

To celebrate the new release, you can download PDF/EPUB versions of the ebook for FREE till 9-June-2024. You can still pay if you wish ;)

Some of my bundles are on sale as well:

All books bundle is $12 (normal price $32) — all my 13 programming ebooks
Linux CLI Text Processing bundle is $7 (normal price $20) — Linux CLI tools, shell scripting, grep, sed, awk, perl and ruby one-liners

What's new?🔗

Some of the examples, exercises, descriptions and external links were updated/corrected
Book title changed to Linux Command Line Computing
New cover image

Videos🔗

Here's a short video about the Linux Command Line Computing ebook:

On this blog, I post tips covering Python, command line tools and Vim. Here are video demos for these tips:

Testimonials🔗

Ive only gotten through first pages but appears a good Unix/bash primer. I’ll probably recommend for new hires out of bootcamp because they’re usually weak here

— feedback on twitter

Nice book! I just started trying to get into linux today and you have some tips I haven’t found elsewhere and the text is an enjoyable read so far.

— feedback on reddit

Table of Contents🔗

Preface
Introduction and Setup
Command Line Overview
Managing Files and Directories
Shell Features
Viewing Part or Whole File Contents
Searching Files and Filenames
File Properties
Managing Processes
Multipurpose Text Processing Tools
Sorting Stuff
Comparing Files
Assorted Text Processing Tools
Shell Scripting
Shell Customization

Web version🔗

You can also read the book online here: https://learnbyexample.github.io/cli-computing/.

GitHub repo🔗

Visit https://github.com/learnbyexample/cli-computing for markdown source, example files, exercise solutions, sample chapters and other details related to the book.

See also my blog post on how to customize pandoc for generating beautiful PDF/EPUB versions from GitHub style markdown.

Subscribe to learnbyexample weekly — free newsletter covering programming resources, updates on what I am creating, tools, free ebooks and more, delivered every Friday.

Feedback and Errata🔗

You can reach me via:

Issue Manager: https://github.com/learnbyexample/cli-computing/issues
E-mail: echo 'bGVhcm5ieWV4YW1wbGUubmV0QGdtYWlsLmNvbQo=' | base64 --decode
Twitter: https://twitter.com/learn_byexample

Happy learning :)

Interactive GNU awk tutorial

2024-04-30T00:00:00+00:00

Know command line basics and want to learn the GNU awk command? Check out my interactive TUI app that gives a brief tour of this popular text processing command.

Installation🔗

This app is available on PyPI as awktutorial. Example installation instructions are shown below; adjust them based on your preferences and OS.

# virtual environment
$ python3 -m venv textual_apps
$ cd textual_apps
$ source bin/activate
$ pip install awktutorial

# launch the app
$ awktutorial

To run the app without having to enter the virtual environment again, add this alias to .bashrc (or equivalent):

# you'll have to change the path
alias awktutorial='/path/to/textual_apps/bin/awktutorial'

As an alternative to manually managing such virtual environments, you can use https://github.com/pypa/pipx instead:

$ pipx install awktutorial
$ awktutorial

As yet another alternative, you can install textual (see Textual documentation for more details), clone my TUI-apps repository and run the awk_tutorial.py file.

Adjust the terminal dimensions for the widgets to appear properly, for example 84x25 (characters x lines).

Ebook🔗

See my CLI text processing with GNU awk ebook to learn GNU awk with hundreds of examples and exercises.

Feedback🔗

I'd highly appreciate your feedback. Please file an issue if there are bugs, crashes, etc.

Hope you'll find this TUI app useful. Happy learning :)

CLI computation with GNU datamash

2024-04-09T00:00:00+00:00

I'm hoping this post will serve as a quick reference for some of the use cases and tickle your curiosity if you haven't come across this nifty CLI text processing tool yet. There are also links for further reading at the end.

Installation and Documentation🔗

See download page for source code and instructions to install the software on various platforms. This blog post is based on the 1.8 version.

See datamash manual for links to documentation in HTML, plain text, PDF, etc.

Sum🔗

# file with a single number per line
$ cat nums.txt
42
-2
10101
-3.14
-75
$ datamash sum 1 <nums.txt
10062.86

$ echo '3.14 42 1000 -51' | tr ' ' '\n' | datamash sum 1
994.14

# summing a particular column
# tab is the default field separator
$ cat table.txt
brown bread mat hair 42
blue cake mug shirt -7
yellow banana window shoes 3.14
$ datamash -t' ' sum 5 <table.txt
38.14

Other such operations include count, min, max, mean, median, sstdev (sample standard deviation), etc.
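Several operations can be combined in a single invocation, each followed by the column it applies to. A quick sketch with made-up numbers (assumes datamash is installed):

```shell
# one number per line, four operations on column 1
printf '%s\n' 10 20 30 60 | datamash count 1 min 1 max 1 mean 1
# prints (tab separated): 4 10 60 30
```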

Transpose and Reverse🔗

$ cat scores.csv
Name,Maths,Physics,Chemistry
Ith,100,100,100
Cy,97,98,95
Lin,78,83,80
Er,60,70,90

# interchange rows and columns
$ datamash -t, transpose <scores.csv
Name,Ith,Cy,Lin,Er
Maths,100,97,78,60
Physics,100,98,83,70
Chemistry,100,95,80,90

# reverse columns
$ datamash -t, reverse <scores.csv
Chemistry,Physics,Maths,Name
100,100,100,Ith
95,98,97,Cy
80,83,78,Lin
90,70,60,Er

Group by🔗

You can use the -g option to group items based on one or more columns. You can specify an operation such as collapse, sum, mean, count and so on. See Grouping rows by categories avoiding repetition for an example with unique.

# here, the first column items are already next to each other
# so, sorting is not needed
$ cat toys.txt
car blue
car red
car yellow
truck brown
bus green
bus maroon
rocket white

# by default a comma is used as the separator between collapsed items
# use 'unique' instead of 'collapse' to avoid duplicates
$ datamash -t' ' -g1 collapse 2 <toys.txt
car blue,red,yellow
truck brown
bus green,maroon
rocket white

# 'count' gives the number of items for the collapsed row
# 'rand' selects a random item for such collapsed rows
# 'first' and 'last' are other choices available
$ datamash -t' ' -g1 count 2 rand 2 <toys.txt
car 3 red
truck 1 brown
bus 2 green
rocket 1 white
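The comments above mention unique as an alternative to collapse. Here's a small sketch with made-up input containing a repeated item (assumes datamash is installed):

```shell
# 'blue' appears twice for 'car', unique keeps only one copy
printf '%s\n' 'car blue' 'car red' 'car blue' 'bus green' |
    datamash -t' ' -g1 unique 2
# prints:
# car blue,red
# bus green
```

Note that unique also sorts the items within each group, so use collapse if you need to preserve the input order.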

Here's an example with header lines as well as having to sort the input (-s). The -c option helps to customize the separator for the grouped items. The -H option is equivalent to using both --header-in and --header-out.

$ cat books.txt
Author,Title
Will Wight,Cradle
John Bierce,Mage Errant
Brandon Sanderson,Mistborn
Domagoj Kurmaic,Mother of Learning
Brandon Sanderson,The Stormlight Archive
Will Wight,The Last Horizon
Brandon Sanderson,Warbreaker

# not sure if there's an option to retain the original header line as is
# you can instead use: (sed -u 1q; datamash -st, -c: -g1 collapse 2) <books.txt
# use --header-in if you don't want the header line in the output
$ datamash -H -st, -c: -g1 collapse 2 <books.txt
GroupBy(Author),collapse(Title)
Brandon Sanderson,Mistborn:The Stormlight Archive:Warbreaker
Domagoj Kurmaic,Mother of Learning
John Bierce,Mage Errant
Will Wight,Cradle:The Last Horizon

Here's an example of summing values based on column 3 items:

$ cat duplicates.csv
brown,toy,bread,42
dark red,ruby,rose,111
blue,ruby,water,333
dark red,sky,rose,555
yellow,toy,flower,333
white,sky,bread,111
light red,purse,rose,333

$ datamash -st, -g3 sum 4 <duplicates.csv
bread,153
flower,333
rose,999
water,333

Average marks:

$ cat result.csv
Amy,maths,90
Amy,physics,75
Joe,maths,79
John,chemistry,77
John,physics,91
Moe,maths,81
Ravi,physics,84
Ravi,chemistry,70
Yui,maths,92

$ datamash -t, -g1 mean 3 <result.csv
Amy,82.5
Joe,79
John,84
Moe,81
Ravi,77
Yui,92

CLI text processing with GNU Coreutils book announcement

2024-04-03T00:00:00+00:00

Hello!

I am pleased to announce a new version of my CLI text processing with GNU Coreutils ebook. Examples, descriptions and external links were updated/corrected and 100+ exercises were added.

You might already be aware of popular coreutils commands like head, tail, tr, sort and so on. This book will teach you more than twenty such specialized text processing tools provided by the GNU coreutils package.

Release offers🔗

To celebrate the new release, you can download PDF/EPUB versions of the ebook for FREE till 10-April-2024. You can still pay if you wish ;)

The following bundles are heavily discounted:

All books bundle is $12 (normal price $32)
Linux CLI Text Processing bundle is $6 (normal price $20)

What's new?🔗

GNU coreutils package version updated to 9.1
Added 100+ exercises
In general, many of the examples, descriptions and external links were updated/corrected
Updated Acknowledgements section
Code snippets related to info/warning sections will now appear as a single block
Book title changed to CLI text processing with GNU Coreutils
New cover image

Videos🔗

On this blog, I post tips covering Python, command line tools and Vim. Here are video demos for these tips:

Testimonials🔗

In my opinion the book does a great job of quickly presenting examples of how commands can be used and then paired up to achieve new or interesting ways of manipulating data. Throughout the text there are little highlights offering tips on extra functionality or limitations of certain commands. For instance, when discussing the shuf command we're warned that shuf will not work with multiple files. However, we can merge multiple files together (using the cat command) and then pass them to shuf. These little gems of wisdom add a dimension to the book and will likely save the reader some time wondering why their scripts are not working as expected.

— book review by Jesse Smith on distrowatch.com

I discovered your books recently and they’re awesome, thank you! As a 20 year *nix they made me realize how much more there are to these rock solid and ancient tools, once you spend the time to actually learn the intricacies of them.

— feedback on reddit

Table of Contents🔗

Preface
Introduction
cat and tac
head and tail
tr
cut
seq
shuf
paste
pr
fold and fmt
sort
uniq
comm
join
nl
wc
split
csplit
expand and unexpand
basename and dirname
What next?

Web version🔗

You can also read the book online here: https://learnbyexample.github.io/cli_text_processing_coreutils/introduction.html.

GitHub repo🔗

Visit https://github.com/learnbyexample/cli_text_processing_coreutils for markdown source, example files, exercise solutions, sample chapters and other details related to the book.

See also my blog post on how to customize pandoc for generating beautiful PDF/EPUB versions from GitHub style markdown.

Subscribe to learnbyexample weekly — free newsletter covering programming resources, updates on what I am creating, tools, free ebooks and more, delivered every Friday.

Feedback and Errata🔗

You can reach me via:

Issue Manager: https://github.com/learnbyexample/cli_text_processing_coreutils/issues
E-mail: echo 'bGVhcm5ieWV4YW1wbGUubmV0QGdtYWlsLmNvbQo=' | base64 --decode
Twitter: https://twitter.com/learn_byexample

Happy learning :)

Ruby One-Liners Guide book announcement

2024-02-20T00:00:00+00:00

Hello!

I am pleased to announce a new version of my Ruby One-Liners Guide ebook. Examples, exercises, solutions, descriptions and external links were added/updated/corrected.

When it comes to command line text processing, there are several well known tools like grep for filtering, sed for substitution and awk for field processing. Compared to such tools, Ruby has a feature rich regular expression engine, plenty of builtin modules and a thriving ecosystem. Another advantage is that Ruby is more portable.

This ebook will show examples for filtering and substitution features, field processing, using standard and third-party modules, multiple file processing, how to construct solutions that depend on multiple records, how to compare records and fields between two or more files, how to identify duplicates while maintaining input order and so on.

Release offers🔗

To celebrate the new release, you can download PDF/EPUB versions of Ruby One-Liners Guide for FREE till 29-February-2024. You can still pay if you wish ;)

Ruby Text Processing bundle is free as well:

So is the Magical one-liners bundle:

What's new?🔗

Command version updated to Ruby 3.3.0
Added more exercises
Long sections split into smaller ones
In general, many of the examples, exercises, solutions, descriptions and external links were updated/corrected
Updated Acknowledgements section
Code snippets related to info/warning sections will now appear as a single block
Book title changed to Ruby One-Liners Guide
New cover image

Videos🔗

On this blog, I post tips covering Python, command line tools and Vim. Here are video demos for these tips:

Testimonials🔗

This Ruby one-liners cookbook is incredible. Pretty mind boggling all the stuff you can do.

— feedback on twitter

Table of Contents🔗

Preface
One-liner introduction
Line processing
Field separators
Record separators
Multiple file input
Processing multiple records
Two file processing
Dealing with duplicates
Processing structured data

Web version🔗

You can also read the book online here: https://learnbyexample.github.io/learn_ruby_oneliners/.

GitHub repo🔗

Visit https://github.com/learnbyexample/learn_ruby_oneliners for markdown source, example files, exercise solutions, sample chapters and other details related to the book.

See also my blog post on how to customize pandoc for generating beautiful PDF/EPUB versions from GitHub style markdown.

Subscribe to learnbyexample weekly — free newsletter covering programming resources, updates on what I am creating, tools, ebooks and more, delivered every Friday.

Feedback and Errata🔗

You can reach me via:

Issue Manager: https://github.com/learnbyexample/learn_ruby_oneliners/issues
E-mail: echo 'bGVhcm5ieWV4YW1wbGUubmV0QGdtYWlsLmNvbQo=' | base64 --decode
Twitter: https://twitter.com/learn_byexample

Happy learning :)

Understanding Ruby Regexp book announcement

2024-02-02T00:00:00+00:00

Hello!

I just published a new version of the "Understanding Ruby Regexp" ebook. Corrected examples and descriptions for Atomic grouping, \G and \K features, improved examples, exercises and so on.

This book will help you learn Ruby Regular Expressions step-by-step from beginner to advanced levels with hundreds of examples and exercises.

Ebook links🔗

You can download the PDF/EPUB versions of the book for free using the below links (you can also pay if you wish):

You can also read the book online here: https://learnbyexample.github.io/Ruby_Regexp/.

What's new?🔗

Ruby version updated to 3.3.0
Corrected examples and descriptions for Atomic grouping, \G and \K features
In general, many of the examples, exercises, solutions, descriptions and external links were updated/corrected
Updated Acknowledgements section
Code snippets related to info/warning sections will now appear as a single block
Book title changed to Understanding Ruby Regexp
New cover image
Images centered for EPUB format

Videos🔗

On this blog, I post tips covering Python, command line tools and Vim. Here are video demos for these tips:

Table of Contents🔗

Preface
Why is it needed?
Regexp introduction
Anchors
Alternation and Grouping
Escaping metacharacters
Dot metacharacter and Quantifiers
Interlude: Tools for debugging and visualization
Working with matched portions
Character class
Groupings and backreferences
Interlude: Common tasks
Lookarounds
Modifiers
Unicode
Further Reading

GitHub repo🔗

Visit https://github.com/learnbyexample/Ruby_Regexp for markdown source, exercise solutions, sample chapters and other details related to the book.

See my blog post on how to customize pandoc for generating beautiful PDF/EPUB versions from GitHub style markdown.

Subscribe to learnbyexample weekly — free newsletter covering programming resources, updates on what I am creating, tools, ebooks and more, delivered every Friday.

Feedback and Errata🔗

You can reach me via:

Issue Manager: https://github.com/learnbyexample/Ruby_Regexp/issues
E-mail: echo 'bGVhcm5ieWV4YW1wbGUubmV0QGdtYWlsLmNvbQo=' | base64 --decode
Twitter: https://twitter.com/learn_byexample

Happy learning :)

2023: year in perspective

2023-12-29T00:00:00+00:00

TL;DR: Updated six programming ebooks, created four interactive TUI apps for exercises, wrote blog posts, recorded YouTube videos, newsletter prospered, read 100+ novels, and so on. Had a great year in terms of ebook sales despite worries over AI tools 😇

Books updated🔗

This year I focused on updating my existing ebooks instead of working on a new one. I managed to revise 6 out of my 13 published works so far. Examples and exercises were added and improved. Typos were corrected, sections added for new features (if any), new book covers, promo videos and so on.

Understanding Python re(gex)? — Learn Python Regular Expressions step-by-step from beginner to advanced levels with 300+ examples
CLI text processing with GNU grep and ripgrep — Example based guide to mastering GNU grep and ripgrep
CLI text processing with GNU sed — Example based guide to mastering GNU sed
CLI text processing with GNU awk — Example based guide to mastering GNU awk one-liners
Perl One-Liners Guide — Example based guide for text processing with Perl from the command line
Understanding JavaScript RegExp — Learn JavaScript Regular Expressions step-by-step from beginner to advanced levels with hundreds of examples and exercises

TUI apps🔗

Last year, I had learned a bit of Textual. My aim was to create interactive apps for practicing exercises from my ebooks. I wrote the following apps:

Python re(gex)? exercises — 100+ exercises for Python Regular Expressions
- Python re(gex)? playground — interactive playground, also includes a cheatsheet
Grep Exercises — 50+ exercises for GNU grep (or alternate implementations like ripgrep)
Sed Exercises — 50+ exercises for GNU sed
Awk Exercises — 80+ exercises for GNU awk

And I also added more exercises for the Linux CLI Text Processing Exercises app.

Blog posts🔗

Most of my blog posts this year were related to book and interactive app announcements. So, not really a choice to pick favorites from:

I also posted some weekly programming tips (Python, Linux, Vim).

Book sales🔗

Revenue from ebook sales was about 10% lower than last year. At the start of the year, I'd have been satisfied even if it had been 50% lower. I wasn't writing new ebooks and AI tools were all the rage on social media. Somehow, I got lucky with self-promotion posts for my GNU awk ebook and the rest of the months weren't too shabby. Here's my Gumroad revenue chart for 2023:

You can clearly see when the GNU awk ebook was updated. Sales on Gumroad were actually just a bit higher than last year. It was on Leanpub that sales were much lower, almost half compared to last year. Profits fell by more than 10% since Gumroad increased their fees. Overall, I'm still earning more than I need and I'm hoping that next year won't see too much of a drop in sales.

I started a newsletter, learnbyexample weekly, two years back. I've managed to send an email every Friday without fail so far and I'm proud of that. Sometimes I had to schedule issues weeks ahead. Total subscriber count crossed 1000 earlier this month and some readers are even paying me monthly despite this being a free newsletter.

Fictional reading🔗

I enjoy reading fantasy and science-fiction novels. I read 100+ SFF books this year despite aiming for fewer than 100! Anyway, I wrote a post listing my favorites here.

I even participated in NaNoWriMo. I only wrote 20K words, but I did have some fun. The novel went nowhere though and it is languishing now. Not sure if I'd get back to it someday.

Goals for 2024🔗

There are seven more books I need to update. Hopefully I get them done in a year, though I won't be pushing hard. If I crave to write some new books instead, I'd switch over to them. Or even do something else entirely. After more than six years writing tutorials and books, I sure can do with a break.

Here's wishing you a very happy, healthy and prosperous 2024 👍 😇

Festive offers for books on Python, Linux, Regular Expressions and more

2023-11-18T00:00:00+00:00

Hello!

Here are some exciting deals for my programming ebooks as well as from other creators.

My ebooks🔗

Offers valid till 30-Nov-2023:

All 13 Books Bundle — $10 (normal price $32)
Learn by example Python bundle — $4 (normal price $15)
Understanding Python re(gex)? — FREE (normal price $10)

Indie creators🔗

Python Problem-Solving Bootcamp — 40% off or purchasing power parity discount whichever is greater
Python books by Michael Driscoll and Teach Me Python Membership — 33% off with black23 discount code
Ebooks on Django and Git — 50% off, plus purchasing power parity if applicable
- see also author's blog post for links to other Django-related deals
The Python Coding Place Membership — 70% off
Python Morsels Membership — lifetime access for the price of 2 years
- see also author's blog post for links to other Python deals
Python To Projects - 5 Week Online Course — 25% off
Python books by Reuven Lerner — 40% off
wizard zines — 50% off on PDFs, 30% off on print versions
Level up with Tailwind CSS and Complete Guide to CSS Flex and Grid — 50% off

Miscellaneous🔗

NoStarch Press — 35% off with DEALS4DAYS code
The Pragmatic Bookshelf — 40% off on all ebooks and audio books
Manning Publications — save 50% when you buy 2 or more MEAPs, eBooks, pBooks, liveProjects, or liveVideos
InfoSec Hack Friday — InfoSec related software/tools
The Cyber Plumber's Lab Guide and Interactive Access — 33% OFF
Leanpub Monthly Sale and Leanpub Weekly Sale — offers for programming books, bundles and courses
Huge list of awesome deals — tools, productivity, books, courses, etc
blackfridaydeals.dev — Hottest Black Friday Deals for Developers
Black Friday Deals — Savings on Tech Books and Courses

Happy learning :)

Understanding JavaScript RegExp book announcement

2023-10-26T00:00:00+00:00

Hello!

I just published a new version of the "Understanding JavaScript RegExp" ebook. Added examples for d and v flags, corrected many mistakes, improved examples, exercises and so on.

This book will help you learn JavaScript Regular Expressions step-by-step from beginner to advanced levels with hundreds of examples and exercises.

Release offers🔗

To celebrate the new release, you can download PDF/EPUB versions of Understanding JavaScript RegExp for FREE till 05-Nov-2023. You can still pay if you wish ;)

All Books Bundle is just $12 (normal price $32) — includes all my 13 programming ebooks.

What's new?🔗

Examples and exercises added for d and v flags
Strings in code snippets changed to be uniformly represented in single quotes
In general, many of the examples, exercises, solutions, descriptions and external links were updated/corrected
Updated Acknowledgements section
Code snippets related to info/warning sections will now appear as a single block
Book title changed to Understanding JavaScript RegExp
New cover image
Images centered for EPUB format

Videos🔗

On this blog, I post tips covering Python, command line tools and Vim. Here are video demos for these tips:

Testimonials🔗

Literally was having a mini-breakdown about not understanding Regex in algorithm solutions the other day and now I'm feeling so much better, so thank YOU! I genuinely feel like I'm developing the skill for spotting when and where to use them after so much practice!

— feedback on twitter

Table of Contents🔗

Preface
Why is it needed?
RegExp introduction
Anchors
Alternation and Grouping
Escaping metacharacters
Dot metacharacter and Quantifiers
Interlude: Tools for debugging and visualization
Working with matched portions
Character class
Groupings and backreferences
Interlude: Common tasks
Lookarounds
Unicode
Further Reading

Web version🔗

You can also read the book online here: https://learnbyexample.github.io/learn_js_regexp/.

GitHub repo🔗

Visit https://github.com/learnbyexample/learn_js_regexp for markdown source, exercise solutions, sample chapters and other details related to the book.

See my blog post on how to customize pandoc for generating beautiful PDF/EPUB versions from GitHub style markdown.

Subscribe to learnbyexample weekly — free newsletter covering programming resources, updates on what I am creating, tools, free ebooks and more, delivered every Friday.

Feedback and Errata🔗

I would highly appreciate it if you'd let me know how you felt about this book. It could be anything from a simple thank you, Gumroad rating, pointing out a typo, mistakes in code snippets, which aspects of the book worked for you (or didn't!) and so on. Reader feedback is essential and especially so for self-published authors.

You can reach me via:

Issue Manager: https://github.com/learnbyexample/learn_js_regexp/issues
E-mail: echo 'bGVhcm5ieWV4YW1wbGUubmV0QGdtYWlsLmNvbQo=' | base64 --decode
Twitter: https://twitter.com/learn_byexample

Happy learning :)

CLI text editing with ed

2023-10-17T00:00:00+00:00

I'm finally writing a post on the ed command. And I'm keeping it short so that I'll actually publish the post. The examples presented below will be easier to understand for those already familiar with Vim and sed. See the links at the end for learning resources.

Although I'm interested in getting to know ed better, I don't really find myself in situations where it'd help me. But, I have used it a few times to answer questions on stackoverflow.

Moving lines

Consider this sample input file:

$ cat ip.txt
apple
banana
cherry
fig
mango
pineapple

Suppose you want to move the third line to the top. If you are using Vim, you can execute :3m0 where 3 is the input address, m is the move command and 0 is the target address. To do the same with ed:

$ printf '3m0\nwq\n' | ed -s ip.txt -

$ cat ip.txt
cherry
apple
banana
fig
mango
pineapple

The 3m0 part in the above ed command is identical to the Vim solution. After that, another command wq (write and quit) is issued to save the changes (again, Vim users would be familiar with this combination). The -s option suppresses diagnostics and other details. - is used to indicate that the ed script is passed via stdin.

You can also move lines based on a regexp match. Here's an example:

# move the first matching line containing 'an' to the top of the file
$ printf '/an/m0\nwq\n' | ed -s ip.txt -

$ cat ip.txt
banana
cherry
apple
fig
mango
pineapple

If you want to move all the matching lines, you can use the g command (same as Vim). Note that the first matching line will be moved first, then the next matching line and so on. So the order will be reversed after the move.

$ printf 'g/app/m0\nwq\n' | ed -s ip.txt -

$ cat ip.txt
pineapple
apple
banana
cherry
fig
mango
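
The reversal is easier to see when the moves are replayed one at a time. Here's a rough Python sketch of that behavior (not how ed is implemented). The helper names move_to_top and move_all_matching are made up for this illustration, and the sample list is the original ip.txt content.

```python
import re

def move_to_top(lines, index):
    """Mimic 'Nm0': move the line at the given 0-based index to the top."""
    lines = list(lines)
    lines.insert(0, lines.pop(index))
    return lines

def move_all_matching(lines, pattern):
    """Mimic 'g/pattern/m0': move each matching line to the top, one by one."""
    result = list(lines)
    # matching lines are collected first, then moved in their original order
    for line in [l for l in lines if re.search(pattern, l)]:
        result.remove(line)   # assumes unique lines, true for this sample
        result.insert(0, line)
    return result

fruits = ['apple', 'banana', 'cherry', 'fig', 'mango', 'pineapple']
third_to_top = move_to_top(fruits, 2)
app_to_top = move_all_matching(fruits, 'app')
print(third_to_top)
print(app_to_top)
```

Since apple is moved first and pineapple is moved after it, pineapple ends up above apple in the result.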

Here's the stackoverflow link that inspired the above examples. See this stackoverflow answer for more examples of moving lines. See this one to learn how to copy a particular line to the end of the file. See this unix.stackexchange answer for an example of moving a range of lines, where the same regex matches both the starting and ending lines.

Negative addressing

There are plenty of addressing features provided by the GNU sed command, but negative addressing isn't one of them. Here's an example of deleting the third line from the end (two lines before the last line) using ed:

$ cat colors.txt
red
green
blue
yellow
black

$ printf '$-2d\nwq\n' | ed -s colors.txt -
$ cat colors.txt
red
green
yellow
black
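
If it helps to think in terms of Python indexing, $ refers to the last line, so $-2 is two lines before it. A small sketch with the same sample data (this is just an analogy, not something ed provides):

```python
colors = ['red', 'green', 'blue', 'yellow', 'black']

# '$' is the last line (index -1), so '$-2' corresponds to index -3
del colors[-3]
print(colors)
```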

Resource links

Perl One-Liners Guide book announcement

2023-09-28T00:00:00+00:00

Hello!

I am pleased to announce a new version of my Perl One-Liners Guide ebook. Examples, exercises, solutions, descriptions and external links were added/updated/corrected.

When it comes to command line text processing, there are several well known tools like grep for filtering, sed for substitution and awk for field processing. Compared to such tools, Perl has a feature rich regular expression engine, plenty of builtin modules and a thriving ecosystem. Another advantage is that Perl is more portable.

Release offers🔗

To celebrate the new release, you can download PDF/EPUB versions of Perl One-Liners Guide for FREE till 07-October-2023. You can still pay if you wish ;)

All Books Bundle is just $12 (normal price $32), includes all my 13 programming ebooks.

What's new?🔗

Command version updated to Perl 5.38.0
- option -g slurps entire file contents
Many more exercises added
Long sections split into smaller ones
In general, many of the examples, exercises, solutions, descriptions and external links were updated/corrected
Updated Acknowledgements section
Code snippets related to info/warning sections will now appear as a single block
Book title changed to Perl One-Liners Guide
New cover image

Videos🔗

On this blog, I post tips covering Python, command line tools and Vim. Here are video demos for these tips:

Testimonials🔗

This is fantastic! 👏 I use Perl one-liners for record and text processing a lot and this will be definitely something I will keep coming back to - I’ve already learned a trick from “Context Matching” (9) 🙂

— feedback on [email protected]

Table of Contents🔗

Preface
One-liner introduction
Line processing
In-place file editing
Field separators
Record separators
Using modules
Multiple file input
Processing multiple records
Two file processing
Dealing with duplicates
Perl rename command

Web version🔗

You can also read the book online here: https://learnbyexample.github.io/learn_perl_oneliners/.

GitHub repo🔗

Visit https://github.com/learnbyexample/learn_perl_oneliners for markdown source, example files, exercise solutions, sample chapters and other details related to the book.

See also my blog post on how to customize pandoc for generating beautiful PDF/EPUB versions from GitHub style markdown.

Subscribe to learnbyexample weekly — free newsletter covering programming resources, updates on what I am creating, tips, tools, free ebooks and more, delivered every Friday.

Feedback and Errata🔗

You can reach me via:

Issue Manager: https://github.com/learnbyexample/learn_perl_oneliners/issues
E-mail: echo 'bGVhcm5ieWV4YW1wbGUubmV0QGdtYWlsLmNvbQo=' | base64 --decode
Twitter: https://twitter.com/learn_byexample

Happy learning :)

Vim tip 33: editing with text objects

2023-09-25T00:00:00+00:00

Combining motions such as w, % and f with editing commands like d, c and y requires precise positioning to be effective.

Vim also provides a list of handy context based options to make certain editing use cases easier using the i and a text object selections. You can easily remember the difference between these two options by thinking of i as inner and a as around.

diw delete a word regardless of where the cursor is on that word
- equivalent to using de when the cursor is on the first character of the word
diW delete a WORD regardless of where the cursor is on that WORD
daw delete a word regardless of where the cursor is on that word as well as a space character to the left/right of the word depending on its position in the current sentence
dis delete a sentence regardless of where the cursor is on that sentence
yas copy a sentence regardless of where the cursor is on that sentence as well as a space character to the left/right
cip delete a paragraph regardless of where the cursor is on that paragraph and change to Insert mode
dit delete all characters within HTML/XML tags, nesting is taken care of as well
- see :h tag-blocks for details about corner cases
di" delete all characters within a pair of double quotes, regardless of where the cursor is within the quotes
da' delete all characters within a pair of single quotes along with the quote characters
ci( delete all characters within () and change to Insert mode
- works even if the parentheses are spread over multiple lines, nesting is taken care of as well
ya} copy all characters within {} including the {} characters
- works even if the braces are spread over multiple lines, nesting is taken care of as well

You can use a count prefix for nested cases. For example, c2i{ will clear the inner braces (including the braces, and this could be nested too) and then only the text between braces for the next level.

See :h text-objects for more details.

Vim tip 32: text and indent settings

2023-09-19T00:00:00+00:00

Here are some text and indent Vim settings that you can put in the vimrc file to customize your editor. See :h options.txt for complete reference.

filetype plugin indent on enables loading of plugin and indent files
- these files become active based on the type of the file to influence syntax highlighting, indentation, etc
- :echo $VIMRUNTIME gives your installation directory (indent and plugin directories would be present in this path)
- see :h vimrc-filetype, :h :filetype-overview and :h filetype.txt for more details
set autoindent copy indent from the current line when starting a new line
- useful for files not affected by indent setting
- see also :h smartindent
set textwidth=80 guideline for Vim to automatically move to a new line with 80 characters as the limit
- white space is used to break lines, so a line can still be greater than the limit if there's no white space
- default is 0 which disables this setting
set colorcolumn=80 create a highlighted vertical bar at column number 80
- use highlight ColorColumn setting to customize the color for this vertical bar
- see vi.stackexchange: Keeping lines to less than 80 characters for more details
set shiftwidth=4 number of spaces to use for indentation (default is 8)
set tabstop=4 width for the tab character (default is 8)
set expandtab use spaces for tab expansion
set cursorline highlight the line containing the cursor

CLI tip 33: manipulating string case with GNU sed

2023-09-11T00:00:00+00:00

sed provides escape sequences to change the case of replacement strings, which might include backreferences, shell variables, etc.

Sequence	Description
`\E`	indicates the end of case conversion
`\l`	convert the next character to lowercase
`\u`	convert the next character to uppercase
`\L`	convert the following characters to lowercase (overridden by `\U` or `\E`)
`\U`	convert the following characters to uppercase (overridden by `\L` or `\E`)

First up, changing case of only the immediate next character after the escape sequence.

# match only the first character of a word
# use & to backreference the matched character
# \u would then change it to uppercase
$ echo 'hello there. how are you?' | sed 's/\b\w/\u&/g'
Hello There. How Are You?

# change the first character of a word to lowercase
$ echo 'HELLO THERE. HOW ARE YOU?' | sed 's/\b\w/\l&/g'
hELLO tHERE. hOW aRE yOU?

# match lowercase followed by underscore followed by lowercase
# delete the underscore and convert the 2nd lowercase to uppercase
$ echo '_fig aug_price next_line' | sed -E 's/([a-z])_([a-z])/\1\u\2/g'
_fig augPrice nextLine

Next, changing case of multiple characters at a time.

# change all alphabets to lowercase
$ echo 'HaVE a nICe dAy' | sed 's/.*/\L&/'
have a nice day
# change all alphabets to uppercase
$ echo 'HaVE a nICe dAy' | sed 's/.*/\U&/'
HAVE A NICE DAY

# \E will stop further conversion
$ echo 'fig_ aug_price next_line' | sed -E 's/([a-z]+)(_[a-z]+)/\U\1\E\2/g'
fig_ AUG_price NEXT_line
# \L or \U will override any existing conversion
$ echo 'HeLLo:bYe gOoD:beTTEr' | sed -E 's/([a-z]+)(:[a-z]+)/\L\1\U\2/Ig'
hello:BYE good:BETTER

Finally, examples where escapes are used next to each other.

# uppercase first character of a word
# and lowercase rest of the word characters
# note the order of escapes used, \u\L won't work
$ echo 'HeLLo:bYe gOoD:beTTEr' | sed -E 's/[a-z]+/\L\u&/Ig'
Hello:Bye Good:Better

# lowercase first character of a word
# and uppercase rest of the word characters
$ echo 'HeLLo:bYe gOoD:beTTEr' | sed -E 's/[a-z]+/\U\l&/Ig'
hELLO:bYE gOOD:bETTER
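
For comparison, here's a rough sketch of how similar results could be obtained in Python. The re module has no \u or \L style escapes in the replacement string, so this assumes a function is passed as the replacement instead.

```python
import re

# similar to: sed 's/\b\w/\u&/g'
title = re.sub(r'\b\w', lambda m: m[0].upper(), 'hello there. how are you?')

# similar to: sed -E 's/([a-z])_([a-z])/\1\u\2/g'
camel = re.sub(r'([a-z])_([a-z])', lambda m: m[1] + m[2].upper(),
               '_fig aug_price next_line')

# similar to: sed -E 's/[a-z]+/\L\u&/Ig'
# capitalize() uppercases the first character and lowercases the rest
caps = re.sub(r'[a-z]+', lambda m: m[0].capitalize(),
              'HeLLo:bYe gOoD:beTTEr', flags=re.I)

print(title)
print(camel)
print(caps)
```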

See also my CLI text processing with GNU sed ebook.

Python tip 33: sorting iterables based on multiple conditions

2023-09-05T00:00:00+00:00

In an earlier tip, you learned how to sort iterables based on a key. You can use a sequence like list or tuple to specify a tie-breaker condition when two or more items are deemed equal under the primary sorting rule.

>>> books = ('Mage Errant', 'Piranesi', 'Cradle', 'The Weirkey Chronicles', 'Mistborn')

# sorts based on the number of words
# retains original order for items with the same number of words
>>> sorted(books, key=lambda b: b.count(' '))
['Piranesi', 'Cradle', 'Mistborn', 'Mage Errant', 'The Weirkey Chronicles']

# items with the same number of words are further sorted in alphabetic order
>>> sorted(books, key=lambda b: (b.count(' '), b))
['Cradle', 'Mistborn', 'Piranesi', 'Mage Errant', 'The Weirkey Chronicles']

To sort in descending order, usually the reverse=True keyword argument is used. But what if the primary and secondary rules are opposites? If one of the rules is numerical in nature, you can simply negate the number to reverse the order.

# descending order based on the number of words
# ascending alphabetic order for items with the same number of words
>>> sorted(books, key=lambda b: (-b.count(' '), b))
['The Weirkey Chronicles', 'Mage Errant', 'Cradle', 'Mistborn', 'Piranesi']

# reverse the above result
>>> sorted(books, key=lambda b: (-b.count(' '), b), reverse=True)
['Piranesi', 'Mistborn', 'Cradle', 'Mage Errant', 'The Weirkey Chronicles']
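
When neither rule is numerical, negation isn't available. One option is to rely on the sort being stable and apply the rules in two passes, secondary rule first. Here's a sketch of that idea. The (name, color) pairs are made up purely for illustration.

```python
# hypothetical (name, color) pairs
items = [('fig', 'green'), ('apple', 'red'), ('cherry', 'red'), ('banana', 'green')]

# pass 1: secondary rule (name, ascending)
by_name = sorted(items, key=lambda t: t[0])

# pass 2: primary rule (color, descending)
# items with the same color retain their order from pass 1
result = sorted(by_name, key=lambda t: t[1], reverse=True)
print(result)
```

The reverse=True argument preserves stability, so items with equal keys stay in the order produced by the first pass.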

Vim tip 31: mark frequently used locations

2023-08-29T00:00:00+00:00

You can save frequently visited locations using marks for quicker navigation to those positions in the file. You can also pair marks with motion commands for tasks like copying, deleting, etc.

ma mark location in the file using the alphabet a
- you can use any of the 26 alphabets
- use lowercase alphabets to work within the current file
- use uppercase alphabets to work from any file
- :marks will show a list of the existing marks
`a move to the exact location marked by a
'a move to the first non-blank character of the line marked by a
'A move to the first non-blank character of the line marked by A (this will work for any file where the mark was set)
d`a delete from the current character to the character marked by a
- marks can be paired with any command that accepts motions like d, y, >, etc

Motion commands that take you across lines (for example, 10G) will automatically save the location you jumped from in the default ` mark. You can move back to that exact location using `` or the first non-blank character using '`. Note that the arrow and word motions aren't considered for the default mark even if they move across lines.

See :h mark-motions for more ways to use marks.

CLI tip 32: text processing between two files with GNU awk

2023-08-21T00:00:00+00:00

awk is handy to compare records and fields between two or more files. The key features used in the solution below:

For two files as input, NR==FNR will be true only when the first file is being processed
next will skip rest of the script and fetch the next record
a[$0] by itself is a valid statement. It will create an uninitialized element in array a with $0 as the key (assuming the key doesn't exist yet)
$0 in a checks if the given string ($0 here) exists as a key in the array a

$ cat colors_1.txt
teal
light blue
green
yellow
$ cat colors_2.txt
light blue
black
dark green
yellow

# common lines
$ awk 'NR==FNR{a[$0]; next} $0 in a' colors_1.txt colors_2.txt
light blue
yellow

# lines from colors_2.txt not present in colors_1.txt
$ awk 'NR==FNR{a[$0]; next} !($0 in a)' colors_1.txt colors_2.txt
black
dark green

Note that the NR==FNR logic will fail if the first file is empty, since NR wouldn't get a chance to increment. You can set a flag after the first file has been processed to avoid this issue. See this unix.stackexchange thread for more workarounds.

# no output
$ awk 'NR==FNR{a[$0]; next} !($0 in a)' /dev/null <(seq 2)

# gives the expected output
$ awk '!f{a[$0]; next} !($0 in a)' /dev/null f=1 <(seq 2)
1
2

Here's an example of comparing specific fields instead of whole lines. When you use a , separator between strings to construct the array key, the value of SUBSEP is inserted. This special variable has a default value of the non-printing character \034 which is usually not used as part of text files.

$ cat marks.txt
Dept    Name    Marks
ECE     Raj     53
ECE     Joel    72
EEE     Moi     68
CSE     Surya   81
EEE     Tia     59
ECE     Om      92
CSE     Amy     67

$ cat dept_name.txt
EEE Moi
CSE Amy
ECE Raj

$ awk 'NR==FNR{a[$1,$2]; next} ($1,$2) in a' dept_name.txt marks.txt
ECE     Raj     53
EEE     Moi     68
CSE     Amy     67
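
The same lookup idea can be sketched in Python, which may help if the awk idioms are new to you. This is only an analogy under a few assumptions: the file contents are hardcoded as lists, a set plays the role of the array keys and a tuple stands in for the SUBSEP joined key.

```python
colors_1 = ['teal', 'light blue', 'green', 'yellow']
colors_2 = ['light blue', 'black', 'dark green', 'yellow']

# like a[$0] for the first file: only the keys matter
seen = set(colors_1)
common = [c for c in colors_2 if c in seen]          # $0 in a
only_in_2 = [c for c in colors_2 if c not in seen]   # !($0 in a)
print(common)
print(only_in_2)

# field based comparison: a tuple stands in for the SUBSEP joined key
dept_name = ['EEE Moi', 'CSE Amy', 'ECE Raj']
marks = ['ECE Raj 53', 'ECE Joel 72', 'EEE Moi 68', 'CSE Surya 81',
         'EEE Tia 59', 'ECE Om 92', 'CSE Amy 67']
keys = {tuple(line.split()) for line in dept_name}
matched = [line for line in marks if tuple(line.split()[:2]) in keys]
print(matched)
```

As in the awk solution, the output order follows the second input, not the first.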

See also my CLI text processing with GNU awk ebook.

Interactive exercises for GNU grep, sed and awk (TUI apps)

2023-08-17T00:00:00+00:00

An interactive program that automatically loads questions and checks the solutions is immensely helpful while learning a topic. I've written TUI apps with plenty of beginner to intermediate level exercises for GNU grep, GNU sed and GNU awk.

Installation🔗

For the past few months, I've been using a Python framework called Textual to create interactive TUI apps.

You'll need Python for this. These apps are available on PyPI as grepexercises, sedexercises and awkexercises. Example installation instructions are shown below, adjust them based on your preferences and OS.

# virtual environment
$ python3 -m venv textual_apps
$ cd textual_apps
$ source bin/activate
$ pip install grepexercises sedexercises awkexercises

# launch the app, example shown for the grep command
$ grepexercises

To run the app without having to enter the virtual environment again, add aliases to .bashrc (or equivalent):

# you'll have to change the path
alias grepexercises='/path/to/textual_apps/bin/grepexercises'

# similarly, you can add aliases for the other apps as well

As an alternative to manually managing such virtual environments, you can use https://github.com/pypa/pipx instead:

$ pipx install grepexercises sedexercises awkexercises
$ awkexercises

As yet another alternative, you can install textual==0.85.2 (see Textual documentation for more details), clone my TUI-apps repository and run the Python file from respective folders. For example, grep_exercises.py for the grep command.

Adjust the terminal dimensions for the widgets to appear properly, for example 84x25 (characters x lines).

You can use alternative CLI tools to solve these exercises as well. For example, perl instead of GNU awk or ripgrep instead of GNU grep and so on.

Brief Guide🔗

You can either click the buttons using mouse or press the key combinations listed below:

Press F1 to view the complete guide from within the app itself.
Press Ctrl+p and Ctrl+n to navigate the questions list.
Type the command in the box below the question.
Press Enter to execute the command.
- Output would be displayed below the command box.
- If the output matches the expected results, the command box will turn green and reference solutions will also be shown.
- Issues due to errors and timeout (about 2 seconds) will be displayed in red.
Press Ctrl+s to toggle the reference solution box.
Press Ctrl+t to toggle between light and dark themes.
Press Ctrl+q to quit the app.
Some basic readline-like shortcuts are supported, for example Ctrl+u, Ctrl+k, Ctrl+w, etc

Your progress is automatically saved when you close the app and restored when you launch it again later. Already answered questions will be skipped.

There is no safeguard against the commands you execute. They are treated as if you had typed them in a shell session.

Ebooks🔗

The exercise questions in these apps have been adapted from my programming ebooks: https://learnbyexample.github.io/books/

Feedback🔗

I'd highly appreciate your feedback. Please file an issue if there are bugs, crashes, etc.

Hope you find these TUI apps useful. Happy learning :)

Python tip 32: positive lookarounds

2023-08-16T00:00:00+00:00

Lookarounds help to create custom anchors and add conditions within a regex definition. These assertions are also known as zero-width patterns because they add restrictions similar to anchors and are not part of the matched portions. Negative lookarounds were discussed in this post. The syntax for positive lookarounds is shown below:

(?=pat) positive lookahead assertion
(?<=pat) positive lookbehind assertion

Here are some examples:

>>> import re
>>> s = '42 apple-5, fig3; x-83, y-20: f12'

# extract digits only if it is followed by ,
# note that end of string doesn't qualify as this is a positive assertion
>>> re.findall(r'\d+(?=,)', s)
['5', '83']

# extract digits only if it is preceded by - and followed by ; or :
>>> re.findall(r'(?<=-)\d+(?=[:;])', s)
['20']

# replace 'par' as long as 'part' occurs as a whole word later in the line
>>> re.sub(r'par(?=.*\bpart\b)', r'[\g<0>]', 'par spare part party')
'[par] s[par]e part party'

With lookbehind assertions (both positive and negative), the pattern used for the assertion cannot match a variable length of text. Fixed length quantifiers are allowed. Alternations of different lengths are not allowed, even if the individual alternatives are of fixed length.

>>> s = 'pore42 tar3 dare7 care5'

# not allowed
>>> re.findall(r'(?<=tar|dare)\d+', s)
re.error: look-behind requires fixed-width pattern

# workaround for r'(?<!tar|dare)\d+'
>>> re.findall(r'(?<!tar)(?<!dare)\d+', s)
['42', '5']

# workaround for r'(?<=tar|dare)\d+'
>>> re.findall(r'(?:(?<=tar)|(?<=dare))\d+', s)
['3', '7']
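
For contrast, here's a short sketch of the cases that are accepted by the re module: a quantifier with a fixed count, and alternatives that all have the same length. The patterns are made up for illustration and use the same sample string as above.

```python
import re

s = 'pore42 tar3 dare7 care5'

# {4} is a fixed count, so the lookbehind width is known
fixed_count = re.findall(r'(?<=[a-z]{4})\d+', s)

# both alternatives are four characters long
same_length = re.findall(r'(?<=dare|care)\d+', s)

print(fixed_count)
print(same_length)
```

The digit after tar isn't matched in the first case, since the four characters before it include a space.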

The third-party regex module (https://pypi.org/project/regex/) offers advanced features like variable-length lookbehinds, subexpression calls, etc.

See also my 100 Page Python Intro and Understanding Python re(gex)? ebooks.

Vim tip 30: some general Vim settings

2023-08-08T00:00:00+00:00

Here are some general Vim settings that you can put in the vimrc file to customize your editor. See :h options.txt for complete reference.

set history=200 increase default history from 50 to 200
- there are separate history lists for : commands, search patterns, etc
set nobackup disable backup files
set noswapfile disable swap files
colorscheme murphy a dark theme
- you can use :colorscheme followed by a space and then press Tab or Ctrl+d to get a list of the available color schemes
set showcmd show partial Normal mode command on Command-line and character/line/block-selection for Visual mode
set wildmode=longest,list,full use Bash-like tab completion
- first tab will complete as much as possible
- second tab will provide a list
- third and subsequent tabs will cycle through the completion options

:h 'history' will give you the documentation for the given option (note the use of single quotes).

You can use these settings from the Command-line mode as well, but they will be active for the current Vim session only. Settings specified in the vimrc file will be loaded automatically at startup.

CLI tip 31: concatenate files column wise

2023-08-01T00:00:00+00:00

The paste command is typically used to merge two or more files column wise. By default, paste adds a tab character between corresponding lines of input files.

$ cat colors_1.txt
Blue
Brown
Orange
Purple
$ cat colors_2.txt
Black
Blue
Green
Orange

$ paste colors_1.txt colors_2.txt
Blue    Black
Brown   Blue
Orange  Green
Purple  Orange

You can use the -d option to change the delimiter between the columns. The separator is added even if the data has been exhausted for some of the input files.

$ paste -d'|' <(seq 3) <(seq 4 5) <(seq 6 8)
1|4|6
2|5|7
3||8

# note that the space between -d and empty string is necessary here
$ paste -d '' <(seq 3) <(seq 6 8)
16
27
38

# use newline separator to interleave file contents
$ paste -d'\n' <(seq 11 12) <(seq 101 102)
11
101
12
102
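
The handling of exhausted inputs can be sketched in Python with itertools.zip_longest. This is a simplified illustration, not a reimplementation: the paste function below is a hypothetical helper that accepts a single delimiter string, whereas the paste command treats the -d argument as a list of delimiter characters.

```python
from itertools import zip_longest

def paste(*columns, delim='\t'):
    """Join corresponding items; exhausted columns contribute empty strings."""
    rows = zip_longest(*columns, fillvalue='')
    return [delim.join(row) for row in rows]

a = ['1', '2', '3']
b = ['4', '5']
c = ['6', '7', '8']
merged = paste(a, b, c, delim='|')
print(merged)
```

The third row shows two consecutive separators because the second column has run out of data.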

You can use empty files to get multicharacter separation between the columns. The pr command is better suited for this task.

$ paste -d' : ' <(seq 3) /dev/null /dev/null <(seq 4 6)
1 : 4
2 : 5
3 : 6

$ pr -mts' : ' <(seq 3) <(seq 4 6)
1 : 4
2 : 5
3 : 6

See paste command chapter from my Command line text processing with GNU Coreutils ebook for more details.

Python tip 31: next() function

2023-07-25T00:00:00+00:00

The next() builtin function can be used on an iterator (but not iterables) to retrieve the next item. Once you have exhausted an iterator, trying to get another item will result in a StopIteration exception. Here's an example:

>>> names = (m for m in dir(tuple) if '__' not in m)

>>> next(names)
'count'
>>> next(names)
'index'
>>> next(names)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

Here's a practical example to get a random item from a list without repetition:

>>> import random

>>> names = ['Jo', 'Ravi', 'Joe', 'Raj', 'Jon']
>>> random.shuffle(names)

>>> random_name = iter(names)
>>> next(random_name)
'Jon'
>>> next(random_name)
'Ravi'

You can set a default value to be returned instead of the StopIteration exception. Here's an example:

>>> letters = iter('fig')

>>> next(letters, 'a')
'f'
>>> next(letters, 'a')
'i'
>>> next(letters, 'a')
'g'
>>> next(letters, 'a')
'a'
>>> next(letters, 'a')
'a'
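
The default value also pairs well with a generator expression when you want the first item satisfying a condition. Here's a small sketch, with made-up sample numbers:

```python
nums = [3, 8, 15, 22, 7]

# first number greater than 10, or None if there's no such number
first_big = next((n for n in nums if n > 10), None)
no_match = next((n for n in nums if n > 100), None)

print(first_big)
print(no_match)
```

The generator stops as soon as a match is found, so the rest of the list isn't checked.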

See also my 100 Page Python Intro ebook.

Vim tip 29: greedy quantifiers

2023-07-19T00:00:00+00:00

Quantifiers can be applied to literal characters, dot metacharacter, groups, backreferences and character classes.

* match zero or more times
- abc* matches ab or abc or abccc or abcccccc but not bc
- Error.*valid matches Error: invalid input but not valid Error
- s/a.*b/X/ replaces table bottle bus with tXus since a.*b matches from the first a to the last b
\+ match one or more times
- abc\+ matches abc or abccc but not ab or bc
\? match zero or one times
- \= can also be used, helpful if you are searching backwards with the ? command
- abc\? matches ab or abc. This will match abccc or abcccccc as well, but only the abc portion
- s/abc\?/X/ replaces abcc with Xc
\{m,n} match m to n times (inclusive)
- ab\{1,4}c matches abc or abbc or xabbbcz but not ac or abbbbbc
\{m,} match at least m times
- ab\{3,}c matches xabbbcz or abbbbbc but not ac or abc or abbc
\{,n} match up to n times (including 0 times)
- ab\{,2}c matches abc or ac or abbc but not xabbbcz or abbbbbc
\{n} match exactly n times
- ab\{3}c matches xabbbcz but not abbc or abbbbbc

Greedy quantifiers will consume as much as possible, provided the overall pattern is also matched. That's how the Error.*valid example worked. If .* had consumed everything after Error, there wouldn't be any more characters to try to match valid. How the regexp engine handles matching varying amount of characters depends on the implementation details (backtracking, NFA, etc).

See :h pattern-overview for more details.

If you are familiar with other regular expression flavors like Perl, Python, etc, you'd be surprised by the use of \ in the above examples. If you use the \v very magic modifier, the \ won't be needed.
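
For comparison, here's how a few of the above examples would look with Python's re module, where the quantifiers don't need a \ prefix. This is only meant to contrast the syntax, the greedy behavior is the same for these cases.

```python
import re

# Vim: s/a.*b/X/
greedy_sub = re.sub(r'a.*b', 'X', 'table bottle bus')

# Vim: Error.*valid
greedy_match = re.search(r'Error.*valid', 'Error: invalid input')[0]

# Vim: s/abc\?/X/
optional_sub = re.sub(r'abc?', 'X', 'abcc')

print(greedy_sub)
print(greedy_match)
print(optional_sub)
```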

CLI tip 30: extract only the matching portions

2023-07-11T00:00:00+00:00

The grep command provides the -o option to extract only the matching portions. Here are some examples using the BRE/ERE regexp flavors:

# whole words made up of lowercase alphabets and digits only
$ s='coat Bin food Apple (tar12) best fig_42'
$ echo "$s" | grep -owE '[a-z0-9]+'
coat
food
tar12
best

# extract characters from the start of string based on a delimiter
$ echo 'apple:123:banana:cherry' | grep -o '^[^:]*'
apple

# sequence of characters surrounded by double quotes
$ echo 'I like "mango" and "guava"' | grep -oE '"[^"]+"'
"mango"
"guava"

# whole words that have at least one consecutive repeated character
$ s='effort flee facade oddball rat tool'
$ echo "$s" | grep -owE '\w*(\w)\1\w*'
effort
flee
oddball
tool

And here are some examples with the PCRE flavor:

# numbers >= 100, with optional leading zeros
# same as: grep -owE '0*[1-9][0-9]{2,}'
$ echo '0501 035 154 12 26 98234' | grep -woP '0*+\d{3,}'
0501
154
98234

# extract digits only if it is preceded by - and not followed by ,
$ s='42 apple-5, fig3; x-83, y-20: f12'
$ echo "$s" | grep -oP '(?<=-)\d++(?!,)'
20

# extract digits that follow =
$ echo 'apple=42, fig=314' | grep -oP '=\K\d+'
42
314

# all digits and optional hyphen combo from the start of string
$ echo '123-87-593 42 apple-12-345' | grep -oP '\G\d+-?'
123-
87-
593

# all words except those surrounded by double quotes
$ s='I like2 "mango" and "guava"'
$ echo "$s" | grep -oP '"[^"]+"(*SKIP)(*F)|\w+'
I
like2
and

Use ripgrep if you want to add some more text to the matching portions, or perhaps you need to handle multiple capture groups. Here's an example:

$ echo 'apple=42, fig=314' | rg -o '(\w+)=(\d+)' -r '$2:$1'
42:apple
314:fig
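
If Python is more convenient for you, a similar result can be sketched with re.finditer and the expand() method of match objects. Note that the backreference syntax is \1 and \2 here instead of $1 and $2.

```python
import re

s = 'apple=42, fig=314'

# swap the two capture groups for every match
swapped = [m.expand(r'\2:\1') for m in re.finditer(r'(\w+)=(\d+)', s)]
print('\n'.join(swapped))
```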

See my CLI text processing with GNU grep and ripgrep ebook if you are interested in learning about the GNU grep and ripgrep commands in more detail.

Python tip 30: zip() function

2023-07-04T00:00:00+00:00

You can use the zip() builtin function to iterate over two or more iterables simultaneously. In every iteration, you'll get a tuple with an item from each of the iterables. Here's an example:

>>> names = ['Joe', 'Mei', 'Rose', 'Ram']
>>> physics = [86, 91, 76, 80]
>>> maths = [77, 92, 81, 83]

>>> for n, p, m in zip(names, physics, maths):
...     print(f'{n:5}: {p},{m}')
... 
Joe  : 86,77
Mei  : 91,92
Rose : 76,81
Ram  : 80,83

Here are some examples using list comprehensions and generator expressions:

>>> p = [1, 3, 5]
>>> q = [3, 214, 53]

>>> [i + j for i, j in zip(p, q)]
[4, 217, 58]

# inner product
>>> sum(i * j for i, j in zip(p, q))
910

By default, zip() will silently stop when the shortest iterable is exhausted:

>>> fruits = ('apple', 'banana', 'fig', 'guava')
>>> qty = (100, 25, 42)

>>> for f, q in zip(fruits, qty):
...     print(f'{f:6}: {q}')
... 
apple : 100
banana: 25
fig   : 42

The strict keyword argument was added in the Python 3.10 version. When set to True, this will raise an exception if the iterables are not of the same length:

>>> for f, q in zip(fruits, qty, strict=True):
...     print(f'{f:6}: {q}')
... 
apple : 100
banana: 25
fig   : 42
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: zip() argument 2 is shorter than argument 1
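
If you'd rather continue until the longest iterable is exhausted, itertools.zip_longest is an option. Here's a sketch using the same data, where the fill value of 0 is an arbitrary choice for this illustration:

```python
from itertools import zip_longest

fruits = ('apple', 'banana', 'fig', 'guava')
qty = (100, 25, 42)

# missing items are replaced with the fillvalue (None if not specified)
pairs = list(zip_longest(fruits, qty, fillvalue=0))
for f, q in pairs:
    print(f'{f:6}: {q}')
```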

See also my 100 Page Python Intro ebook.

CLI text processing with GNU sed book announcement

2023-06-29T00:00:00+00:00

Hello!

I am pleased to announce a new version of my CLI text processing with GNU sed ebook. Examples, exercises, solutions, descriptions and external links were added/updated/corrected.

This book will help you learn the GNU sed command step-by-step from beginner to advanced levels with hundreds of examples and exercises. In addition to command options, regular expressions will also be discussed in detail.

Release offers🔗

To celebrate the new release, you can download PDF/EPUB versions of CLI text processing with GNU sed for FREE till 10-July-2023. You can still pay if you wish ;)

Other offers:

CLI text processing with GNU grep and ripgrep is FREE
All Books Bundle is $12 (normal price $32) — all my 13 programming ebooks

What's new?🔗

Command version updated to GNU sed 4.9
Many more exercises added, and you can practice some of them using this interactive TUI app
Long sections split into smaller ones
In general, many of the examples, exercises, solutions, descriptions and external links were updated/corrected
Updated Acknowledgements section
Code snippets related to info/warning sections will now appear as a single block
Book title changed to CLI text processing with GNU sed
New cover image
Images centered for EPUB format

Videos🔗

On this blog, I post tips covering Python, command line tools and Vim. Here are video demos for these tips:

Interactive TUI app🔗

I also wrote an interactive TUI app based on some of the exercises from the ebook. Reference solutions are also provided.

Table of Contents🔗

Preface
Introduction
In-place file editing
Selective editing
BRE/ERE Regular Expressions
Flags
Shell substitutions
z, s and f command line options
append, change, insert
Adding content from file
Control structures
Processing lines bounded by distinct markers
Gotchas and Tricks
Further Reading

Web version🔗

You can also read the book online here: https://learnbyexample.github.io/learn_gnused/.

GitHub repo🔗

Visit https://github.com/learnbyexample/learn_gnused for markdown source, example files, exercise solutions, sample chapters and other details related to the book.

See also my blog post on how to customize pandoc for generating beautiful PDF/EPUB versions from GitHub style markdown.

Subscribe to learnbyexample weekly — free newsletter covering programming resources, updates on what I am creating, tips, tools, free ebooks and more, delivered every Friday.

Feedback and Errata

You can reach me via:

Issue Manager: https://github.com/learnbyexample/learn_gnused/issues
E-mail: echo 'bGVhcm5ieWV4YW1wbGUubmV0QGdtYWlsLmNvbQo=' | base64 --decode
Twitter: https://twitter.com/learn_byexample

Happy learning :)

Vim tip 28: miscellaneous motion and reposition commands

2023-06-26T00:00:00+00:00

Moving within the visible window:

H move to the first non-blank character of the top (home) line of the visible window
M move to the first non-blank character of the middle line of the visible window
L move to the first non-blank character of the bottom (low) line of the visible window

Reposition the current line:

Ctrl+e scroll the window down by a line (the text moves up on the screen)
Ctrl+y scroll the window up by a line (the text moves down on the screen)
zz reposition the current line to the middle of the visible window
- useful to see context around lines that are nearer to the top/bottom of the visible window
zt reposition the current line to the top of the visible window
zb reposition the current line to the bottom of the visible window

See :h 'scrolloff' option if you want to always show context around the current line.

CLI tip 29: define fields using FPAT in GNU awk

2023-06-20T00:00:00+00:00

In awk, the FS variable allows you to define the input field separator. In contrast, FPAT (field pattern) allows you to define what the fields themselves should be made up of.

$ s='Sample123string42with777numbers'
# one or more consecutive digits
$ echo "$s" | awk -v FPAT='[0-9]+' '{print $2}'
42

$ s='coat Bin food tar12 best Apple fig_42'
# whole words made up of lowercase alphabets and digits only
$ echo "$s" | awk -v FPAT='\\<[a-z0-9]+\\>' -v OFS=, '{$1=$1} 1'
coat,food,tar12,best

$ s='items: "apple" and "mango"'
# get the first double quoted item
$ echo "$s" | awk -v FPAT='"[^"]+"' '{print $1}'
"apple"

FPAT is often used for CSV input where fields can contain embedded delimiter characters. For example, the field "fox,42" contains the delimiter itself when , is used as the separator.

$ s='eagle,"fox,42",bee,frog'

# simply using , as separator isn't sufficient
$ echo "$s" | awk -F, '{print $2}'
"fox

For such simpler CSV input, FPAT helps to define fields as starting and ending with double quotes or containing non-comma characters.

# * is used instead of + to allow empty fields
$ echo "$s" | awk -v FPAT='"[^"]*"|[^,]*' '{print $2}'
"fox,42"

The above will not work for all kinds of CSV files, for example if fields contain escaped double quotes, newline characters, etc. See stackoverflow: What's the most robust way to efficiently parse CSV using awk? for such cases. You could also use other programming languages such as Perl, Python, Ruby, etc which come with standard CSV parsing libraries or have easy access to third party solutions. There are also specialized command line tools such as xsv.
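
As a small illustration of the Python route mentioned above, here is a minimal sketch using the csv module from the standard library. The sample strings and variable names are only for illustration. Note that, unlike the FPAT example, the surrounding double quotes are removed from the parsed field.

```python
import csv
import io

s = 'eagle,"fox,42",bee,frog'

# csv.reader expects an iterable of lines, so wrap the string in StringIO
row = next(csv.reader(io.StringIO(s)))
print(row[1])

# escaped double quotes ("" inside a quoted field) are handled as well
# 'tricky' is a made-up sample, not taken from the tip
tricky = 'a,"say ""hi"", ok",b'
tricky_row = next(csv.reader(io.StringIO(tricky)))
print(tricky_row)
```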

See also my CLI text processing with GNU awk ebook.

Python tip 29: negative lookarounds

2023-06-13T00:00:00+00:00

Lookarounds help to create custom anchors and add conditions within a regex definition. These assertions are also known as zero-width patterns because they add restrictions similar to anchors and are not part of the matched portions. The syntax for negative lookarounds is shown below:

(?!pat) negative lookahead assertion
(?<!pat) negative lookbehind assertion

Here are some examples:

>>> import re

# change 'cat' only if it is not followed by a digit character
# note that the end of string satisfies the given assertion
# 'catcat' has two matches as the assertion doesn't consume characters
>>> re.sub(r'cat(?!\d)', 'dog', 'hey cats! cat42 cat_5 catcat')
'hey dogs! cat42 dog_5 dogdog'

# change 'cat' only if it is not preceded by _
# note how 'cat' at the start of string is matched as well
>>> re.sub(r'(?<!_)cat', 'dog', 'cat _cat 42catcat')
'dog _cat 42dogdog'

# change whole word only if it is not preceded by : or -
>>> re.sub(r'(?<![:-])\b\w+', 'X', ':cart <apple: -rest ;tea')
':cart <X: -rest ;X'

Lookarounds can be placed anywhere and multiple lookarounds can be combined in any order. They do not consume characters nor do they play a role in matched portions. They just let you know whether the condition you want to test is satisfied from the current location in the input string.

# extract all whole words that do not start with a/n
>>> ip = 'a_t row on Urn e note Dust n end a2-e|u'
>>> re.findall(r'(?![an])\b\w+', ip)
['row', 'on', 'Urn', 'e', 'Dust', 'end', 'e', 'u']

# since the three assertions used here are all zero-width,
# all of the 6 possible combinations will be equivalent
>>> re.sub(r'(?!\Z)\b(?<!\A)', ' ', 'output=num1+35*42/num2')
'output = num1 + 35 * 42 / num2'
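
If you'd like to check the claim about the six combinations yourself, here is a minimal sketch that builds every ordering of the three assertions with itertools.permutations and collects the results in a set. The variable names are only for illustration.

```python
import re
from itertools import permutations

s = 'output=num1+35*42/num2'
assertions = [r'(?!\Z)', r'\b', r'(?<!\A)']

# try every ordering of the three zero-width assertions
results = {re.sub(''.join(p), ' ', s) for p in permutations(assertions)}

# a set with a single element means all orderings gave the same output
print(results)
```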

See also my 100 Page Python Intro and Understanding Python re(gex)? ebooks.

Vim tip 27: regexp anchors

2023-06-05T00:00:00+00:00

By default, regexp matches anywhere in the text. You can use line and word anchors to specify additional restrictions regarding the position of matches. These restrictions are made possible by assigning special meaning to certain characters (metacharacters) and escape sequences.

^ restricts the match to the start-of-line
- ^This matches This is a sample but not Do This
$ restricts the match to the end-of-line
- )$ matches apple (5) but not def greeting():
^$ match empty line
\<pattern restricts the match to the start of a word
- word characters include alphabets, digits and underscore
- \<his matches his or to-his or history but not this or _hist
pattern\> restricts the match to the end of a word
- his\> matches his or to-his or this but not history or _hist
\<pattern\> restricts the match between start of a word and end of a word
- \<his\> matches his or to-his but not this or history or _hist

End-of-line can be \r (carriage return), \n (newline) or \r\n depending on your system and fileformat setting.

See :h pattern-atoms for more details.

CLI tip 28: substitute specific occurrence with GNU sed

2023-05-30T00:00:00+00:00

By using the g flag with the s command (substitute), you can search and replace all occurrences of a pattern. Without the g flag, only the first matching portion will be replaced.

Did you know that you can use a number as a flag to replace only that particular occurrence of matching parts?

# replace only the third occurrence of ':' with '---'
$ echo 'apple:banana:cherry:fig:mango' | sed 's/:/---/3'
apple:banana:cherry---fig:mango

# replace only the second occurrence of a word starting with 't'
$ echo 'book table bus car banana tap camp' | sed 's/\bt\w*/"&"/2'
book table bus car banana "tap" camp

To replace a specific occurrence from the end of the line, you'll have to use regular expression tricks:

$ s='apple:banana:cherry:fig:mango'

# replace the last occurrence
$ echo "$s" | sed -E 's/(.*):/\1---/'
apple:banana:cherry:fig---mango

# replace the last but one occurrence
$ echo "$s" | sed -E 's/(.*):(.*:)/\1---\2/'
apple:banana:cherry---fig:mango

# generic formula, where {N} refers to the last but Nth occurrence
$ echo "$s" | sed -E 's/(.*):((.*:){2})/\1---\2/'
apple:banana---cherry:fig:mango

If you combine a number flag with the g flag, all matches from that particular occurrence will be replaced.

# replace except the first occurrence of a word starting with 'b'
$ echo 'book table bus car banana tap camp' | sed 's/\bb\w*/"&"/2g'
book table "bus" car "banana" tap camp
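
To make the counting logic behind the number flag explicit, here is a rough Python sketch of the same idea. It is not how sed is implemented. The sub_nth function and its and_later parameter are hypothetical names made up for this illustration.

```python
import re

def sub_nth(pattern, repl, text, n, and_later=False):
    """Replace only the nth match (hypothetical helper, loosely like sed's s///N).

    With and_later=True, the nth and all later matches are replaced,
    loosely similar to combining a number with the g flag.
    """
    count = 0

    def _repl(m):
        nonlocal count
        count += 1
        if count == n or (and_later and count > n):
            # expand() processes backreferences such as \g<0> in repl
            return m.expand(repl)
        return m[0]    # leave other matches untouched

    return re.sub(pattern, _repl, text)

third = sub_nth(':', '---', 'apple:banana:cherry:fig:mango', 3)
print(third)

words = 'book table bus car banana tap camp'
from_second = sub_nth(r'\bb\w*', r'"\g<0>"', words, 2, and_later=True)
print(from_second)
```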

See also my CLI text processing with GNU sed ebook.

Python tip 28: string concatenation and repetition

2023-05-24T00:00:00+00:00

Python provides a wide variety of features to work with strings. In this tip, you'll learn about string concatenation and repetition.

The + operator is one of the ways to concatenate two strings. The operands can be any expression that results in a string value and you can use any of the different ways to specify a string literal. Another option is to use f-strings. Here are some examples:

>>> s1 = 'hello'
>>> s2 = 'world'
>>> print(s1 + ' ' + s2)
hello world

>>> f'{s1} {s2}'
'hello world'

>>> s1 + r'. 1\n2'
'hello. 1\\n2'

Another way to concatenate is to simply place string literals of any kind next to each other. You can use zero or more whitespace characters between the two literals. But you cannot mix an expression and a string literal this way. If the strings are inside parentheses, you can also use newline characters to separate the literals and optionally add comments.

>>> 'hello' r'. 1\n2'
'hello. 1\\n2'

>>> print('apple'
...       '-banana'
...       '-cherry')
apple-banana-cherry

You can repeat a string by using the * operator between a string and an integer. You'll get an empty string if the integer value is less than 1.

>>> style_char = '-'
>>> print(style_char * 50)
--------------------------------------------------

>>> word = 'buffalo '
>>> print(8 * word)
buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo
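
Here's a small sketch that puts both operators to use. The banner function is a made-up helper for this illustration. When the title is longer than the requested width, the computed counts are negative, so the repetition gives empty strings instead of raising an error.

```python
def banner(title, width=30, fill='-'):
    # hypothetical helper: center the title between runs of the fill character
    pad = width - len(title) - 2
    left = pad // 2
    right = pad - left
    # a repetition count less than 1 results in an empty string
    return fill * left + ' ' + title + ' ' + fill * right

short = banner('tips', 12)
too_long = banner('a very long title', 5)
print(short)
print(repr(too_long))
```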

See also my 100 Page Python Intro ebook.

Vim tip 26: executing shell commands

2023-05-16T00:00:00+00:00

You can execute external commands from within Vim. Here are some examples:

:!ls execute the given shell command and display output
- the results are displayed as part of an expanded Command-line area, and the contents of the file are not changed
:.!date replace the current line with the output of the given command
- pressing !! in Normal mode will also result in :.!
- ! waits for motion similar to d and y commands, !G will give :.,$!
:%!sort sort all the lines
- recall that % is a shortcut for the range 1,$
- note that this executes an external command, not the built-in :sort command
:3,8!sort sort only lines 3 to 8
:r!date insert output of the given command below the current line
:r report.log insert contents of the given file below the current line
- Note that ! is not used here since there is no shell command
:.!grep '^Help ' % replace the current line with all the lines starting with Help in the current file
- % here expands to the name of the current file, so grep reads the file as saved on disk
:sh open a shell session within Vim
- use exit command to quit the session

See :h :!, :h :sh and :h :r for more details.

CLI text processing with GNU grep and ripgrep book announcement

2023-05-11T00:00:00+00:00

Hello!

I am pleased to announce a new version of my CLI text processing with GNU grep and ripgrep ebook. Examples, exercises, solutions, descriptions and external links were added/updated/corrected. The chapter on ripgrep was changed significantly to focus mostly on the differences compared to GNU grep.

This book will help you learn these commands step-by-step from beginner to advanced levels with hundreds of examples and exercises.

Release offers

To celebrate the new release, you can download PDF/EPUB versions of CLI text processing with GNU grep and ripgrep for FREE till 21-May-2023. You can still pay if you wish ;)

Other offers:

Computing from the Command Line is FREE — Linux command line tools and Shell Scripting for beginner to intermediate level users
All Books Bundle is $12 (normal price $32) — all my 13 programming ebooks

What's new?

Command versions updated to GNU grep 3.10 and ripgrep 13.0.0
Many more exercises added
PCRE chapter — added section for conditional grouping, corrected description and examples for \K, atomic grouping, etc
ripgrep chapter — options and regex section modified to present only differences compared to GNU grep, added details for more options such as --field-match-separator, improved recursive search section, etc
Long sections split into smaller ones
In general, many of the examples, exercises, solutions, descriptions and external links were updated/corrected
Updated Acknowledgements section
Code snippets related to info/warning sections will now appear as a single block
Book title changed to CLI text processing with GNU grep and ripgrep
New cover image
Images centered for EPUB format

Videos

On this blog, I post tips covering Python, command line tools and Vim, along with video demos for these tips.

Interactive TUI app

I also wrote an interactive TUI app based on some of the exercises from the ebook. Reference solutions are provided for both GNU grep and ripgrep.

Table of Contents

Preface
Introduction
Frequently used options
BRE/ERE Regular Expressions
Context matching
Recursive search
Miscellaneous options
Perl Compatible Regular Expressions
Gotchas and Tricks
ripgrep
Further Reading

Web version

You can also read the book online here: https://learnbyexample.github.io/learn_gnugrep_ripgrep/.

GitHub repo

Visit https://github.com/learnbyexample/learn_gnugrep_ripgrep for markdown source, example files, exercise solutions, sample chapters and other details related to the book.

See also my blog post on how to customize pandoc for generating beautiful PDF/EPUB versions from GitHub style markdown.

Subscribe to learnbyexample weekly — free newsletter covering programming resources, updates on what I am creating, tips, tools, free ebooks and more, delivered every Friday.

Feedback and Errata

You can reach me via:

Issue Manager: https://github.com/learnbyexample/learn_gnugrep_ripgrep/issues
E-mail: echo 'bGVhcm5ieWV4YW1wbGUubmV0QGdtYWlsLmNvbQo=' | base64 --decode
Twitter: https://twitter.com/learn_byexample

Happy learning :)

CLI tip 27: reverse text line wise with tac

2023-05-09T00:00:00+00:00

You can use tac to reverse the input line wise. If you pass multiple input files, each file content will be reversed separately.

$ printf 'apple\nbanana\ncherry\nfig and honey\n' | tac
fig and honey
cherry
banana
apple

You can use the -s option to specify a different string to be used as the line separator (newline is the default separator). When the custom separator occurs before the content of interest, use the -b option to print those separators before the content in the output as well.

$ cat blocks.txt
%=%=
apple
banana
%=%=
1
2
3
%=%=
red
green

$ tac -b -s '%=%=' blocks.txt
%=%=
red
green
%=%=
1
2
3
%=%=
apple
banana
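
To see what is going on in the two examples above, here is a rough Python sketch of the same logic. It is only an illustration and not how tac is implemented. The function names are made up, and the sketch assumes that any content before the first separator should be emitted last.

```python
def tac_lines(text):
    # reverse the input line wise, keeping the newline with each line
    return ''.join(reversed(text.splitlines(keepends=True)))

def tac_before(text, sep):
    # loosely mimics 'tac -b -s SEP': each record starts with the separator
    parts = text.split(sep)
    records = [sep + p for p in parts[1:]]
    # assumption: content before the first separator (if any) goes last
    return ''.join(reversed(records)) + parts[0]

fruits = 'apple\nbanana\ncherry\nfig and honey\n'
reversed_fruits = tac_lines(fruits)
print(reversed_fruits, end='')

blocks = '%=%=\napple\nbanana\n%=%=\n1\n2\n3\n%=%=\nred\ngreen\n'
reversed_blocks = tac_before(blocks, '%=%=')
print(reversed_blocks, end='')
```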

See CLI tip 8: extract from start of file until matching line for a practical example where reversing input content helps in constructing a solution.

See also tac section from my Command line text processing with GNU Coreutils ebook for more details and examples.

Python tip 27: enumerate() function

2023-05-03T00:00:00+00:00

When you use a for loop, you get one element per iteration. If you need the index of the elements as well, use the enumerate() built-in function. You'll get a tuple per iteration, containing the index (starting with 0 by default) and the value at that index.

>>> nums = [42, 3.14, -2, 1000]
>>> for t in enumerate(nums):
...     print(t)
... 
(0, 42)
(1, 3.14)
(2, -2)
(3, 1000)

>>> names = ['Jo', 'Joe', 'Jon']
>>> [(n1, n2) for i, n1 in enumerate(names) for n2 in names[i+1:]]
[('Jo', 'Joe'), ('Jo', 'Jon'), ('Joe', 'Jon')]

By setting the start argument, you can change the initial value of the index.

>>> items = ('car', 'table', 'book')
>>> for idx, val in enumerate(items, start=1):
...     print(f'{idx}: {val}')
... 
1: car
2: table
3: book
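
If it helps to picture what enumerate() does, here is a rough pure Python equivalent written as a generator. The name my_enumerate is made up for this sketch. The built-in function is what you should use in practice.

```python
def my_enumerate(iterable, start=0):
    # yield (index, value) pairs, counting up from 'start'
    idx = start
    for value in iterable:
        yield idx, value
        idx += 1

items = ('car', 'table', 'book')
pairs = list(my_enumerate(items, start=1))
print(pairs)
```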

See also my 100 Page Python Intro ebook.

Vim tip 25: substitute flags

2023-04-25T00:00:00+00:00

Here are some of the flags you can use with the substitute command:

g replace all occurrences within a matching line
- by default, only the first matching portion will be replaced
c ask for confirmation before each replacement
i ignore case for searchpattern
I don't ignore case for searchpattern

These flags are applicable for the substitute command but not / or ? searches. Flags can also be combined, for example:

s/cat/Dog/gi replace every occurrence of cat with Dog
- Case is ignored, so Cat, cAt, CAT, etc are all valid matches
- Note that i doesn't affect the case of the replacement string

See :h s_flags for a complete list of flags and more details about them.

CLI tip 26: removing duplicate lines with GNU awk

2023-04-18T00:00:00+00:00

awk '!a[$0]++' is one of the most famous Awk one-liners. It eliminates line based duplicates while retaining input order. The following example shows it in action along with an illustration of how the logic works.

$ cat purchases.txt
coffee
tea
washing powder
coffee
toothpaste
tea
soap
tea

$ awk '{print +a[$0] "\t" $0; a[$0]++}' purchases.txt
0       coffee
0       tea
0       washing powder
1       coffee
0       toothpaste
1       tea
0       soap
2       tea

# only those entries with zero in the first column will be retained
$ awk '!a[$0]++' purchases.txt
coffee
tea
washing powder
toothpaste
soap

Removing field based duplicates is simple for single field comparison. Just change $0 to the required field number after setting the appropriate field separator.

$ cat duplicates.txt
brown,toy,bread,42
dark red,ruby,rose,111
blue,ruby,water,333
dark red,sky,rose,555
yellow,toy,flower,333
white,sky,bread,111
light red,purse,rose,333

# based on the last field
$ awk -F, '!seen[$NF]++' duplicates.txt
brown,toy,bread,42
dark red,ruby,rose,111
blue,ruby,water,333
dark red,sky,rose,555

For multiple fields comparison, separate the fields with , so that SUBSEP is used to combine the field values to generate the key. SUBSEP has a default value of \034 which is a non-printing character and not usually used in text files.

# based on the first and third fields
$ awk -F, '!seen[$1,$3]++' duplicates.txt
brown,toy,bread,42
dark red,ruby,rose,111
blue,ruby,water,333
yellow,toy,flower,333
white,sky,bread,111
light red,purse,rose,333
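
The logic of remembering which keys were already seen can be spelled out in Python as well. Here is a minimal sketch for comparison, where unique_by is a made-up helper name and a tuple of field values plays the role of the SUBSEP joined key.

```python
def unique_by(lines, key=lambda line: line):
    # keep only the first line seen for each key, preserving input order
    seen = set()
    for line in lines:
        k = key(line)
        if k not in seen:
            seen.add(k)
            yield line

purchases = ['coffee', 'tea', 'washing powder', 'coffee',
             'toothpaste', 'tea', 'soap', 'tea']
unique_purchases = list(unique_by(purchases))
print(unique_purchases)

rows = ['brown,toy,bread,42', 'dark red,ruby,rose,111',
        'blue,ruby,water,333', 'dark red,sky,rose,555',
        'yellow,toy,flower,333', 'white,sky,bread,111',
        'light red,purse,rose,333']

def first_and_third(line):
    fields = line.split(',')
    return fields[0], fields[2]

unique_rows = list(unique_by(rows, key=first_and_third))
print(*unique_rows, sep='\n')
```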

huniq is a faster alternative for removing line based duplicates.

See also my CLI text processing with GNU awk ebook.

Python tip 22: possessive quantifiers

2023-04-13T00:00:00+00:00

Until Python 3.10, you had to use alternatives like the third-party regex module for possessive quantifiers and atomic grouping. The re module supports these features from Python 3.11 onwards.

Greedy quantifiers will match as much as possible but will backtrack to help the overall pattern to succeed. Possessive quantifiers behave like greedy but won't backtrack.

Suppose you want to match integer numbers greater than or equal to 100 where these numbers can optionally have leading zeros.

>>> import re
>>> numbers = '42 314 001 12 00984'

# this solution fails because 0* and \d{3,} can both match leading zeros
# and greedy quantifiers will give up characters to help overall regex succeed
>>> re.findall(r'0*\d{3,}', numbers)
['314', '001', '00984']

# here 0*+ will not give back leading zeros after they are consumed
>>> re.findall(r'0*+\d{3,}', numbers)
['314', '00984']

# workaround if possessive quantifiers are not supported
>>> re.findall(r'0*[1-9]\d{2,}', numbers)
['314', '00984']

Here's another example. The goal is to match lines whose first non-whitespace character is not a # character. A matching line should have at least one non-# character, so empty lines and those with only whitespace characters should not match.

>>> lines = ['#cmt', 'c = "#"', '\t #comment', 'abc', '', ' \t ']

# this solution fails because \s* can backtrack
# and [^#] can match a whitespace character as well
>>> [e for e in lines if re.match(r'\s*[^#]', e)]
['c = "#"', '\t #comment', 'abc', ' \t ']

# this works because \s*+ will not give back any whitespace characters
>>> [e for e in lines if re.match(r'\s*+[^#]', e)]
['c = "#"', 'abc']

# workaround if possessive quantifiers are not supported
>>> [e for e in lines if re.match(r'\s*[^#\s]', e)]
['c = "#"', 'abc']
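
If your script has to run on older Python versions too, one option is to choose the pattern based on the interpreter version. Here's a minimal sketch of that approach, reusing the two patterns from the first example. Whether such a check is worth it compared to always using the workaround is a judgment call.

```python
import re
import sys

numbers = '42 314 001 12 00984'

if sys.version_info >= (3, 11):
    # possessive quantifiers are supported by the re module
    pat = re.compile(r'0*+\d{3,}')
else:
    # workaround that doesn't need possessive quantifiers
    pat = re.compile(r'0*[1-9]\d{2,}')

matches = pat.findall(numbers)
print(matches)
```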

See my blog post on possessive quantifiers and atomic grouping for more examples, details about catastrophic backtracking and so on.

See also my 100 Page Python Intro and Understanding Python re(gex)? ebooks.

Python tip 26: atomic grouping

2023-04-13T00:00:00+00:00

Until Python 3.10, you had to use alternatives like the third-party regex module for possessive quantifiers and atomic grouping. The re module supports these features from Python 3.11 onwards.

Greedy and non-greedy quantifiers will backtrack to help the overall pattern to succeed. The syntax for an atomic group is (?>pat), where pat is the pattern you want to safeguard from further backtracking. You can think of it as a special group that is isolated from the other parts of the regular expression.

Here's an example with greedy quantifier:

>>> import re
>>> numbers = '42 314 001 12 00984'

# 0* is greedy and the (?>) grouping prevents backtracking
# same as: re.findall(r'0*+\d{3,}', numbers)
>>> re.findall(r'(?>0*)\d{3,}', numbers)
['314', '00984']

Here's an example with non-greedy quantifier:

>>> ip = 'fig::mango::pineapple::guava::apples::orange'

# this matches from the first '::' to the first occurrence of '::apple'
>>> re.search(r'::.*?::apple', ip)[0]
'::mango::pineapple::guava::apple'

# '(?>::.*?::)' will match only from '::' to the very next '::'
# '::mango::' fails because 'apple' isn't found afterwards
# similarly '::pineapple::' fails
# '::guava::' succeeds because it is followed by 'apple'
>>> re.search(r'(?>::.*?::)apple', ip)[0]
'::guava::apple'
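
For Python versions before 3.11, a commonly used trick is to emulate an atomic group with a lookahead, a capture group and a backreference. This is a sketch of that workaround and not something the re module documents as an equivalent, so test it against your own input. The variable names are only for illustration.

```python
import re

numbers = '42 314 001 12 00984'

# (?=(0*)) captures the leading zeros without consuming them
# \1 then consumes exactly the captured text
# the regex engine doesn't backtrack into a completed lookahead,
# so the zeros cannot be given back to \d{3,}
zeros_pat = re.compile(r'(?=(0*))\1\d{3,}')
# finditer is used since findall would return the capture group contents
emulated_numbers = [m[0] for m in zeros_pat.finditer(numbers)]
print(emulated_numbers)

ip = 'fig::mango::pineapple::guava::apples::orange'
emulated_search = re.search(r'(?=(::.*?::))\1apple', ip)[0]
print(emulated_search)
```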

See also my 100 Page Python Intro and Understanding Python re(gex)? ebooks.

Vim tip 24: movement commands within the current file

2023-04-04T00:00:00+00:00

Here are some commands you can use in Normal mode to move within the current file:

gg move to the first non-blank character of the first line
G move to the first non-blank character of the last line
5G move to the first non-blank character of the fifth line
- As an alternative, you can use :5 followed by Enter key (Command-line mode)
50% move to the halfway point
- you can use other percentages as needed
% move to matching pair of brackets like (), {} and []
- This will work across lines and nesting is taken into consideration as well
- If the cursor is on a non-bracket character and a bracket character is present later in the line, the % command will move to the matching pair of that character (which could be present in some other line too)
- Use the matchpairs option to customize the matching pairs. For example, :set matchpairs+=<:> will match <> as well

It is also possible to match a pair of keywords like HTML tags, if-else, etc with %. See :h matchit-install for details.

CLI tip 25: get file properties using the stat command

2023-03-28T00:00:00+00:00

The stat command is useful to get details like file type, size, inode, permissions, last accessed and modified timestamps, etc. You'll get all of these details by default. The -c and --printf options can be used to display only the required details in a particular format.

Here's an example to get accessed and modified timestamps of a file:

# sample directory and sample file
$ mkdir stat_examples && cd $_
$ printf 'long\nshot\n' > ip.txt

# %x gives the last accessed timestamp
$ stat -c '%x' ip.txt
2023-03-27 20:20:55.217530670 +0530

# modify the file
$ printf 'apple\nbanana\n' >> ip.txt
# %y gives the last modified timestamp
$ stat -c '%y' ip.txt
2023-03-27 20:21:50.298964283 +0530

Here's an example with some more file properties:

# %s gives file size in bytes
# \n is used to insert a newline
# %i gives the inode value
# same as: stat --printf='%s\n%i\n' ip.txt
$ stat -c $'%s\n%i' ip.txt
23
6438890

Here's an example for a linked file:

$ ln -s /usr/share/dict/words words.txt

# %N gives quoted filenames
# if input is a link, path it points to is also displayed
$ stat -c '%N' words.txt
'words.txt' -> '/usr/share/dict/words'

You can also pass multiple file arguments:

$ printf '#!/bin/bash\n\necho hi\n' > hi.sh

# %s gives file size in bytes
# %n gives filenames
$ stat -c '%s %n' ip.txt hi.sh
23 ip.txt
21 hi.sh
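
If you need such details from within a script, most languages provide a way to get them without running an external command. Here is a minimal Python sketch using os.stat() on a temporary file with the same content as ip.txt above. The inode and timestamp values will differ on your system.

```python
import os
import tempfile
from datetime import datetime

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'ip.txt')
    # same content as the shell example, written in two steps
    with open(path, 'w', newline='\n') as f:
        f.write('long\nshot\n')
    with open(path, 'a', newline='\n') as f:
        f.write('apple\nbanana\n')

    info = os.stat(path)
    size = info.st_size      # file size in bytes, similar to %s
    inode = info.st_ino      # inode value, similar to %i
    # timestamps are seconds since the epoch, so convert them for display
    accessed = datetime.fromtimestamp(info.st_atime)
    modified = datetime.fromtimestamp(info.st_mtime)

print(size)
print(inode)
print(accessed, modified, sep='\n')
```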

The stat command should be preferred instead of parsing ls -l output for file details. See mywiki.wooledge: avoid parsing output of ls and unix.stackexchange: why not parse ls? for explanation and other alternatives.

See also my Linux Command Line Computing ebook.

Python tip 25: split and partition string methods

2023-03-21T00:00:00+00:00

The split() method splits a string based on the given substring and returns a list. By default, whitespace is used for splitting and empty elements are discarded.

>>> greeting = '  \t\r\n have    a  nice \r\v\t day  \f\v\r\t\n '

>>> greeting.split()
['have', 'a', 'nice', 'day']

You can split the input based on a specific string literal by passing it as an argument. Here are some examples:

>>> creatures = 'dragon][unicorn][centaur'
>>> creatures.split('][')
['dragon', 'unicorn', 'centaur']

# empty elements will be preserved in this case
>>> ':car::jeep::'.split(':')
['', 'car', '', 'jeep', '', '']

The maxsplit argument allows you to restrict the number of times the input string should be split. Use rsplit() if you want to split from right to left.

# split once
>>> 'apple-grape-mango-fig'.split('-', maxsplit=1)
['apple', 'grape-mango-fig']

# match the rightmost occurrence
>>> 'apple-grape-mango-fig'.rsplit('-', maxsplit=1)
['apple-grape-mango', 'fig']
>>> 'apple-grape-mango-fig'.rsplit('-', maxsplit=2)
['apple-grape', 'mango', 'fig']

The partition() method will give a tuple of three elements — portion before the leftmost match, the separator itself and the portion after the split. You can use rpartition() to match the rightmost occurrence of the separator.

>>> marks = 'maths:85'
>>> marks.partition(':')
('maths', ':', '85')

# last two elements will be empty if there is no match
>>> marks.partition('=')
('maths:85', '', '')

# match the rightmost occurrence
>>> creatures = 'dragon][unicorn][centaur'
>>> creatures.rpartition('][')
('dragon][unicorn', '][', 'centaur')
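
Since partition() always returns three elements, it is convenient for unpacking even when the separator may be missing or may occur more than once. Here's a small sketch of that usage, where parse_setting and the sample strings are made up for illustration.

```python
def parse_setting(line):
    # hypothetical helper: split 'key=value' at the first '=' only
    key, sep, value = line.partition('=')
    if not sep:
        # separator not found, so there's no value
        return line.strip(), None
    return key.strip(), value.strip()

with_value = parse_setting('path = /opt/app=v2')
without_value = parse_setting('verbose')
print(with_value)
print(without_value)
```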

See my Understanding Python re(gex)? ebook to learn about string splitting with regular expressions.

See also my 100 Page Python Intro ebook.

100+ Interactive Python Regex Exercises

2023-03-20T00:00:00+00:00

An interactive program that automatically loads questions and checks your solutions is wonderful to have while learning a topic. This TUI app has beginner to advanced level exercises for Python regular expressions. There are more than 100 exercises covering both the builtin re and third-party regex modules.

Installation

This app is available on PyPI as regexexercises. Example installation instructions are shown below, adjust them based on your preferences and OS.

# virtual environment
$ python3 -m venv textual_apps
$ cd textual_apps
$ source bin/activate
$ pip install regexexercises

# launch the app
$ regexexercises

If you are on Windows, using the Windows Terminal is recommended. See this issue for Virtual Environment commands and other details.

To run the app without having to enter the virtual environment again, add this alias to .bashrc (or equivalent):

# you'll have to change the path
alias regexexercises='/path/to/textual_apps/bin/regexexercises'

As an alternative to manually managing such virtual environments, you can use https://github.com/pypa/pipx instead:

$ pipx install regexexercises
$ regexexercises

As yet another alternative, you can install textual==0.85.2 (see Textual documentation for more details), clone my TUI-apps repository and run the pyregex_exercises.py file.

Adjust the terminal dimensions for the widgets to appear properly, for example 84x25 (characters x lines).

Brief Guide

Type your solution in the input box below the question.
- Use ip variable to represent the sample input.
- Any single valid Python expression will be accepted.
- Some basic readline-like shortcuts are supported, for example Ctrl+u, Ctrl+k, Ctrl+w, etc
Press Enter to execute the code.
- Output will be displayed below the command box.
- If the output matches the expected results, the solution box will turn green and a reference solution will also be shown.
- Error messages due to exceptions will be displayed in red.
Press Ctrl+p and Ctrl+n to navigate the questions list.
Press Ctrl+r to toggle between str and repr — helps to spot characters like tabs, newlines, backspaces, etc.
Press Ctrl+b to toggle between expected and actual — helps to debug incorrect solutions.
Press Ctrl+s to toggle the reference solution.
Press Ctrl+t to toggle between light and dark themes.
Press Ctrl+q to quit the app.
Press F1 to view a detailed guide within the app itself and press F2 to get back to the exercises.

Your progress will be automatically saved and restored. Already answered questions will be skipped.

There is no safeguard against the code you are executing. It is treated as if you executed it from a Python program.

See app_guide.md for more detailed instructions.

Ebook

See my Understanding Python re(gex)? ebook to learn regular expressions with hundreds of examples and exercises.

Feedback

I'd highly appreciate your feedback. Please file an issue if there are bugs, crashes, etc.

Hope you find this TUI app useful. Happy learning :)

Vim tip 23: editing lines filtered by a pattern

2023-03-14T00:00:00+00:00

The syntax for g command (short for global) is shown below:

:[range]g[lobal]/{pattern}/[cmd]

This command is used to edit lines that are first filtered based on a searchpattern.

:g/call/d delete all lines containing call
- similar to the d Normal mode command, the deleted contents will be saved to the default " register
- :g/call/d a in addition to the default register, the deleted content will also be stored in the "a register
- :g/call/d _ deleted content won't be saved anywhere, since it uses the black hole register
:g/^#/t0 copy all lines starting with # to the start of the file
:1,5 g/call/d delete all lines containing call only for the first five lines
:g/cat/ s/animal/mammal/g replace animal with mammal only for the lines containing cat
:.,.+20 g/^#/ normal >> indent the current line and the next 20 lines only if the line starts with #
- Note the use of normal when you need to use Normal mode commands on the filtered lines
- Use normal! if you don't want user defined mappings to be considered

You can use g! or v to act on lines not satisfying the filtering condition.

:v/jump/d delete all lines not containing jump
- same as :g!/jump/d

In addition to the / delimiter, you can also use any single byte character other than alphabets, \, " or |.

See :h :g for more details.

Python Regular Expressions Playground

2023-03-11T00:00:00+00:00

This TUI application is intended as an interactive playground for Python Regular Expressions. The app also includes a comprehensive cheatsheet and several interactive examples.

Installation

This app is available on PyPI as regexplayground. Example installation instructions are shown below, adjust them based on your preferences and OS.

# virtual environment
$ python3 -m venv textual_apps
$ cd textual_apps
$ source bin/activate
$ pip install regexplayground

# launch the app
$ regexplayground

To run the app without having to enter the virtual environment again, add this alias to .bashrc (or equivalent):

# you'll have to change the path
alias regexplayground='/path/to/textual_apps/bin/regexplayground'

As an alternative, you can install textual==0.85.2 (see Textual documentation for more details), clone my TUI-apps repository and run the pyregex_playground.py file.

Adjust the terminal dimensions for the widgets to appear properly, for example 84x25 (characters x lines).

Brief Guide

You can type the search pattern in the Compile input box and press the Enter key (or Ctrl+r) to execute. For example, re.compile(r'\d') to match digit characters. Matching portions will be highlighted in red.

The compiled pattern is available via the pat variable and you can use ip to refer to the input string. You can transform or extract data by typing an appropriate expression in the Action box. For example, pat.sub(r'(\g<0>)', ip) will add parentheses around the matching portions.

You can skip the Compile box and directly use the Action box too. For example, [m.span() for m in re.finditer(r'\d+', ip)] to get the location of all the matching portions.
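
If you'd like to try the above expressions outside the app, here is a minimal sketch as a standalone script. The pat and ip names mirror the variables provided by the app, and the sample input string is made up for this illustration.

```python
import re

# hypothetical sample input, the app supplies its own 'ip' value
ip = 'a1b22c'

# equivalent of typing this in the Compile box
pat = re.compile(r'\d')

# Action box examples from the guide
wrapped = pat.sub(r'(\g<0>)', ip)
spans = [m.span() for m in re.finditer(r'\d+', ip)]

print(wrapped)
print(spans)
```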

There is no safeguard against the commands you have typed. They are treated as if you executed them from a Python program.

Press F1 to view the detailed guide from within the app, F2 to get back to the Playground from other screens, F3 to view a cheatsheet and F4 for interactive examples.

For more detailed instructions, see app guide.

Ebook🔗

See my Understanding Python re(gex)? ebook to learn regular expressions with hundreds of examples and exercises.

Feedback🔗

I'd highly appreciate your feedback. Please file an issue if there are bugs, crashes, etc.

Hope you find this TUI app useful. Happy learning :)

CLI tip 24: inserting file contents one line at a time

2023-03-07T00:00:00+00:00

The R command provided by GNU sed is very similar to r with respect to most of the rules seen in an earlier tip. But instead of reading entire file contents, R will read one line at a time from the source file when the given address matches. If the entire file has already been read and another address matches, sed will proceed as if the line was empty.

Here's an example:

$ cat ip.txt
    * sky
    * apple
$ cat fav_colors.txt
deep red
yellow
reddish
brown

# add a line from 'ip.txt'
# whenever a line from 'fav_colors.txt' contains 'red'
$ sed '/red/R ip.txt' fav_colors.txt
deep red
    * sky
yellow
reddish
    * apple
brown

You can combine R with other sed commands to solve various kinds of problems. For example, to replace the matching lines:

# empty // will refer to the previously used regex, /red/ in this case
$ sed -e '/red/R ip.txt' -e '//d' fav_colors.txt
    * sky
yellow
    * apple
brown

And, here's how you can interleave contents of two files:

# /dev/stdin will get data from stdin (output of 'seq 4' here)
# same as: seq 4 | paste -d'\n' fav_colors.txt -
$ seq 4 | sed 'R /dev/stdin' fav_colors.txt
deep red
1
yellow
2
reddish
3
brown
4

# using 'paste' here will add a newline when stdin runs out of data
$ seq 2 | sed 'R /dev/stdin' fav_colors.txt
deep red
1
yellow
2
reddish
brown
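
If it helps to see the logic spelled out, here's a rough Python sketch of what the R command does in the first example. The function name is made up for illustration and a plain substring check stands in for the regex address, so treat it as an approximation rather than how sed is implemented:

```python
def insert_on_match(main_lines, source_lines, needle):
    """Rough emulation of: sed '/needle/R source' main"""
    source = iter(source_lines)
    result = []
    for line in main_lines:
        result.append(line)
        if needle in line:
            # take one more line from the source, if any are left
            extra = next(source, None)
            if extra is not None:
                result.append(extra)
    return result

fav_colors = ['deep red', 'yellow', 'reddish', 'brown']
ip = ['    * sky', '    * apple']
merged = insert_on_match(fav_colors, ip, 'red')
print('\n'.join(merged))
```

Once the source runs out of lines, further matches add nothing, similar to the seq 2 example above.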

See also my CLI text processing with GNU sed ebook.

Python tip 24: modifying list using insert and slice

2023-02-28T00:00:00+00:00

The insert() list method helps to insert an object before the given index. Negative indexing is also supported.

>>> books = ['Sourdough', 'Sherlock Holmes', 'Cradle']

# same as: books.insert(-1, 'The Martian')
>>> books.insert(2, 'The Martian')
>>> books
['Sourdough', 'Sherlock Holmes', 'The Martian', 'Cradle']

# index >= list-length will append the object at the end
>>> books.insert(1000, 'Legends & Lattes')
>>> books
['Sourdough', 'Sherlock Holmes', 'The Martian', 'Cradle', 'Legends & Lattes']

You can use slicing notation to modify one or more list elements. The list will automatically shrink or expand as needed. Here are some examples:

>>> nums = [1, 4, 6, 22, 3, 5]

# modify a single element
>>> nums[0] = 100
>>> nums
[100, 4, 6, 22, 3, 5]

# modify the last three elements
>>> nums[-3:] = [-1, -2, -3]
>>> nums
[100, 4, 6, -1, -2, -3]

# elements at index 1, 2 and 3 are replaced with a single object
>>> nums[1:4] = [2000]
>>> nums
[100, 2000, -2, -3]

# element at index 1 is replaced with multiple elements
>>> nums[1:2] = [3.14, 4.13, 6.78]
>>> nums
[100, 3.14, 4.13, 6.78, -2, -3]

RHS must be an iterable when you use slicing notation with :, even when LHS refers to a single element. For example, nums[1:2] = 100 is not valid.
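
Here's a small sketch of that restriction, continuing with the final value of nums from the examples above. The variable name rejected is only for illustration:

```python
nums = [100, 3.14, 4.13, 6.78, -2, -3]

# a non-iterable on the RHS is rejected, the list is left untouched
try:
    nums[1:2] = 100
except TypeError:
    rejected = True
else:
    rejected = False

# wrap the value in a list (or any other iterable) instead
nums[1:2] = [100]

# an empty slice inserts elements without replacing anything
nums[:0] = (7, 8)
print(nums)
```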

See also my 100 Page Python Intro ebook.

CLI tip 23: recursive filename matching with globstar

2023-02-10T00:00:00+00:00

Enable the globstar option to recursively match filenames within a specified path. You can use shopt -s globstar and shopt -u globstar to set and unset this option respectively.

First, create some sample files:

$ mkdir test_globstar && cd $_
$ mkdir -p todos projects/{tictactoe,calculator}
$ touch ip.txt .hidden.txt report.log hello.py
$ touch todos/{books,outing}.txt
$ touch projects/tictactoe/game.py projects/calculator/{calc.sh,notes.txt}

Here are some examples:

$ shopt -s globstar

$ ls -1 **/*.txt
ip.txt
projects/calculator/notes.txt
todos/books.txt
todos/outing.txt

$ ls -1 **/*/*.txt
projects/calculator/notes.txt
todos/books.txt
todos/outing.txt

# assumes extglob is enabled
$ ls -1 **/*.@(py|sh)
hello.py
projects/calculator/calc.sh
projects/tictactoe/game.py

$ ls -1d **/
projects/
projects/calculator/
projects/tictactoe/
todos/

If you need to match hidden files as well, enable the dotglob option:

$ shopt -s dotglob
$ ls -1 **/*.txt
.hidden.txt
ip.txt
projects/calculator/notes.txt
todos/books.txt
todos/outing.txt
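
For comparison, Python's glob module supports a similar ** pattern when recursive=True is passed. The sketch below recreates the sample files in a temporary directory (an assumption made so that the snippet is self-contained). By default, glob skips hidden files, much like Bash with dotglob unset:

```python
import glob
import os
import tempfile

# recreate the sample files from this tip in a temporary directory
root = tempfile.mkdtemp()
sample_files = [
    'ip.txt', '.hidden.txt', 'report.log', 'hello.py',
    'todos/books.txt', 'todos/outing.txt',
    'projects/tictactoe/game.py',
    'projects/calculator/calc.sh', 'projects/calculator/notes.txt',
]
for name in sample_files:
    path = os.path.join(root, *name.split('/'))
    os.makedirs(os.path.dirname(path), exist_ok=True)
    open(path, 'w').close()

# '**' matches any number of directory levels only when recursive=True
pattern = os.path.join(glob.escape(root), '**', '*.txt')
matches = sorted(
    os.path.relpath(p, root).replace(os.sep, '/')
    for p in glob.glob(pattern, recursive=True)
)
print(matches)
```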

See also my Linux Command Line Computing ebook.

Vim tip 22: word and WORD motions

2023-02-10T00:00:00+00:00

Definitions from :h word and :h WORD are quoted below to explain the difference between word and WORD.

word A word consists of a sequence of letters, digits and underscores, or a sequence of other non-blank characters, separated with white space (spaces, tabs, <EOL>). This can be changed with the iskeyword option. An empty line is also considered to be a word.

WORD A WORD consists of a sequence of non-blank characters, separated with white space. An empty line is also considered to be a WORD.

word based motions:

w move to the start of the next word
b move to the beginning of the current word if the cursor is not at the start of word. Otherwise, move to the beginning of the previous word
e move to the end of the current word if cursor is not at the end of word. Otherwise, move to the end of next word
ge move to the end of the previous word
3w move 3 words forward
- Similarly, a number can be prefixed for all the other commands discussed here

WORD based motions:

W move to the start of the next WORD
- 192.1.168.43;hello is considered as a single WORD, but has multiple words
B move to the beginning of the current WORD if the cursor is not at the start of WORD. Otherwise, move to the beginning of the previous WORD
E move to the end of the current WORD if cursor is not at the end of WORD. Otherwise, move to the end of next WORD
gE move to the end of the previous WORD

All of these motions will work across lines. For example, if the cursor is on the last word of a line, pressing w will move to the start of the first word in the next line.
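
To get a feel for the difference, here's a rough Python approximation of the two definitions using regular expressions. It assumes the default iskeyword setting and ignores the empty line rule, so it is only a sketch and not how Vim implements these motions:

```python
import re

text = '192.1.168.43;hello'

# WORD: a run of non-blank characters
big_words = re.findall(r'\S+', text)

# word: a run of letters, digits and underscores,
# or a run of other non-blank characters
small_words = re.findall(r'\w+|[^\w\s]+', text)

print(big_words)
print(small_words)
```

The sample text is a single WORD, but it splits into nine words. That's why W jumps past it in one go while w stops several times.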

Python tip 23: map, filter and reduce

2023-02-07T00:00:00+00:00

Many operations on container objects can be defined in terms of these three concepts. For example, if you want to sum the squares of all even numbers:

separating out even numbers is Filter (i.e. only elements that satisfy a condition are retained)
square of such numbers is Map (i.e. each element is transformed by a mapping function)
final sum is Reduce (i.e. you get one value out of multiple values)

One or more of these operations may be absent depending on the problem statement. Each of these steps will be first illustrated using straightforward code and then the equivalent list comprehensions (and generator expressions) are also shown.

The first of these steps could look like:

>>> def get_evens(iterable):
...     op = []
...     for n in iterable:
...         if n % 2 == 0:
...             op.append(n)
...     return op
... 
>>> nums = [100, 53, 32, 0, 11, 5, 2]
>>> get_evens(nums)
[100, 32, 0, 2]

>>> [n for n in nums if n % 2 == 0]
[100, 32, 0, 2]

The second step could be:

>>> def sqr_evens(iterable):
...     op = []
...     for n in iterable:
...         if n % 2 == 0:
...             op.append(n * n)
...     return op
... 
>>> sqr_evens(nums)
[10000, 1024, 0, 4]

>>> [n * n for n in nums if n % 2 == 0]
[10000, 1024, 0, 4]

And finally, the third step could be:

>>> def sum_sqr_evens(iterable):
...     total = 0
...     for n in iterable:
...         if n % 2 == 0:
...             total += n * n
...     return total
... 
>>> sum_sqr_evens(nums)
11028

>>> sum(n * n for n in nums if n % 2 == 0)
11028

You can also use map(), filter() and functools.reduce() for such problems.
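
Here's one way the same problem could be written with these functions. This is a minimal sketch and the lambda expressions are just one possible choice:

```python
from functools import reduce

nums = [100, 53, 32, 0, 11, 5, 2]

# Filter: retain only the even numbers
evens = filter(lambda n: n % 2 == 0, nums)

# Map: square each of the retained numbers
squares = map(lambda n: n * n, evens)

# Reduce: combine all the squares into a single value
total = reduce(lambda acc, n: acc + n, squares, 0)
print(total)
```

Note that filter() and map() return lazy iterators, so no intermediate list is created here.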

See also my 100 Page Python Intro ebook.

Vim tip 21: working with tabs

2023-02-01T00:00:00+00:00

Multiple files can be opened in Vim within the same tab page and/or in different tabs. From :h windows-intro:

A buffer is the in-memory text of a file.

A window is a viewport on a buffer.

A tab page is a collection of windows.

:tabe filename open the given file in a new tab (:tabe is short for :tabedit)
- if filename isn't specified, you'll get an unnamed empty window
- by default, the new tab is opened to the right of the current tab
- :0tabe open as the first tab
- :$tabe open as the last tab
- see :h :tabe for more details and features

Switching between tabs:

:tabn switch to the next tab (:tabn is short for :tabnext)
- if tabs to the right are exhausted, switch to the first tab
- gt and Ctrl+Page Down can also be used
- 2gt switch to the second tab (the number specified is absolute, not relative)
:tabp switch to the previous tab (:tabp is short for :tabprevious)
- if tabs to the left are exhausted, switch to the last tab
- gT and Ctrl+Page Up can also be used
:tabr switch to the first tab (:tabr is short for :tabrewind)
- :tabfirst can also be used
:tabl switch to the last tab (:tabl is short for :tablast)

Moving tabs:

:tabm N move the current tab to after N tabs from the start (:tabm is short for :tabmove)
- :tabm 0 move the current tab to the beginning
- :tabm move the current tab to the end
:tabm +N move the current tab N positions to the right
:tabm -N move the current tab N positions to the left

Buffer list includes all the files opened in all the tabs. You can also use the mouse to switch/move tabs in GVim.

CLI tip 22: grep options to suppress stdout and stderr

2023-01-25T00:00:00+00:00

While writing scripts, sometimes you just need to know if a file contains the pattern and act based on the exit status of the command. Instead of redirecting the output to /dev/null, you can use the -q option. This avoids printing anything on stdout and also provides a speed benefit, as processing stops as soon as the given condition is satisfied.

$ cat find.txt
The find command is more versatile than recursive options and
and extended globs. Apart from searching based on filename, it
has provisions to match based on the the file characteristics
like size and time.

$ grep -wE '(\w+) \1' find.txt
has provisions to match based on the the file characteristics
$ grep -qwE '(\w+) \1' find.txt
$ echo $?
0

$ grep -q 'xyz' find.txt
$ echo $?
1

$ grep -qwE '(\w+) \1' find.txt && echo 'Repeated words found!'
Repeated words found!
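
The early exit idea can be sketched in Python as well. The quiet_match name is hypothetical and \b on both sides is only a rough stand-in for the -w option:

```python
import re

def quiet_match(pattern, lines):
    """Return 0 if any line matches, 1 otherwise (like grep's exit status)."""
    regex = re.compile(pattern)
    # any() stops consuming lines at the first match
    return 0 if any(regex.search(line) for line in lines) else 1

find_txt = [
    'The find command is more versatile than recursive options and',
    'and extended globs. Apart from searching based on filename, it',
    'has provisions to match based on the the file characteristics',
    'like size and time.',
]

repeated = quiet_match(r'\b(\w+) \1\b', find_txt)
missing = quiet_match(r'xyz', find_txt)
print(repeated, missing)
```

Since any() short-circuits, the last line of the sample is never examined for the first pattern.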

The -s option will suppress the error messages that are intended for the stderr stream.

# when the input file doesn't exist
$ grep 'in' xyz.txt
grep: xyz.txt: No such file or directory
$ grep -s 'in' xyz.txt
$ echo $?
2

# when sufficient permission is not available
$ touch new.txt
$ chmod -r new.txt
$ grep 'rose' new.txt
grep: new.txt: Permission denied
$ grep -s 'rose' new.txt
$ echo $?
2

Errors regarding regular expressions and invalid options will be on the stderr stream even when the -s option is used.

$ grep -sE 'a(' find.txt
grep: Unmatched ( or \(

$ grep -sE 'a(' find.txt 2> /dev/null
$ echo $?
2

Check out my ch command line tool for a practical example of using the -q option.

See my CLI text processing with GNU grep and ripgrep ebook if you are interested in learning about GNU grep and ripgrep commands in more detail.

Python Regex Surprises

2023-01-21T00:00:00+00:00

In this post, you'll find a few regular expression examples that might surprise you. Some are Python specific and some are applicable to other regex flavors as well. To make it more interesting, these are framed as questions for you to ponder upon. Answers are hidden by default.

Poster created using Canva

If you are not familiar with regular expressions, check out my Understanding Python re(gex)? ebook.

$ vs \Z🔗

Are the $ and \Z anchors equivalent?

Click to view answer

$ can match both the end of string and just before \n if it is the last character. \Z will only match the end of string.

>>> greeting = 'hi there\nhave a nice day\n'

>>> bool(re.search(r'day$', greeting))
True
>>> bool(re.search(r'day\n$', greeting))
True

>>> bool(re.search(r'day\Z', greeting))
False
>>> bool(re.search(r'day\n\Z', greeting))
True

Slicing vs start and end arguments🔗

Did you know that you can specify start and end index arguments for compiled methods?

Pattern.search(string[, pos[, endpos]])

Now, here's a conundrum:

>>> word_pat = re.compile(r'\Aat')

>>> bool(word_pat.search('cater'[1:]))
True

# what will be the output?
>>> bool(word_pat.search('cater', 1))

Click to view answer

Specifying a start index greater than 0 when using \A is always going to result in False. This is because, as far as the search() method is concerned, only the search space has been narrowed and the anchor positions haven't changed. When slicing is used, you are creating an entirely new string object with new anchor positions.
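
Here's a short sketch comparing the approaches. The last line shows one possible alternative, since the match() method anchors the pattern at the given start index:

```python
import re

word_pat = re.compile(r'\Aat')

# slicing creates a new string, so \A matches at its start
sliced = bool(word_pat.search('cater'[1:]))

# pos=1 narrows the search space, but \A still refers to index 0
with_pos = bool(word_pat.search('cater', 1))

# match() anchors at the given pos, so no \A is needed
anchored = bool(re.compile(r'at').match('cater', 1))

print(sliced, with_pos, anchored)
```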

Do ^ and $ match after the last newline?🔗

When you use the re.MULTILINE flag, the ^ and $ anchors will match at the start and end of every input line. The question is, will they also match after a newline character at the end of the input?

Click to view answer

Yes, they will both match after the last newline character.

>>> print(re.sub(r'(?m)^', 'apple ', '1\n2\n'))
apple 1
apple 2
apple 

>>> print(re.sub(r'(?m)$', ' banana', '1\n2\n'))
1 banana
2 banana
 banana

Word boundary vs lookarounds🔗

\b..\b is same as (?<!\w)..(?!\w) — True or False?

Click to view answer

False! \b matches both the start and end of word locations. In the below example, \b..\b doesn't necessarily mean that the first \b will match only the start of word location and the second \b will match only the end of word location. They can be any combination! For example, I followed by space in the input string here is using the start of word location for both the conditions. Similarly, space followed by 2 is using the end of word location for both the conditions.

In contrast, the negative lookarounds version ensures that there are no word characters around any two characters. Also, such assertions will always be satisfied at the start of string and the end of string respectively. But \b depends on the presence of word characters. For example, ! at the end of the input string here matches the lookaround assertion but not word boundary.

>>> ip = 'I have 12, he has 2!'

>>> re.sub(r'\b..\b', r'{\g<0>}', ip)
'{I }have {12}{, }{he} has{ 2}!'

>>> re.sub(r'(?<!\w)..(?!\w)', r'{\g<0>}', ip)
'I have {12}, {he} has {2!}'

Undefined escape sequences🔗

If you use undefined escape sequences like \e, will you get an error or will it match the unescaped character (e for this example)?

Click to view answer

Python raises an exception for escape sequences that are not defined. Apart from sequences defined for character sets (for example \d, \w, \s, etc), these are allowed: \a \b \f \n \N \r \t \u \U \v \x \\ where \b means backspace only in character classes. Also, \u and \U are valid only in Unicode patterns.

>>> bool(re.search(r'\t', 'cat\tdog'))
True

>>> bool(re.search(r'\c', 'cat\tdog'))
re.error: bad escape \c at position 0

Using octal and hexadecimal escapes in the replacement section🔗

In string literals, you can use octal, hexadecimal and unicode escapes to represent a character. For example, '\174' is same as using '|'. Do you know which of these escapes you can use inside raw strings in the replacement section of the sub() function?

Click to view answer

Only octal escapes are allowed inside raw strings in the replacement section. If you are otherwise not using the \ character, then using normal strings in the replacement section is preferred as it will also allow hexadecimal and unicode escapes.

>>> re.sub(r',', r'\x7c', '1,2')
re.error: bad escape \x at position 0

>>> re.sub(r',', r'\174', '1,2')
'1|2'
>>> re.sub(r',', '\x7c', '1,2')
'1|2'

I feel it would have been better if octal escapes were not allowed either. That would have allowed us to use \0 instead of \g<0> for backreferencing the entire matched portion in the replacement section.

Using escape sequences for metacharacters🔗

In the search section, if you use an escape (for example, \x7c to represent the | character), will it behave as the alternation metacharacter or match it literally?

>>> re.sub(r'2|3', '5', '12|30')
'15|50'

# what will be the output?
>>> re.sub(r'2\x7c3', '5', '12|30')

Click to view answer

The output will be '150' since escapes will be treated literally.
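
A quick sketch to verify this, with the conventional \| escape added for comparison:

```python
import re

# | is the alternation metacharacter here
alternation = re.sub(r'2|3', '5', '12|30')

# \x7c stands for a literal | character
literal = re.sub(r'2\x7c3', '5', '12|30')

# escaping the metacharacter gives the same literal match
escaped = re.sub(r'2\|3', '5', '12|30')

print(alternation, literal, escaped)
```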

Empty matches🔗

You are likely to have come across this before:

# what will be the output?
>>> re.sub(r'[^,]*', r'{\g<0>}', ',cat,tiger')

Click to view answer
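
Here's a minimal sketch of the behavior in question, assuming Python 3.7 or newer (where an empty match is also allowed right after a non-empty match). The version with the + quantifier is shown as one possible way to avoid empty matches:

```python
import re

# * allows empty matches: before the first comma, and
# right after 'cat' and 'tiger' have been matched
with_star = re.sub(r'[^,]*', r'{\g<0>}', ',cat,tiger')

# + requires at least one character, so there are no empty matches
with_plus = re.sub(r'[^,]+', r'{\g<0>}', ',cat,tiger')

print(with_star)
print(with_plus)
```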

Can quantifiers be grouped out?🔗

Similar to a(b+c)d = abd+acd in maths, you get a(b|c)d = abd|acd in regular expressions. (a*|b*) is same as (a|b)* — True or False?

Click to view answer

Railroad diagram created using debuggex.com

False, because (a*|b*) will match only sequences like a, aaa, bb, bbbbbbbb. But (a|b)* can match mixed sequences like ababbba too.
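
This can be checked with re.fullmatch() on a few sample strings (the samples here are made up for illustration):

```python
import re

samples = ['aaa', 'bb', '', 'ababbba']

# only a's or only b's (including the empty string)
only_one_kind = [s for s in samples if re.fullmatch(r'(a*|b*)', s)]

# any mix of a's and b's
mixed_allowed = [s for s in samples if re.fullmatch(r'(a|b)*', s)]

print(only_one_kind)
print(mixed_allowed)
```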

Portion captured by a quantified group🔗

This should be another familiar regex gotcha:

# what will be the output?
>>> re.sub(r'\A([^,]+,){3}([^,]+)', r'\1(\2)', '1,2,3,4,5,6,7')

Click to view answer

Referring to the text matched by a capture group with a quantifier will give only the last match, not the entire match. You'll need an outer capture group to get the entire matched portion.

>>> re.sub(r'\A([^,]+,){3}([^,]+)', r'\1(\2)', '1,2,3,4,5,6,7')
'3,(4),5,6,7'

>>> re.sub(r'\A((?:[^,]+,){3})([^,]+)', r'\1(\2)', '1,2,3,4,5,6,7')
'1,2,3,(4),5,6,7'

Character combinations🔗

\b[a-z](on|no)[a-z]\b is same as \b[a-z][on]{2}[a-z]\b — True or False?

Click to view answer

False. [on]{2} will also match oo and nn.

>>> words = 'known mood know pony inns'

>>> re.findall(r'\b[a-z](?:on|no)[a-z]\b', words)
['know', 'pony']
>>> re.findall(r'\b[a-z][on]{2}[a-z]\b', words)
['mood', 'know', 'pony', 'inns']

Greedy vs Possessive🔗

Suppose you want to match integer numbers greater than or equal to 100 where these numbers can optionally have leading zeros. Will the below code work? If not, what would you use instead?

>>> numbers = '42 314 001 12 00984'

# will this work?
>>> re.findall(r'0*\d{3,}', numbers)

Click to view answer

No. You can either modify the pattern such that 0* won't interfere or use possessive quantifiers to prevent backtracking.

>>> numbers = '42 314 001 12 00984'

# this solution fails because 0* and \d{3,} can both match leading zeros
# and greedy quantifiers will give up characters to help overall RE succeed
>>> re.findall(r'0*\d{3,}', numbers)
['314', '001', '00984']

# 0*+ is possessive (needs Python 3.11+), will never give back leading zeros
>>> re.findall(r'0*+\d{3,}', numbers)
['314', '00984']

# workaround if possessive isn't supported
>>> re.findall(r'0*[1-9]\d{2,}', numbers)
['314', '00984']

See my blog post on possessive quantifiers and atomic grouping for more examples, details about catastrophic backtracking and so on.

Optional flags argument🔗

Will the sub() function in the code sample below match case insensitively or not?

>>> re.findall(r'key', 'KEY portkey oKey Keyed', re.I)
['KEY', 'key', 'Key', 'Key']

# what will be the output?
>>> re.sub(r'key', r'(\g<0>)', 'KEY portkey oKey Keyed', re.I)

Click to view answer

You should always pass flags as a keyword argument. Using it as a positional argument leads to a common mistake between the re.findall() and re.sub() functions due to the difference in their placement.

re.findall(pattern, string, flags=0)

re.sub(pattern, repl, string, count=0, flags=0)

>>> +re.I
2

# works because flags is the only optional argument for findall
>>> re.findall(r'key', 'KEY portkey oKey Keyed', re.I)
['KEY', 'key', 'Key', 'Key']

# wrong usage, but no error because re.I has a value of 2
# so, this is same as specifying count=2
>>> re.sub(r'key', r'(\g<0>)', 'KEY portkey oKey Keyed', re.I)
'KEY port(key) oKey Keyed'

# correct use of keyword argument
>>> re.sub(r'key', r'(\g<0>)', 'KEY portkey oKey Keyed', flags=re.I)
'(KEY) port(key) o(Key) (Key)ed'
# alternatively, you can use inline flags to avoid this problem altogether
>>> re.sub(r'(?i)key', r'(\g<0>)', 'KEY portkey oKey Keyed')
'(KEY) port(key) o(Key) (Key)ed'

re vs regex module flags🔗

The third-party regex module is handy for advanced features like subexpression calls, skipping matches and so on. Can you use re module flag constants with the regex module?

Click to view answer

When using the flags argument with the regex module, the constants should also be used from the regex module.

>>> +re.A
256

>>> +regex.A
128

Again, you can use inline flags to avoid such issues.

Understanding Python re(gex)? book🔗

Visit my GitHub repo Understanding Python re(gex)? for details about the book I wrote on Python regular expressions. The ebook uses plenty of examples to explain the concepts from the very beginning and step by step introduces more advanced concepts. The book also covers the third-party module regex.

Vim tip 20: character based motions within the current line

2023-01-10T00:00:00+00:00

These commands allow you to move based on a single character search, within the current line only.

f( move forward to the next occurrence of character (
fb move forward to the next occurrence of character b
3f" move forward to the third occurrence of character "
t; move forward to the character just before ;
3tx move forward to the character just before the third occurrence of character x
Fa move backward to the character a
Ta move backward to the character just after a
; repeat previous f or F or t or T motion in the same direction
, repeat previous f or F or t or T motion in the opposite direction
- for example, tc becomes Tc and vice versa

Note that the previously used count prefix wouldn't be repeated with ; or , commands, but you can use a new count prefix. If you pressed a wrong motion command, use the Esc key to abandon the search instead of continuing with the wrongly chosen command.

CLI tip 21: inplace file editing with GNU awk

2023-01-04T00:00:00+00:00

You can use the -i option with GNU awk to load libraries. The inplace library comes by default with the GNU awk installation. Thus, you can use -i inplace to modify the original input itself. Make sure to test that the code is working as intended before using this option.

$ cat table.txt
brown bread mat cake 42
blue cake mug shirt -7
yellow banana window shoes 3.14

# retain only the first and third fields
$ awk -i inplace '{print $1, $3}' table.txt
$ cat table.txt
brown mat
blue mug
yellow window

You can provide a backup extension by setting the inplace::suffix special variable. For example, if the input file is ip.txt and inplace::suffix='.orig' is used, the backup file will be named as ip.txt.orig.

$ cat marks.txt
  Name    Physics  Maths
 Moe  76  82
Raj  56  64

$ awk -i inplace -v inplace::suffix='.bkp' -v OFS=, '{$1=$1} 1' marks.txt
$ cat marks.txt
Name,Physics,Maths
Moe,76,82
Raj,56,64

# original file is preserved in 'marks.txt.bkp'
$ cat marks.txt.bkp
  Name    Physics  Maths
 Moe  76  82
Raj  56  64
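
If you happen to be working in Python instead, the standard library's fileinput module offers a comparable in-place mode with a backup extension. Here's a sketch that mirrors the marks.txt example, assuming a throwaway copy of the file in a temporary directory:

```python
import fileinput
import os
import tempfile

# create a throwaway copy of the sample file
path = os.path.join(tempfile.mkdtemp(), 'marks.txt')
original = '  Name    Physics  Maths\n Moe  76  82\nRaj  56  64\n'
with open(path, 'w') as f:
    f.write(original)

# inplace=True redirects print() to the file being read
# backup='.bkp' keeps the original contents in 'marks.txt.bkp'
with fileinput.input(files=[path], inplace=True, backup='.bkp') as lines:
    for line in lines:
        print(','.join(line.split()))

with open(path) as f:
    modified = f.read()
with open(path + '.bkp') as f:
    backup = f.read()
print(modified, end='')
```

As with awk, test the transformation on a copy first before modifying files you care about.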

Earlier versions of GNU awk used the INPLACE_SUFFIX variable instead of inplace::suffix. Also, you can use the inplace::enable variable to dynamically control whether files should be edited in place or not. See gawk manual: Enabling In-Place File Editing for more details.

See this unix.stackexchange thread for details about security implications of using the -i option and workarounds.

See my CLI text processing with GNU awk ebook if you are interested in learning about the GNU awk command in more detail.

2022: year in perspective

2022-12-30T00:00:00+00:00

TL;DR: Published two programming ebooks, wrote several blog posts, recorded plenty of YouTube videos, newsletter prospered, improved Twitter audience, read 100+ novels, and so on. Had an excellent year in terms of ebook sales 😇

Books published🔗

Vim Reference Guide — concise learning resource for beginner to intermediate level Vim users, published in March
Computing from the Command Line — Linux command line tools and Shell Scripting for beginner to intermediate level users, published in November

Workshops🔗

Offline workshops were back on the menu this year. I got only one offer though. Surprisingly, it was for Python basics, despite students already having had a course in their first year. It was a nice experience for me, thanks to the enthusiasm shown by the students.

And it was good to see BarCamp Bangalore being organized again. Gave a talk about my ebook publishing experience (I had also written a blog post on this topic last year).

Blog posts🔗

Here are my favorite posts I wrote this year:

I continued posting weekly programming tips (Python, Linux, Vim) that are short and easy to digest and wrote some mini blog posts as well.

Tools🔗

During the last two months of the year, I learned a bit of Textual and wrote a couple of TUI apps:

Square Tic Tac Toe — form a square with 4 corners
Linux CLI Text Processing Exercises — test your CLI text processing skills

I found the framework much easier to use compared to my experience with Tkinter.

YouTube🔗

While working on the Vim Reference Guide, I felt that some of the commands really needed video demos for easier understanding. So, I gave myself another chance at recording videos. I kept them simple and short, and with consistent practice I did better than my attempts a few years back. I then extended my newfound enthusiasm to programming tips, ebook promo videos, etc. Visit my YouTube channel for interesting tech nuggets.

Here are some of the tools I use:

SimpleScreenRecorder — recording video, really simple to use
auto-editor — removing silent portions from video recordings
FFmpeg — video processing, padding for example (FFmpeg is also a major part of the auto-editor solution)
Canva — video thumbnails (I also use this app for ebook covers)

Book sales🔗

Revenue from ebook sales was almost 50% higher than last year!! As I wrote in the 2021 was a wild ride post, I've been paying more attention to marketing and it seems like my efforts are paying off. Here's my Gumroad revenue chart for 2022:

The peaks were during the two ebook releases. Additionally, I had less than half the above revenue from Leanpub. Last year, sales from Leanpub and Gumroad were nearly the same.

Last November, I started the learnbyexample weekly newsletter. I've managed to send an email every Friday without fail so far and I'm proud of that. Sometimes I had to schedule issues a week ahead. There are currently about 600 subscribers and some readers are even paying me monthly despite it being a free newsletter.

Building Twitter audience🔗

As part of marketing efforts last year, I started building my Twitter audience as well. Follower count was less than 400 in July and about 1100 in December last year. Now it has crossed 2900. I'm not focused on increasing follower count with a plethora of engagement inducing tweets. I'm just trying to be consistent and promote all sorts of interesting links I come across. That said, I'd like to try creating cool infographics (probably using Canva) next year.

Follow me on Twitter for interesting tech nuggets 😉

Fictional reading🔗

I enjoy reading fantasy and science-fiction novels. I read 100+ SFF books this year and recently wrote a post listing my favorites.

I also got a chance to beta read Tongue Eater, Soul Relic, The Umbral Storm and The Book of Zog. I find these a good way to give back to the writing community, having myself received plenty of support from strangers.

Goals for 2023🔗

I met most of my goals this year, so that's a nice feeling. Contributing to open source projects needs a lot more focus in the coming year. I'm not likely to publish a new ebook in 2023. Instead, I'm planning to update my existing books and that will probably take more than a year. Apart from catching up to new features and improving existing examples/exercises, I'll also focus on changing book titles and cover images. And, I'll likely create interactive apps for exercises.

I also need to find something other than books to keep me creatively busy. It has been more than 4 years since I first published an ebook and 6 years since I started writing programming tutorials.

Here's wishing you a very happy, healthy and prosperous 2023 👍 😇

Python tip 21: sorting iterables based on a key

2022-12-28T00:00:00+00:00

You can use the sort() method for sorting lists in place. The sorted() function can be used to get a sorted list from any iterable.

The key argument accepts the name of a function (i.e. function object) for custom sorting. If two elements are deemed equal based on the result of the function, the original order will be maintained (stable sorting). Here are some examples:

# based on the absolute value of an element
# note that the input order is maintained for all three values of "4"
>>> nums = [-1, -4, 309, 4.0, 34, 0.2, 4]
>>> nums.sort(key=abs)
>>> nums
[0.2, -1, -4, 4.0, 4, 34, 309]

# based on the length of an element
>>> words = ('morello', 'irk', 'fuliginous', 'crusado', 'seam')
>>> sorted(words, key=len, reverse=True)
['fuliginous', 'morello', 'crusado', 'seam', 'irk']

Here are some examples using lambda expressions:

# sorting dictionaries based on values
>>> vehicles = {'bus': 10, 'car': 20, 'jeep': 3, 'cycle': 5}
>>> sorted(vehicles, key=lambda k: vehicles[k])
['jeep', 'cycle', 'bus', 'car']
>>> dict(sorted(vehicles.items(), key=lambda t: t[1]))
{'jeep': 3, 'cycle': 5, 'bus': 10, 'car': 20}

# based on file extension
>>> files = ('report.txt', 'hello.py', 'calc.sh', 'tictactoe.py')
>>> sorted(files, key=lambda f: f.rsplit('.', 1)[-1])
['hello.py', 'tictactoe.py', 'calc.sh', 'report.txt']

Vim tip 19: working with buffers

2022-12-20T00:00:00+00:00

Multiple files can be opened in Vim within the same tab page and/or in different tabs. From :h windows-intro:

A buffer is the in-memory text of a file.

A window is a viewport on a buffer.

A tab page is a collection of windows.

:e refreshes the current buffer (:e is short for :edit)
:e filename open a particular file by its path, in the same window
:e # switch back to the previous buffer, won't work if that buffer is not named
Ctrl+6 switch back to the previous buffer, works even if that buffer is not named
- Ctrl+^ can also be used
:e #1 open the first buffer, and so on
:buffers show all buffers
- :ls or :files can also be used
:bn open the next file in the buffer list (:bn is short for :bnext)
- opens the first buffer if you are on the last buffer
:bp open the previous file in the buffer list (:bp is short for :bprevious)
- opens the last buffer if you are on the first buffer

Use :set hidden if you want to switch to another buffer even if there are unsaved changes in the current buffer. Instead of this setting, you can also use :hide edit filename to hide the current unsaved buffer. You'll still get an error if you try to quit Vim without saving such buffers, unless you use the ! modifier.

See :h 'autowrite' option if you want to automatically save changes when moving to another buffer.

See :h 22.4 and :h buffer-hidden for user and reference manuals on working with buffer list.

CLI tip 20: expand and unexpand

2022-12-14T00:00:00+00:00

These two commands will help you convert tabs to spaces and vice versa. Both these commands support options to customize the width of tab stops and which occurrences should be converted.

The default expansion aligns at multiples of 8 columns (calculated in terms of bytes).

# 'apple' = 5 bytes, \t converts to 3 spaces
# 'banana' = 6 bytes, \t converts to 2 spaces
# 'a' and 'b' = 1 byte, \t converts to 7 spaces
$ printf 'apple\tbanana\tcherry\na\tb\tc\n' | expand
apple   banana  cherry
a       b       c

# 'αλε' = 6 bytes, \t converts to 2 spaces
$ printf 'αλε\tπού\n' | expand
αλε  πού
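
Python's str.expandtabs() method does a similar job, but it counts characters instead of bytes. The sketch below compares the two approaches. The expand_bytes function is a made-up helper that assumes byte-based columns as described above, and is not how the expand command is actually implemented:

```python
def expand_bytes(line, width=8):
    """Expand tabs with columns counted in UTF-8 bytes."""
    out = []
    col = 0
    for ch in line:
        if ch == '\t':
            # pad up to the next multiple of 'width'
            pad = width - col % width
            out.append(' ' * pad)
            col += pad
        else:
            out.append(ch)
            col += len(ch.encode('utf-8'))
    return ''.join(out)

ascii_line = 'apple\tbanana\tcherry'
greek_line = 'αλε\tπού'

# both approaches agree when every character is a single byte
same_for_ascii = ascii_line.expandtabs(8) == expand_bytes(ascii_line)

# 3 characters vs 6 bytes: 5 spaces vs 2 spaces
by_chars = greek_line.expandtabs(8)
by_bytes = expand_bytes(greek_line)
print(repr(by_chars))
print(repr(by_bytes))
```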

By default, the unexpand command converts initial blank (space or tab) characters to tabs. The first occurrence of a non-blank character will stop the conversion. By default, every 8 columns worth of blanks is converted to a tab.

# input is 8 spaces followed by 'a' and then more characters
# the initial 8 spaces are converted to a tab character
# 'a' stops any further conversion, since it is a non-blank character
$ printf '        a       b       c\n' | unexpand | cat -T
^Ia       b       c

# input is 9 spaces followed by 'a' and then more characters
# the initial 8 spaces is converted to a tab character
# remaining space is left as is
$ printf '         a       b       c\n' | unexpand | cat -T
^I a       b       c
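
The default behavior shown above can be sketched in Python as follows. This is a simplified illustration that assumes the line starts with spaces only (no existing tabs), and the unexpand_initial name is hypothetical.

```python
def unexpand_initial(line, width=8):
    """Convert each full group of `width` leading spaces to a tab."""
    stripped = line.lstrip(' ')
    n_spaces = len(line) - len(stripped)
    # full groups become tabs, leftover spaces are kept as is
    tabs, rest = divmod(n_spaces, width)
    return '\t' * tabs + ' ' * rest + stripped

print(repr(unexpand_initial(' ' * 8 + 'a' + ' ' * 7 + 'b')))
print(repr(unexpand_initial(' ' * 9 + 'a' + ' ' * 7 + 'b')))
```

Blanks after the first non-blank character are never touched here, which mirrors the default described above.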

Video demo:

See the expand and unexpand chapter from my Command line text processing with GNU Coreutils ebook for more examples, options, etc.

Interactive Linux CLI Text Processing Exercises

2022-12-09T00:00:00+00:00

Having an interactive program that automatically loads questions and checks the solution is wonderful to have while learning a topic. This TUI app has 60+ beginner to intermediate level exercises for Linux CLI text processing tools.

Installation🔗

Last month, I started learning a Python TUI framework called Textual. After working on a 4x4 board game, I made an interactive app to help you test your CLI text processing skills with 60+ beginner to intermediate level exercises.

You'll need Python for this. This app is available on PyPI as cliexercises. Example installation instructions are shown below, adjust them based on your preferences and OS.

# virtual environment
$ python3 -m venv textual_apps
$ cd textual_apps
$ source bin/activate
$ pip install cliexercises

# launch the app
$ cliexercises

To run the app without having to enter the virtual environment again, add this alias to .bashrc (or equivalent):

# you'll have to change the path
alias cliexercises='/path/to/textual_apps/bin/cliexercises'

As an alternative to manually managing such virtual environments, you can use https://github.com/pypa/pipx instead:

$ pipx install cliexercises
$ cliexercises

As yet another alternative, you can install textual==0.85.2 (see Textual documentation for more details), clone my TUI-apps repository and run the cli_exercises.py file.

Adjust the terminal dimensions for the widgets to appear properly, for example 84x25 (characters x lines).

Video demo🔗

Brief Guide🔗

Press Ctrl+p and Ctrl+n to navigate the questions list.
Type the command in the box below the question.
Press Enter to execute the command.
- Output would be displayed below the command box.
- If the output matches the expected results, the command box will turn green and a reference solution will also be shown.
- Issues due to errors and timeout (about 2 seconds) will be displayed in red.
Press Ctrl+s to toggle the reference solution box.
Press Ctrl+t to toggle between light and dark themes.
Press Ctrl+q to quit the app.
Some basic readline-like shortcuts are supported, for example Ctrl+u, Ctrl+k, Ctrl+w, etc

Your progress is automatically saved and restored. Already answered questions will be skipped.

There is no safeguard against the commands you execute. They are treated as if you had typed them in a shell session.

For more detailed instructions, visit https://github.com/learnbyexample/TUI-apps/tree/main/CLI-Exercises

Ebook🔗

The exercises in this app have been adapted from my Command Line ebooks.

Feedback🔗

I'd highly appreciate your feedback. Please file an issue if there are bugs, crashes, etc.

Hope you find this TUI app useful. Happy learning :)

Python tip 20: saving and loading json

2022-12-07T00:00:00+00:00

JSON (JavaScript Object Notation) is one of the ways you can store and retrieve data necessary for functioning of an application. For example, my projects Python regex exercises and Linux CLI text processing exercises need to load questions and save user progress. You might wonder why not just a plain text file? I needed dict in the code anyway and JSON offered seamless transition. Also, this arrangement avoided having to write extra code and test it for potential parsing issues.

The json module from the standard library is handy for such purposes. Here's an example of saving a dict object:

>>> import json
>>> marks = {'Rahul': 86, 'Ravi': 92, 'Rohit': 75, 'Rajan': 79}
>>> with open('marks.json', 'w') as f:
...     json.dump(marks, f, indent=4)
...

In the above example, indent is used for pretty printing. Here's what the file looks like:

$ cat marks.json
{
    "Rahul": 86,
    "Ravi": 92,
    "Rohit": 75,
    "Rajan": 79
}

And here's an example of loading a JSON file:

>>> with open('marks.json') as f:
...     marks = json.load(f)
... 
>>> marks
{'Rahul': 86, 'Ravi': 92, 'Rohit': 75, 'Rajan': 79}

See docs.python: json for documentation, more examples, other methods, caveats and so on.
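
One such caveat is worth a quick sketch: JSON object keys are always strings, so a dict with non-string keys won't survive a round trip unchanged. The variable names below are made up for illustration.

```python
import json

ranks = {1: 'gold', 2: 'silver'}
text = json.dumps(ranks)
restored = json.loads(text)

print(text)               # {"1": "gold", "2": "silver"}
print(restored == ranks)  # False, the keys are now '1' and '2'
```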

Video demo:

See also my 100 Page Python Intro ebook.

Vim tip 18: moving within long lines

2022-11-29T00:00:00+00:00

Here are Normal mode commands you can use to move within long lines that are spread over multiple screen lines:

g0 move to the beginning of the current screen line
g^ move to the first non-blank character of the current screen line
g$ move to the end of the current screen line
gj move down by one screen line, prefix a count to move down by that many screen lines
gk move up by one screen line, prefix a count to move up by that many screen lines
gm move to the middle of the current screen line
- Note that this is based on the screen width, not the number of characters in the line!
gM move to the middle of the current line
- Note that this is based on the total number of characters in the line

See :h left-right-motions for more details.

Video demo:

CLI tip 19: extended globs

2022-11-23T00:00:00+00:00

The Bash shell provides the extglob option for advanced pattern matching of filenames. Extended globs let you apply regexp-like quantifiers, provide alternate patterns and negate matches. From man bash:

Extended glob	Description
`?(pattern-list)`	Matches zero or one occurrence of the given patterns
`*(pattern-list)`	Matches zero or more occurrences of the given patterns
`+(pattern-list)`	Matches one or more occurrences of the given patterns
`@(pattern-list)`	Matches one of the given patterns
`!(pattern-list)`	Matches anything except one of the given patterns

Extended globs are disabled by default. You can use shopt -s extglob and shopt -u extglob to set and unset this option respectively.

Here are some examples (visit globs.sh to get the script used below).

$ source globs.sh
$ ls
100.sh   f1.txt      f4.txt    hi.sh   math.h         report-02.log
42.txt   f2_old.txt  f7.txt    ip.txt  notes.txt      report-04.log
calc.py  f2.txt      hello.py  main.c  report-00.log  report-98.log

# one or more digits followed by '.' and then zero or more characters
$ ls +([0-9]).*
100.sh  42.txt

# same as: ls *.c *.sh
$ ls *.@(c|sh)
100.sh  hi.sh  main.c

# not ending with '.txt'
$ ls !(*.txt)
100.sh   hello.py  main.c  report-00.log  report-04.log
calc.py  hi.sh     math.h  report-02.log  report-98.log

# not ending with '.txt' or '.log'
$ ls *.!(txt|log)
100.sh  calc.py  hello.py  hi.sh  main.c  math.h
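
If you need similar filtering in a Python script, here's a rough sketch using the fnmatch and re modules. It doesn't use any extglob syntax, so the quantifier, alternation and negation are spelled out by hand. The file list is copied from the ls output above.

```python
import re
from fnmatch import fnmatchcase

files = ['100.sh', '42.txt', 'calc.py', 'f1.txt', 'f2_old.txt', 'f2.txt',
         'f4.txt', 'f7.txt', 'hello.py', 'hi.sh', 'ip.txt', 'main.c',
         'math.h', 'notes.txt', 'report-00.log', 'report-02.log',
         'report-04.log', 'report-98.log']

# similar to: ls +([0-9]).*
digits = [f for f in files if re.fullmatch(r'[0-9]+\..*', f)]

# similar to: ls *.@(c|sh)
c_or_sh = [f for f in files if any(fnmatchcase(f, p) for p in ('*.c', '*.sh'))]

# similar to: ls !(*.txt)
not_txt = [f for f in files if not fnmatchcase(f, '*.txt')]

print(digits)
print(c_or_sh)
print(not_txt)
```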

Video demo:

See also my Linux Command Line Computing ebook.

Festive deals for books on Python, Linux, JavaScript, Regular Expressions and more

2022-11-22T00:00:00+00:00

Hello!

Here are some exciting deals for my programming ebooks as well as from other creators.

My ebooks🔗

Offers valid till 30-Nov-2022:

All 13 Books Bundle — $10 (normal price $28)
Practice Python Projects — FREE (normal price $10)
JavaScript RegExp — FREE (normal price $10)
Learn by example Python bundle — $3 (normal price $15)

Indie creators🔗

Python books by Michael Driscoll — $10 off for all books, 20% off for Teach Me Python
Python Morsels — save up to $108 a year on Python Morsels, until Nov 28 (skill-honing system that helps developers deepen their Python skills)
- see also author's blog post for comprehensive links to other Python deals
Boost Your Django DX and Speed Up Your Django Tests — 50% off (plus further 50% off based on GDP) until Nov 28
- see also author's blog post for comprehensive links to other Django-related deals
Python, Git, and Pandas courses — 40% off
Practical Guide to Technical Blogging is 34% off and Python To Projects bootcamp is 20% off
Complete Guide to CSS Flex and Grid — 60% off on all versions of the eBook (4 days starting from Nov 24)
Explain Ideas Visually — 50% OFF

Miscellaneous🔗

NoStarch Press — Holiday Gift Guide, 35% off until Nov 28
The Pragmatic Bookshelf — 40% off on all ebooks and audio books
Manning Publications — save 50% when you buy 2 or more eBooks, liveProjects, or liveVideos
Leanpub Monthly Sale — offers for programming books, bundles and courses
Real Python Giveaway — a chance to win one of three prizes, until Nov 25
InfoSec Hack Friday — InfoSec related software/tools
The Cyber Plumber's Lab Guide and Interactive Access — 50% OFF
Huge list of awesome deals — tools, productivity, books, courses, etc

Happy learning :)

Python tip 19: manipulating string case

2022-11-16T00:00:00+00:00

Here are five string methods you can use for changing the case of characters. Word-level transformation treats each run of consecutive letters as a word, so word boundaries aren't limited to whitespace characters.

>>> sentence = 'thIs iS a saMple StrIng'

>>> sentence.capitalize()
'This is a sample string'

>>> sentence.title()
'This Is A Sample String'

>>> sentence.lower()
'this is a sample string'

>>> sentence.upper()
'THIS IS A SAMPLE STRING'

>>> sentence.swapcase()
'THiS Is A SAmPLE sTRiNG'

The string.capwords() function is similar to title() but also allows a specific separator (default is whitespace).

>>> import string
>>> phrase = 'this-IS-a:colon:separated,PHRASE'

# every word is transformed
>>> phrase.title()
'This-Is-A:Colon:Separated,Phrase'

# colon character is used as the text boundary
>>> string.capwords(phrase, ':')
'This-is-a:Colon:Separated,phrase'
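
Since title() treats every run of letters as a word, apostrophes act as word boundaries too. Here's a small illustration of that side effect, along with string.capwords() as a possible workaround when the words are separated by whitespace:

```python
import string

line = "they're bill's friends"

print(line.title())           # They'Re Bill'S Friends
print(string.capwords(line))  # They're Bill's Friends
```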

Video demo:

See also my 100 Page Python Intro ebook.

Building TUIs with textual: first impressions

2022-11-15T00:00:00+00:00

Last week, I finally started exploring textual. The main motivation was to start implementing a few project ideas I've had in my todo list for years. I don't particularly have a preference between TUI (terminal user interface) and GUI (graphical user interface) for these projects. Seeing a few Textual demos on twitter (courtesy Will McGugan) over the past few months, I felt like exploring this framework first.

For my first app, I picked a 4x4 board game, like Tic Tac Toe but you form a square instead of a line. I came up with this variation in high school and have been fond of coding it since my college days.

Installation and Tutorials🔗

The Getting started page of the documentation will give you all the relevant installation instructions. I used pip install 'textual[dev]' since the development mode has nice features like live editing. As I looked up the Devtools page to link here in this blog post, I found that there's a console command for print() based debugging! That would've been handy while I was working on the game — sigh, I should've been more proactive in exploring the documentation site.

In the Getting started page, you'll also be informed about python -m textual (builtin demo) and other examples in the GitHub repo.

After playing with the demo a bit, I went through the tutorial — shows how to build a Stopwatch app step-by-step.

The documentation also includes Guide, Reference, API, etc. I gave them a cursory glance and decided to start building my game.

I should note that while I got introduced to programming in school about 20 years ago, I don't have much experience with projects that need more than a few hundred lines. I'm good with command-line tools and text processing with scripting languages like Python. I had a horrible experience writing an Android app a few years back, mainly due to object-oriented programming and the complexity of the project. I've improved a bit since then, but still feel like a newbie when it comes to working with classes.

Building Square Tic Tac Toe board game🔗

Similar to the step-by-step Textual tutorial, I built the game by adding features incrementally. I tweeted my progress along with screenshots and recordings. Here's a summary:

Managed to place 16 buttons in a grid layout
Buttons now respond to clicking! And in response, the computer plays a random move
Recording below shows 3 games: User wins, AI wins, Tie
Added Easy/Hard modes — it is impossible to beat the AI in hard mode
Almost done! Layout is better now and starting new game is now a button instead of a shortcut keybinding
Cleaned up code a bit and posted on GitHub
Next step: write a blog post (this post!)

Visit my GitHub repo for the code, game rules and other details.

I had made a GUI version of this game using tkinter last year. I copied most of the game logic from there, so I didn't struggle much with object-oriented programming in this case. Here's a sample screenshot from the finished code:

What I liked🔗

As mentioned before, Textual supports live editing mode. The command is textual run --dev script.py and this helps you experiment with CSS. I found this very helpful while trying out layout combinations, margin, padding, etc.

The default colors were great too. I didn't have to think about choosing colors (except for setting background color for header and game status). The framework even provides an easy way to allow users to switch between dark and light themes! Though, I haven't yet figured out how to set light theme as the default (I worked around by explicitly adding a call to the theme toggle method).

Overall, the code was significantly shorter compared to the tkinter version I did last year. That version had a few more features, but I'd say Textual felt much easier to reason about. I remember having to spend days sifting through stackoverflow threads and tkdocs to get the GUI version working.

What gave me trouble🔗

Struggling with layout isn't new for me. I started with 4x4 grid for the board, which was fairly straightforward. Problems arose when I wanted to add status text area to the left and control buttons to the right. Placing them left/right was easy to do with dock in CSS. But, I couldn't get them to align well — too much spacing around the 4x4 board. I was trying to give 50% to 60% for the board and the remaining evenly divided for the other two elements. After some experimentation, what worked was giving 20% to status, 25% to control and not assigning a width value for the board.

I initially used a button for the status because I couldn't find a textbox widget (edit: Textual now has a Label widget). I knew that the Static widget can display text, but I didn't find how to dynamically change the text from that documentation page. I thought I'd have to make a custom widget, but when I went to the Widgets guide, I found that Static already has an update() method!

I probably missed something (or perhaps part of the roadmap), but I found it strange to have a single on_button_pressed() method to handle on click event for every Button widget. I'd prefer a way to bind a method to the buttons, like tkinter provides.

Next steps🔗

As mentioned before, I have several projects in my todo list. The next one I want to try is an app for interactive exercises for my ebooks. Last year, I made one for Python regular expressions using tkinter.

Computing from the Command Line: sales report

2022-11-14T00:00:00+00:00

I've previously written about events and strategies that led to increased ebook sales during the last quarter of 2021.

I'm very pleased to report that I continue to see better than expected sales during release week. My 13th ebook Computing from the Command Line was published on November 1st. Here's how the sales looked on Gumroad during the first ten days:

I used to offer my ebooks for free on release. For the past few releases, I have also added heavily discounted ebook bundles, which seem to be the major factor in the increased paid sales I'm seeing. Luck certainly plays a role too in reaching users through social media. Here are some of the ways I promoted my latest ebook:

Announcement post on Gumroad and sending an email to existing readers (1000+ users opened the email as per Gumroad analytics)
Pinned tweet — more than 300 link clicks as per Twitter analytics
Posting on /r/commandline/, /r/linux/, /r/linux4noobs/ and /r/FreeEBOOKS/
Show HN post on Hacker News — wasn't lucky this time to reach front page
Promo video on youtube
Mentioned in my learnbyexample weekly newsletter
And of course, I wrote a release post on this blog and also mentioned it on my GitHub Readme

Apart from Gumroad, 400+ readers downloaded the ebook from Leanpub and I got a few paid sales as well. I wrote about pros and cons of Gumroad/Leanpub here.

PS: Make sure to read the rules and be a regular user before self-promoting your content on the social media platforms mentioned above.

Vim tip 17: setting options

2022-11-08T00:00:00+00:00

From :h options.txt:

Vim has a number of internal variables and switches which can be set to achieve special effects. These options come in three forms:

boolean can only be on or off

number has a numeric value

string has a string value

Here are examples for each of these forms:

:set cursorline highlight the line containing the cursor
:set history=200 increase default history from 50 to 200
:set ww+=[,] allow left and right arrow keys to move across lines in Insert mode
- += allows you to append to an existing string value

Usage guidelines:

set {option} switch on the given boolean setting
- :set expandtab use spaces for tab expansion
set {option}! toggle the given boolean setting
- :set expandtab! if previously tabs were expanded, it will be turned off and vice versa
- set inv{option} can also be used
set no{option} switch off the given boolean setting
- :set noexpandtab disable expanding tab to spaces
set {option}? get the current value of the given option (works for all three forms)
- :set expandtab? output will be expandtab or noexpandtab depending on whether it is switched on or off
set {option} get the current value of number or string option
- for example, try :set history or :set ww

See :h options.txt for complete list of usage guidelines and available options.

Video demo:

CLI tip 18: inserting file contents using GNU sed

2022-11-02T00:00:00+00:00

The r command accepts a filename as argument and when the address is satisfied, the entire contents of the given file are added after the matching line. This is a robust way to add multiline text literally.

$ cat ip.txt
    * sky
    * apple
$ cat fav_colors.txt
deep red
yellow
reddish
brown

# space between r and filename is optional
# adds entire contents of 'ip.txt' after each line containing 'red'
$ sed '/red/r ip.txt' fav_colors.txt
deep red
    * sky
    * apple
yellow
reddish
    * sky
    * apple
brown

The e command is the easiest way to insert file contents before the matching lines. It executes an external command (cat in the below example) and, similar to the r command, the output is inserted literally.

$ sed '/red/e cat ip.txt' fav_colors.txt
    * sky
    * apple
deep red
yellow
    * sky
    * apple
reddish
brown
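
For comparison, here's a minimal Python sketch of the same two insertions. It uses a plain substring test instead of a regular expression and works on lists of lines rather than files. The insert_lines name and its parameters are hypothetical.

```python
def insert_lines(lines, word, extra, before=False):
    """Add the `extra` lines before or after every line containing `word`."""
    result = []
    for line in lines:
        if word in line:
            result.extend((extra + [line]) if before else ([line] + extra))
        else:
            result.append(line)
    return result

ip = ['    * sky', '    * apple']
fav_colors = ['deep red', 'yellow', 'reddish', 'brown']

# similar to: sed '/red/r ip.txt' fav_colors.txt
print('\n'.join(insert_lines(fav_colors, 'red', ip)))

# similar to: sed '/red/e cat ip.txt' fav_colors.txt
print('\n'.join(insert_lines(fav_colors, 'red', ip, before=True)))
```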

See Adding content from file chapter from my GNU sed ebook for many more examples, gotchas, details about the R command and so on.

Video demo:

See also my CLI text processing with GNU sed ebook.

Python tip 18: arbitrary number of arguments

2022-10-26T00:00:00+00:00

The print() function can accept zero or more values separated by a comma. Here's how the function arguments are shown in help(print):

print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)

Here are some examples with varying number of arguments passed to the print() function:

>>> print()

>>> print('hello')
hello
>>> print(42, 22/7, -100)
42 3.142857142857143 -100

You can write your own functions to accept arbitrary number of arguments as well. The packing syntax is similar to sequence unpacking. A * prefix to an argument name will allow it to accept zero or more values. Such an argument will be packed as a tuple data type and it should always be specified after positional arguments (if any). args is often used as the variable name for this purpose. Here's an example:

>>> def many(x, *args):
...     print(f'{x = }; {args = }')
... 
>>> many()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: many() missing 1 required positional argument: 'x'
>>> many(1)
x = 1; args = ()
>>> many(1, 'two', 3)
x = 1; args = ('two', 3)

Here's a more practical example:

>>> def sum_nums(*args):
...     total = 0
...     for n in args:
...         total += n
...     return total
... 
>>> sum_nums()
0
>>> sum_nums(3, -8)
-5
>>> sum_nums(1, 2, 3, 4, 5)
15
>>> sum_nums(*range(1, 6))
15

Use ** prefix to accept arbitrary number of keyword arguments. See also docs.python: Arbitrary Argument Lists.
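
Here's a short sketch of the ** prefix in action. Such an argument is packed as a dict, and kwargs is the name commonly used for it. The function below is a variation of the earlier many() example, changed to return the string so that it can be reused.

```python
def many(x, *args, **kwargs):
    return f'{x = }; {args = }; {kwargs = }'

print(many(1))
# x = 1; args = (); kwargs = {}

print(many(1, 'two', 3, sep='-', end=''))
# x = 1; args = ('two', 3); kwargs = {'sep': '-', 'end': ''}
```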

Video demo:

See also my 100 Page Python Intro ebook.

Vim tip 16: terminal mode

2022-10-18T00:00:00+00:00

Terminal mode is one way to use shell commands from within Vim.

:terminal open a new terminal window as a horizontal split
- opens above the current window unless splitbelow option is set
:vertical :terminal open a new terminal window as a vertical split
- opens to the left of the current window unless splitright option is set

Here are some shortcuts to navigate between windows and change modes:

Ctrl+w followed by w or Ctrl+w move to the next window
- helps you to easily switch back and forth if you have one text editing window and one terminal window
- see the Splitting tip for more such commands
Ctrl+w followed by N goes to Terminal-Normal mode which will help you to move around using Normal mode commands, copy text, etc (note that you need to use uppercase N here)
- Ctrl+\ followed by Ctrl+n another way to go to Terminal-Normal mode
- :tnoremap <Esc> <C-w>N map Esc key to go to Terminal-Normal mode
Ctrl+w followed by : go to Command-line mode from terminal window

Depending on your shell, you can use the exit command to end the terminal session. Ctrl+d might work too.

There are a lot of features in this mode, see :h terminal.txt for more details.

Video demo:

CLI tip 17: common and unique lines

2022-10-12T00:00:00+00:00

Consider these sample input files that are already sorted and the default output from comm:

$ paste colors_1.txt colors_2.txt
Blue    Black
Brown   Blue
Orange  Green
Purple  Orange
Red     Pink
Teal    Red
White   White

$ comm colors_1.txt colors_2.txt
        Black
                Blue
Brown
        Green
                Orange
        Pink
Purple
                Red
Teal
                White

The following comm options will help you construct solutions to get common and unique lines:

-1 suppress lines unique to the first file
-2 suppress lines unique to the second file
-3 suppress lines common to both the files

# common lines
$ comm -12 colors_1.txt colors_2.txt
Blue
Orange
Red
White

# lines unique to colors_2.txt
$ comm -13 colors_1.txt colors_2.txt
Black
Green
Pink

If the input files are not already sorted, or if you want to preserve the order of input lines, you can use awk instead:

# common lines
$ awk 'NR==FNR{a[$0]; next} $0 in a' colors_1.txt colors_2.txt
Blue
Orange
Red
White

# lines unique to colors_2.txt
$ awk 'NR==FNR{a[$0]; next} !($0 in a)' colors_1.txt colors_2.txt
Black
Green
Pink

You can also use grep -Fxf colors_1.txt colors_2.txt (add -v for unique lines) but this wouldn't scale well for larger input files.
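
The awk approach (remember the lines of the first file, then filter the second) maps naturally to a set lookup. Here's a rough Python sketch of that idea, with lists standing in for the contents of the two sample files:

```python
colors_1 = ['Blue', 'Brown', 'Orange', 'Purple', 'Red', 'Teal', 'White']
colors_2 = ['Black', 'Blue', 'Green', 'Orange', 'Pink', 'Red', 'White']

# a set gives fast membership tests, input order of colors_2 is preserved
seen = set(colors_1)
common = [c for c in colors_2 if c in seen]
unique_2 = [c for c in colors_2 if c not in seen]

print(common)    # ['Blue', 'Orange', 'Red', 'White']
print(unique_2)  # ['Black', 'Green', 'Pink']
```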

Video demo:

See also my Linux Command Line Computing ebook.

Python tip 17: counting frequency of items

2022-10-06T00:00:00+00:00

One of the ways to count the frequency of items is to make use of the dict.get() method:

>>> vehicles = ['car', 'jeep', 'car', 'bike', 'bus', 'car', 'bike']
>>> hist = {}
>>> for v in vehicles:
...     hist[v] = hist.get(v, 0) + 1
... 
>>> hist
{'car': 3, 'jeep': 1, 'bike': 2, 'bus': 1}

And here's a solution using the collections module from the standard library:

>>> from collections import Counter

>>> vehicles = ['car', 'jeep', 'car', 'bike', 'bus', 'car', 'bike']
>>> Counter(vehicles)
Counter({'car': 3, 'bike': 2, 'jeep': 1, 'bus': 1})

>>> Counter('abracadabra')
Counter({'a': 5, 'b': 2, 'r': 2, 'c': 1, 'd': 1})

See stackoverflow: using a dictionary to count items and stackoverflow: count frequency of elements for more ways to solve this problem.
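
A Counter object also comes with a few conveniences over a plain dict. Here's a brief sketch using the same vehicles list:

```python
from collections import Counter

vehicles = ['car', 'jeep', 'car', 'bike', 'bus', 'car', 'bike']
hist = Counter(vehicles)

print(hist.most_common(2))  # [('car', 3), ('bike', 2)]
print(hist['car'])          # 3
print(hist['train'])        # 0, missing items don't raise KeyError
```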

Video demo:

See also my 100 Page Python Intro ebook.

Vim tip 15: moving within current line

2022-09-27T00:00:00+00:00

Here are some of the Normal mode commands for moving within the current line:

0 move to the beginning of the current line (i.e. column number 1)
- you can also use the Home key
^ move to the first non-blank character of the current line (useful for indented lines)
$ move to the end of the current line
- you can also use the End key
- 3$ move to the end of the line that is 2 lines below the current line
g_ move to the last non-blank character of the current line
3| move to the third column character
- | is same as 0 or 1|

Video demo:

CLI tip 16: transpose tables

2022-09-21T00:00:00+00:00

GNU datamash has plenty of nifty features for field based operations. Here's an example of transposing comma delimited data:

$ cat scores.csv 
Name,Maths,Physics,Chemistry
Ith,100,100,100
Cy,97,98,95
Lin,78,83,80
Er,60,70,90

$ datamash -t, transpose <scores.csv 
Name,Ith,Cy,Lin,Er
Maths,100,97,78,60
Physics,100,98,83,70
Chemistry,100,95,80,90

And here's an alternate solution using tr, wc and pr:

# divide input into five parts and join them vertically
$ seq 10 | pr -5ts,
1,3,5,7,9
2,4,6,8,10

# tr converts input table into single field per line
# wc calculates number of rows and pr does the rest
$ tr ',' '\n' <scores.csv | pr -$(wc -l <scores.csv)ts,
Name,Ith,Cy,Lin,Er
Maths,100,97,78,60
Physics,100,98,83,70
Chemistry,100,95,80,90

See also unix.stackexchange: How to process an x-column text file to get a y-column one? for many more ways to deal with such problems. See my blog post for examples and resource links on the GNU datamash command.
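
If Python is an option, zip() with the unpacking operator is a compact way to transpose. Here's a minimal sketch, with the contents of scores.csv stored as a list of lines for illustration:

```python
rows = ['Name,Maths,Physics,Chemistry',
        'Ith,100,100,100',
        'Cy,97,98,95',
        'Lin,78,83,80',
        'Er,60,70,90']

table = [row.split(',') for row in rows]
# zip(*table) groups the first fields of all rows, then the second fields, etc
transposed = [','.join(col) for col in zip(*table)]

print('\n'.join(transposed))
```

Note that zip() stops at the shortest row, so this sketch assumes all the rows have the same number of fields.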

Video demo:

See also my Linux Command Line Computing ebook.

Python tip 16: delete list elements using index or slice

2022-09-14T00:00:00+00:00

The pop() method removes the last element of a list by default. You can pass an index to delete that specific item and the list will be automatically re-arranged. Return value is the element being deleted.

>>> primes = [2, 3, 5, 7, 11]
>>> primes.pop()
11
>>> primes
[2, 3, 5, 7]

>>> student = ['learnbyexample', 2022, ['Linux', 'Vim', 'Python']]
>>> student.pop(1)
2022
>>> student[-1].pop(1)
'Vim'
>>> student
['learnbyexample', ['Linux', 'Python']]

To remove multiple elements using slicing notation, use the del statement. Unlike the pop() method, you won't get the elements being deleted as the return value.

>>> books = ['cradle', 'mistborn', 'legends & lattes', 'sourdough']
>>> del books[-1]
>>> books
['cradle', 'mistborn', 'legends & lattes']
>>> del books[:2]
>>> books
['legends & lattes']

>>> student = ['learnbyexample', 2022, ['Linux', 'Vim', 'Python']]
>>> del student[-1][1]
>>> student
['learnbyexample', 2022, ['Linux', 'Python']]
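
The del statement works with any slice, including those with a step value. Here are a couple more examples as a quick sketch:

```python
nums = list(range(10))
# delete every other element, starting from the first
del nums[::2]
print(nums)  # [1, 3, 5, 7, 9]

letters = ['a', 'b', 'c', 'd', 'e']
# delete elements at index 1, 2 and 3
del letters[1:4]
print(letters)  # ['a', 'e']
```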

Video demo:

See also my 100 Page Python Intro ebook.

Vim tip 14: horizontal and vertical splits

2022-09-06T00:00:00+00:00

You can have multiple windows within the same tab page.

:split filename open file for editing in a new horizontal window, above the current window
- you can also use :sp instead of :split
- :set splitbelow open horizontal splits below the current window
:vsplit filename open file for editing in a new vertical window, to the left of the current window
- you can also use :vs instead of :vsplit
- :set splitright open vertical splits to the right of the current window

Here are some shortcuts to navigate between windows:

Ctrl+w followed by w switch to the below/right window for horizontal/vertical splits respectively
- Ctrl+w followed by Ctrl+w also performs the same function
- switches to the first split if you are on the last split
Ctrl+w followed by W switch to the above/left window for horizontal/vertical splits respectively
- switches to the last split if you are on the first split
Ctrl+w followed by hjkl or arrow keys, switch in the respective direction
Ctrl+w followed by t or b switch to the top (first) or bottom (last) window
Ctrl+w followed by HJKL (uppercase), moves the current split to the farthest possible location in the respective direction

If filename is not provided, the current one is used.

Vim adds a highlighted horizontal bar containing the filename for each split.

Video demo:

CLI tip 15: text generation with printf and brace expansion

2022-08-31T00:00:00+00:00

You can use brace expansion for generating a sequence of numbers and letters. printf helps you display multiple arguments using the same format specifier. For example:

$ echo {1..3}
1 2 3
$ echo {1..2}{a..b}
1a 1b 2a 2b

$ printf '%s\n' apple banana cherry
apple
banana
cherry

Combining the two, you can generate multiple lines of text. Here are some examples:

$ printf '%s\n' id_{3..1}
id_3
id_2
id_1

$ printf '%s\n' item_{100..120..4}
item_100
item_104
item_108
item_112
item_116
item_120

Here's a practical example:

# the string before %.s is repeated based on the number of arguments
$ printf 'x %.s' a b c
x x x 
$ printf -- '- %.s' {1..5}
- - - - - 

# same as: seq 10 | paste -d, - - - - -
$ seq 10 | paste -d, $(printf -- '- %.s' {1..5})
1,2,3,4,5
6,7,8,9,10

$ n=5
$ seq 10 | paste -d, $(printf -- '- %.s' $(seq $n))
1,2,3,4,5
6,7,8,9,10

$ n=2
$ seq 10 | paste -d, $(printf -- '- %.s' $(seq $n))
1,2
3,4
5,6
7,8
9,10

See this stackoverflow thread for other alternatives, avoiding printf for large numbers, etc.
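
For comparison, here's a rough Python sketch of the same kind of text generation and grouping. The paste_columns name is made up for this example. Unlike paste, it doesn't pad an incomplete last group with empty fields.

```python
def paste_columns(items, n, sep=','):
    """Join every `n` consecutive items with `sep`, one group per line."""
    items = [str(i) for i in items]
    return [sep.join(items[i:i + n]) for i in range(0, len(items), n)]

# similar to: printf '%s\n' item_{100..120..4}
names = [f'item_{n}' for n in range(100, 121, 4)]
print('\n'.join(names))

# similar to: seq 10 | paste -d, - - - - -
print('\n'.join(paste_columns(range(1, 11), 5)))
```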

Video demo:

See also my Linux Command Line Computing ebook.

Python tip 15: string transliteration

2022-08-24T00:00:00+00:00

The str.translate() method accepts a table of codepoints (numerical value of a character) mapped to another character or codepoint. Map to None for characters that have to be deleted. You can use the ord() built-in function to get the codepoint of characters. Or, you can use the str.maketrans() method to generate the mapping for you.

>>> ord('a')
97
>>> ord('A')
65

>>> greeting = 'have a nice day'
# map 'a' to 'A', 'e' to 'E' and 'i' to None
>>> greeting.translate({97: 65, 101: 'E', 105: None})
'hAvE A ncE dAy'

# first and second arguments specify the one-to-one mapping of characters
# third argument is optional, specifies characters to be deleted
>>> str.maketrans('ae', 'AE', 'i')
{97: 65, 101: 69, 105: None}
>>> greeting.translate(str.maketrans('ae', 'AE', 'i'))
'hAvE A ncE dAy'

The string module has a collection of constants that are often useful in text processing. Here's an example of deleting punctuation characters:

>>> from string import punctuation
>>> punctuation
'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'

>>> para = '"Hi", there! How *are* you? All fine here.'
>>> para.translate(str.maketrans('', '', punctuation))
'Hi there How are you All fine here'

>>> chars_to_delete = ''.join(set(punctuation) - set('.!?'))
>>> para.translate(str.maketrans('', '', chars_to_delete))
'Hi there! How are you? All fine here.'
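
The str.maketrans() method also accepts a single dict argument. In that form, the keys can be single characters and the values can be strings of any length, which is handy when a character should be replaced with more than one character. A small sketch:

```python
# keys are converted to codepoints, values are kept as is
table = str.maketrans({'&': 'and', '!': None})
print(table)  # {38: 'and', 33: None}

print('salt & pepper!'.translate(table))  # salt and pepper
```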

Video demo:

See also my 100 Page Python Intro ebook.

Vim tip 13: repeat last change

2022-08-16T00:00:00+00:00

It is way too easy to repeat the last change you made:

. the Normal mode dot command repeats the last change
- you can also use a number prefix to override the count of the last change

For example,

if the last change was 2dd (delete current line and the line below), dot key will repeat 2dd
- using 3. will mean 3dd and not 6dd, since the count prefix replaces the earlier number
if the last change was 5x (delete current character and four characters to the right), dot key will repeat 5x
if the last change was C123<Esc> and dot key is pressed, it will clear from the current character to the end of the line, insert 123 and go back to Normal mode

From :h 4.3:

The . command works for all changes you make, except for u (undo), CTRL-R (redo) and commands that start with a colon (:).

See :h repeat.txt for complex repeats, using Vim scripts, etc.

Video demo:

Programming ebooks by Sundeep Agarwal

2022-08-09T00:00:00+00:00

This post lists my programming ebooks with details like PDF/EPUB purchase links, GitHub repos, web versions, testimonials, etc. All my ebooks are self-published. You can get these ebooks individually or as part of bundles. You can also read them online for free.

Bundles 📚

Poster created using Canva

All books bundle: leanpub or gumroad
- 13 programming ebooks on Regular Expressions, Linux CLI tools, Python, Vim and more
Linux CLI Text Processing bundle: leanpub or gumroad
- GNU grep, sed, awk, Perl and Ruby one-liners, GNU coreutils, CLI computing
Awesome regex: leanpub or gumroad
- Python, Ruby, JavaScript Regular expressions
- GNU grep, ripgrep, GNU sed, GNU awk CLI tools (BRE/ERE, PCRE, Rust regex crate, PCRE2)
- Vim regexp
Magical one-liners: leanpub or gumroad
- GNU grep, ripgrep, GNU sed, GNU awk, Ruby, Perl CLI tools
Learn by example Python bundle: leanpub or gumroad
- Python introduction, Regular expressions and Projects
Ruby Text processing: leanpub or gumroad
- Ruby regular expressions, Ruby One-Liners Guide

Testimonials 😍

I love your books on regex...As a student from the Digital VLSI space, it is indeed useful now and definitely in the future. It's really well written and really easy to understand the examples.

— feedback on reddit

It's very thorough, written with care, and presented in a way that makes sense. Even as an intermediate Python programmer, I found use in this book.

— feedback by Andrew Healey on Hacker News for "100 Page Python Intro"

Step up your cli fu with this fabulous intro & deep dive into awk. I learned a ton of tricks!

— feedback on twitter

Your Practice Python Projects book is really helping me to reinforce my knowledge and mastery of Python as I'm learning.

— feedback on twitter

In my opinion the book does a great job of quickly presenting examples of how commands can be used and then paired up to achieve new or interesting ways of manipulating data. Throughout the text there are little highlights offering tips on extra functionality or limitations of certain commands. For instance, when discussing the shuf command we're warned that shuf will not work with multiple files. However, we can merge multiple files together (using the cat command) and then pass them to shuf. These little gems of wisdom add a dimension to the book and will likely save the reader some time wondering why their scripts are not working as expected.

— book review by Jesse Smith on distrowatch.com for "Command line text processing with GNU Coreutils"

Literally was having a mini-breakdown about not understanding Regex in algorithm solutions the other day and now I'm feeling so much better, so thank YOU! I genuinely feel like I'm developing the skill for spotting when and where to use them after so much practice!

— feedback on twitter

This Ruby one-liners cookbook is incredible. Pretty mind boggling all the stuff you can do.

— feedback on twitter

Hi, great work releasing this! Trying to explain vim concisely is always an interesting challenge and I had a great time reading your attempt in this book. I always find it really interesting on how people try to group certain vim functions in a way that makes sense to people that don't use vim. I think you cover that idea pretty well in your 'Vim philosophy and features' section whilst not making it overly abstract and keeping it relatable.

— feedback on Hacker News by doix for "Vim Reference Guide"

I consider myself pretty experienced at shell-fu and capable of doing most things I set out to achieve in either bash scripts or fearless one-liners. However, my awk is rudimentary at best, I think mostly because it's such an unforgiving environment to experiment in. These books you've written are great for a bit of first principles insight and then quickly building up to functional usage. I will have no hesitation in referring colleagues to them!

— feedback on Hacker News

Thank you for choosing to write and share your knowledge. I read your books on CLI and sed - I think they are very comprehensive and very well explained. Keep up the great work

— feedback on twitter

This is fantastic! 👏 I use Perl one-liners for record and text processing a lot and this will be definitely something I will keep coming back to - I’ve already learned a trick from “Context Matching” (9) 🙂

— feedback on [email protected]

Nice book! I just started trying to get into linux today and you have some tips I haven’t found elsewhere and the text is an enjoyable read so far.

— feedback on reddit

I discovered your books recently and they’re awesome, thank you! As a 20 year *nix they made me realize how much more there are to these rock solid and ancient tools, once you spend the time to actually learn the intricacies of them.

— feedback on reddit

I love the whole learn by example premise. Those exercises at the end are so valuable, as it often times leads me to find multiple solutions which helps me conceptualize how commands work with each other much better!

— feedback on reddit

100 Page Python Intro

Short, introductory guide for the Python programming language, suited for those already familiar with programming basics.

Understanding Python re(gex)?

Learn Python Regular Expressions step-by-step from beginner to advanced levels with 300+ examples. Both re and regex modules are covered. Exercises are also included to test your understanding.

Practice Python Projects

Know Python basics but don't know what to do next? Take the next step in your programming journey with real world inspired Python projects.

Understanding JavaScript RegExp

Learn JavaScript Regular Expressions step-by-step from beginner to advanced levels with hundreds of examples and exercises.

CLI text processing with GNU grep and ripgrep

Example based guide to mastering GNU grep and ripgrep. Exercises are also included to test your understanding.

CLI text processing with GNU sed

Example based guide to mastering GNU sed. Exercises are also included to test your understanding.

CLI text processing with GNU awk

Example based guide to mastering GNU awk one-liners. Exercises are also included to test your understanding.

Understanding Ruby Regexp

Learn Ruby Regular Expressions step-by-step from beginner to advanced levels with hundreds of examples and exercises.

Sample chapters
Pay what you want for pdf/epub:
- gumroad
- leanpub
GitHub repo for code snippets and more
web version
Feedback: Twitter

Ruby One-Liners Guide

Example based guide for text processing with Ruby from the command line. Exercises are also included to test your understanding.

Perl One-Liners Guide

Example based guide for text processing with Perl from the command line. Exercises are also included to test your understanding.

CLI text processing with GNU Coreutils

Vim Reference Guide

This is intended as a concise learning resource for beginner to intermediate level Vim users. It has more in common with cheatsheets than a typical text book. Topics like Regular Expressions and Macros have more detailed explanations and examples due to their complexity.

Sample chapters
Buy pdf/epub from:
- gumroad
- leanpub
GitHub repo
web version
Feedback: Twitter

Linux Command Line Computing

Sample chapters
Buy pdf/epub from:
- gumroad
- leanpub
GitHub repo
web version
Feedback: Twitter

CLI tip 14: specify permissions during directory creation

2022-08-09T00:00:00+00:00

You can use mkdir -m instead of creating a directory with mkdir first and then changing the directory permissions with the chmod command. The argument to the -m (mode) option uses the same syntax as the chmod command.

# instead of this
$ mkdir back_up
$ chmod 750 back_up

# do this
$ mkdir -m 750 back_up
$ stat -c '%a %A' back_up
750 drwxr-x---

Here are some more examples:

$ mkdir -m =rx dummy_dir
$ stat -c '%a %A' dummy_dir
555 dr-xr-xr-x

$ mkdir -m go-rwx dot_files
$ stat -c '%a %A' dot_files
700 drwx------

See also my Linux Command Line Computing ebook.

Python tip 14: sequence unpacking

2022-08-03T00:00:00+00:00

You can assign the individual elements of a sequence to multiple variables. This is known as sequence unpacking and it is handy in many situations.

>>> details = ['2018-10-25', 'car', 2346]

>>> purchase_date, vehicle, qty = details
>>> purchase_date
'2018-10-25'
>>> vehicle
'car'
>>> qty
2346

Here's how you can easily assign and swap multiple variables.

# multiple assignments
>>> num1, num2, num3 = 3.14, 42, -100

# swapping values
>>> num1, num2, num3 = num3, num1, num2

>>> print(f'{num1 = }\n{num2 = }\n{num3 = }')
num1 = -100
num2 = 3.14
num3 = 42

Unpacking isn't limited to mapping every element of the sequence. You can use a * prefix to collect all the remaining values (if any are left) into a list variable.

>>> values = ('first', 100, 200, 300, 'last')

>>> x, *y = values
>>> x
'first'
>>> y
[100, 200, 300, 'last']

>>> s1, *nums, s2 = values
>>> s1
'first'
>>> nums
[100, 200, 300]
>>> s2
'last'

See Unpacking with starred assignments for more examples and explanations.
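
Unpacking also works in for loops, comprehensions and nested structures. Here's a short sketch; the sample data (purchases, rows) is hypothetical and only meant for illustration.

```python
# hypothetical sample data: (item, quantity, price) records
purchases = [('car', 2, 1500.0), ('bike', 5, 320.5)]

# each tuple is unpacked into three loop variables
totals = {}
for item, qty, price in purchases:
    totals[item] = qty * price
print(totals)

# nested unpacking mirrors the shape of the right-hand side
(x, y), label = (3, 4), 'point'
print(x, y, label)

# a starred target collects a variable number of trailing values
rows = [('alice', 90, 85, 77), ('bob', 60, 72)]
best = {name: max(scores) for name, *scores in rows}
print(best)
```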

See also my 100 Page Python Intro ebook.

Vim tip 12: save and restore sessions

2022-07-26T00:00:00+00:00

You can save and restore Vim sessions to continue working with the same setup you had before quitting Vim for reasons like switching off the machine, switching to another project, etc.

:mksession proj.vim save the current Vim session with details like cursor position, file list, layout, etc
- you can customize things to be saved using the sessionoptions setting
- for example, :set sessionoptions+=resize will save resized window information as well
:mksession! proj.vim overwrite existing session
:source proj.vim restore Vim session from proj.vim file
- vim -S proj.vim restore a session from the command line when launching Vim

See :h 21.4, :h views-sessions and :h 'sessionoptions' for more details.

See stackoverflow: How to save and restore multiple different sessions in Vim? for custom settings to automate the save and restore process and other tips and tricks. See also Learn-Vim: Views, Sessions, and Viminfo.

CLI tip 13: join lines of two files based on the first field

2022-07-20T00:00:00+00:00

By default, join combines two files based on the content of the first field (also referred to as the key). Only the lines with common keys will be part of the output. The key field will be displayed first in the output (this distinction will come into play if the first field isn't the key). The rest of the line will have the remaining fields from the first and second files, in that order. One or more blanks (space or tab) will be considered as the input field separator and a single space will be used as the output field separator. If present, blank characters at the start of the input lines will be ignored.

# sample sorted input files
$ cat jan.txt
apple   10
banana  20
soap    3
tshirt  3
$ cat feb.txt
banana  15
fig     100
pen     2
soap    1

# combine common lines based on the first field
$ join jan.txt feb.txt
banana 20 15
soap 3 1

Here's an awk version to do the same. Helpful if you want to do some additional processing that won't be possible with the join command. Another advantage is that this solution will work even if the input files are not sorted.

$ awk 'NR==FNR{a[$1]=$2; next} $1 in a{print $1, a[$1], $2}' jan.txt feb.txt
banana 20 15
soap 3 1
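
The same two-pass idea as the awk one-liner can be sketched in Python: remember the first input in a dictionary, then look up each key from the second input. The function name join_on_first_field is made up for this illustration, and the sketch assumes keys are unique within the first input (a repeated key would overwrite the earlier entry).

```python
def join_on_first_field(lines1, lines2):
    """Yield 'key rest1 rest2' for keys present in both inputs."""
    first = {}
    for line in lines1:
        fields = line.split()          # split on runs of whitespace
        if fields:
            first[fields[0]] = fields[1:]
    for line in lines2:
        fields = line.split()
        if fields and fields[0] in first:
            yield ' '.join([fields[0], *first[fields[0]], *fields[1:]])

# same content as jan.txt and feb.txt
jan = ['apple   10', 'banana  20', 'soap    3', 'tshirt  3']
feb = ['banana  15', 'fig     100', 'pen     2', 'soap    1']
joined = list(join_on_first_field(jan, feb))
print('\n'.join(joined))
```

As with the awk version, sorted input isn't needed. The output order follows the second input.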

See join chapter from my Command line text processing with GNU Coreutils ebook for more details and examples.

Python tip 13: formatting numbers with underscore separation

2022-07-13T00:00:00+00:00

For readability purposes, you can use underscores while declaring large numbers. For example:

>>> 1_000_000_000
1000000000

>>> 0b1000_1111
143

Did you know that you can also format numbers with underscore separation?

>>> n = 14310023

# underscore separation
>>> f'{n:_}'
'14_310_023'

# you can also use comma separation for integers
>>> f'{n:,}'
'14,310,023'

Here are some examples for displaying numbers in binary, octal and hexadecimal formats:

>>> n = 14310023

>>> f'{n:_b}'
'1101_1010_0101_1010_1000_0111'
>>> f'{n:#_b}'
'0b1101_1010_0101_1010_1000_0111'

>>> f'{n:#_x}'
'0xda_5a87'

>>> f'{n:#_o}'
'0o6645_5207'

And here's an example with zero filling:

>>> for n in (3, 20, 28):
...     print(f'{n:09_b}')
... 
0000_0011
0001_0100
0001_1100

See docs.python: Formatted string literals for documentation and other examples.
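
The same format specifiers work with the format() built-in and str.format(), and int() accepts underscores when converting strings back to numbers. A small sketch (the variable names are arbitrary):

```python
n = 14310023

# the format spec mini-language is shared by f-strings, format() and str.format()
s1 = format(n, '_')
s2 = '{:,}'.format(n)
print(s1, s2)

# grouping can be combined with precision for floating-point numbers
s3 = f'{1234567.891:_.2f}'
print(s3)

# int() accepts single underscores between digits
back = int('14_310_023')
hex_back = int('da_5a87', 16)
print(back, hex_back)
```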

See also my 100 Page Python Intro ebook.

Vim tip 11: replace characters in Normal mode

2022-07-06T00:00:00+00:00

Often, you just need to change one character. For example, changing i to j, 2 to 4, ' to " and so on.

rj replace the character under the cursor with j
ry replace the character under the cursor with y
3ra replace the character under cursor as well as the two characters to the right with aaa
- no changes will be made if there aren't sufficient characters to match

To replace multiple characters with different characters, use R.

Rlion followed by Esc replace the character under cursor and three characters to the right with lion
- Esc key marks the completion of R command
- Backspace key will act as an undo command to give back the character that was replaced
- if you are replacing at the end of a line, the line will be automatically extended if needed

The advantage of r and R commands is that you remain in the Normal mode, without needing to switch to Insert mode and back.

CLI tip 12: squeeze empty lines

2022-06-29T00:00:00+00:00

awk has a builtin feature to process input content paragraph wise (by setting RS to an empty string). But, did you know that cat, less and grep can also be used to squeeze empty lines?

cat -s (and less -s) will squeeze multiple empty lines in the input to a single empty line in the output. Here's an example:

$ cat ip.txt
hello




world

apple
banana
cherry


tea coffee
chocolate
$ cat -s ip.txt
hello

world

apple
banana
cherry

tea coffee
chocolate

Here's an example with empty lines at the start/end of the input:

$ printf '\n\n\ndragon\n\n\nunicorn\n\n\n'



dragon


unicorn


$ printf '\n\n\ndragon\n\n\nunicorn\n\n\n' | cat -s

dragon

unicorn

And here's a solution with awk. Unlike the -s option, this will completely remove empty lines at the start/end of the input.

$ awk -v RS= '{print s $0; s="\n"}' ip.txt
hello

world

apple
banana
cherry

tea coffee
chocolate

$ printf '\n\n\ndragon\n\n\nunicorn\n\n\n' | awk -v RS= '{print s $0; s="\n"}'
dragon

unicorn
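
For comparison, here's a rough Python sketch covering both behaviors. The function name squeeze_blank_lines and its strip_ends parameter are made up for illustration. With the default setting it behaves like cat -s for the samples above, and with strip_ends=True it matches the awk output. Only truly empty lines are considered, not whitespace-only lines.

```python
def squeeze_blank_lines(text, strip_ends=False):
    """Collapse runs of empty lines into one; optionally drop them at the start/end."""
    lines = text.split('\n')
    if lines[-1] == '':
        lines.pop()  # the piece after the final newline is not a line
    out = []
    for line in lines:
        if line == '' and out and out[-1] == '':
            continue  # skip a repeated empty line
        out.append(line)
    if strip_ends:
        while out and out[0] == '':
            out.pop(0)
        while out and out[-1] == '':
            out.pop()
    return ''.join(line + '\n' for line in out)

sample = '\n\n\ndragon\n\n\nunicorn\n\n\n'
print(squeeze_blank_lines(sample), end='')
print(squeeze_blank_lines(sample, strip_ends=True), end='')
```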

The awk solution would be easier to extend, given its programmable features. For example, two empty lines between the groups:

$ awk -v RS= '{print s $0; s="\n\n"}' ip.txt
hello


world


apple
banana
cherry


tea coffee
chocolate

And here's a surprising GNU grep solution, with a customizable group separator:

# single empty line
$ grep --group-separator= -A0 '.' ip.txt
hello

world

apple
banana
cherry

tea coffee
chocolate

# double empty line
# empty lines at the start/end of the input are removed too
$ printf '\n\n\ndragon\n\n\nunicorn\n\n\n' | grep --group-separator=$'\n' -A0 '.'
dragon


unicorn

Python tip 12: negate a regex grouping

2022-06-22T00:00:00+00:00

You might be familiar with negating a character class, for example:

>>> import re

# remove first two columns
>>> re.sub(r'\A([^:]+:){2}', '', 'apple:42:banana:1000:cherry:512')
'banana:1000:cherry:512'

# filter all elements not ending with `r` or `t`
>>> words = ['surrender', 'unicorn', 'newer', 'door', 'empty', 'eel', 'pest']
>>> [w for w in words if re.search(r'[^rt]\Z', w)]
['unicorn', 'empty', 'eel']

But do you know how to match characters based on a negated group? You can use a combination of negative lookahead and quantifiers as shown in the examples below:

>>> pets = 'fox,cat,dog,parrot'

# match if 'do' is not present between 'at' and 'par'
>>> bool(re.search(r'at((?!do).)*par', pets))
False

# match if 'go' is not present between 'at' and 'par'
>>> bool(re.search(r'at((?!go).)*par', pets))
True

# easier to understand by looking at the matched portions
>>> re.search(r'at((?!go).)*par', pets)[0]
'at,dog,par'
>>> re.search(r'\A((?!par).)*', pets)[0]
'fox,cat,dog,'

The . in ((?!go).)* will match a character only if the sequence of current and next characters are not go. Similarly, the . in ((?!par).)* matches a character only if the current and next two characters are not par. The * quantifier is applied on the outer group to match zero or more characters satisfying the given condition.

The outer groups in the above examples are capturing groups, though capturing wasn't required; it just keeps the pattern concise. However, capturing groups affect the behavior of functions like re.split and re.findall. You can use non-capturing groups in such cases:

# capture group affects the behavior of 're.findall'
>>> re.findall(r'\b((?!42)\w)+\b', 'a422b good bad42 nice100')
['d', '0']

# so, use a non-capturing group here
>>> re.findall(r'\b(?:(?!42)\w)+\b', 'a422b good bad42 nice100')
['good', 'nice100']
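
If you need this idiom often, the pattern can be built programmatically. The helper below is a sketch with a made-up name (between_without). It uses re.escape so that the three arguments are treated literally, and a non-capturing group as discussed above.

```python
import re

def between_without(start, end, forbidden):
    """Compile a pattern matching start...end with no 'forbidden' in between."""
    return re.compile(
        f'{re.escape(start)}(?:(?!{re.escape(forbidden)}).)*{re.escape(end)}'
    )

pets = 'fox,cat,dog,parrot'
print(bool(between_without('at', 'par', 'do').search(pets)))
print(between_without('at', 'par', 'go').search(pets)[0])
```

Note that the lookahead is checked only for the characters consumed between start and end, so this sketch doesn't guard against the forbidden string overlapping with end itself.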

Test your understanding by solving this exercise. Construct a regex solution that works for all three sample transformations shown below:

Power(x,2) should be replaced with (x)*(x)

Power(Power(x,2) + x,2) should be changed to ((x)*(x) + x)*((x)*(x) + x)

Power(x + Power(x,2),2) should be changed to (x + (x)*(x))*(x + (x)*(x))

If that was easy, make it work for general powers instead of just 2:

Power(Power(x,2),3) translates to ((x)*(x))*((x)*(x))*((x)*(x))

The above exercise is based on this stackoverflow Q&A.

See also my Understanding Python re(gex)? ebook.

Vim tip 10: Undo and Redo

2022-06-15T00:00:00+00:00

In Normal mode, you can undo and redo changes using the following commands:

u undo last change
- press u again for further undos
U undo latest changes on last edited line
Ctrl+r redo a change undone by u
U redo changes undone by U

See :h 32.3 for details on g- and g+ commands that you can use to undo branches.

CLI tip 11: longest line length

2022-06-08T00:00:00+00:00

You can use wc -L to report the length of the longest line in the input (excluding the newline character of a line).

$ echo 'apple' | wc -L
5

# last line not ending with newline won't be a problem
$ printf 'apple\nbanana' | wc -L
6

$ cat greeting.txt
hi there
have a nice day
$ wc -L <greeting.txt
15

If multiple files are passed, the last line summary will show the maximum length among the given inputs.

$ wc -L greeting.txt sample.txt para.txt
 15 greeting.txt
 26 sample.txt
 11 para.txt
 26 total

-L won't count non-printable characters and tabs are converted to equivalent spaces. You can use awk if these are not acceptable.

# tab characters can occupy up to 8 columns
$ printf '\t' | wc -L
8
$ printf '(\t)' | wc -L
9
$ printf '(\t)' | awk '{print length()}'
3

# non-printable characters aren't counted
$ printf '(\34)' | wc -L
2
$ printf '(\34)' | awk '{print length()}'
3

Note that the awk command in the above illustration is similar to wc -L only for single line inputs. For multiple lines, you can use the following command:

awk '{len = length(); if(len > max) max = len} END{print max}'
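
Here's a comparable sketch in Python. The function name longest_line_length and the tabsize parameter are made up for illustration. Like the awk version, it counts characters rather than display columns: len() counts code points, so a grapheme cluster made of multiple code points counts as more than 1 and non-printable characters are included. Passing tabsize=8 expands tabs first, which mimics the tab handling shown for wc -L above.

```python
def longest_line_length(text, tabsize=None):
    """Return the length of the longest line, not counting the newline character."""
    lines = text.split('\n')  # split only on newline characters
    if tabsize is not None:
        lines = [line.expandtabs(tabsize) for line in lines]
    return max(len(line) for line in lines)

print(longest_line_length('apple\nbanana'))
print(longest_line_length('(\t)'))
print(longest_line_length('(\t)', tabsize=8))
```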

Multibyte characters and grapheme clusters will each be counted as 1, assuming the current locale is set appropriately:

# multibyte characters are counted as 1 each in supported locales
$ printf 'αλεπού' | wc -L
6

# grapheme cluster example
$ printf 'cag̈e' | wc -L
4

# non-supported locales can cause them to be treated as non-printable
$ printf 'αλεπού' | LC_ALL=C wc -L
0

Bash compound commands and redirection

2022-06-04T00:00:00+00:00

I've been using Linux for about 15 years. There are a lot of features I don't know and some that I've used but not often enough or to the full extent of possibilities.

Recently, I had written a bash function, which required saving the output of a for loop to a file. I knew that compound commands support redirection, but it didn't strike me at that time as I haven't had to use them often.

Here's a simplified version of the function I wrote first:

pf()
{
    > input.txt
    for f in "$@" ; do echo "$f $f.bkp" >> input.txt ; done
    cmd input.txt > output.txt
}

Having to empty the file using > input.txt got me thinking that perhaps I was missing some obvious solution. A few days later, I realized that instead of using >> during every iteration of the loop, I should have just applied > to the loop itself.

pf()
{
    for f in "$@" ; do echo "$f $f.bkp" ; done > input.txt
    cmd input.txt > output.txt
}

echo and cmd in the above examples are just placeholders for illustration purposes. I needed both input.txt and output.txt after calling the function, which is why I didn't use | or process substitution.

Python tip 11: capture external command output

2022-06-01T00:00:00+00:00

The subprocess module provides a plethora of features to execute external commands, capturing output being one of them. There are two ways to do so:

passing capture_output=True to subprocess.run()
subprocess.check_output() if you only want stdout

By default, results are provided as bytes data type. You can change that by passing text=True.

>>> import subprocess
>>> cmd = ('date', '-u', '+%A')

>>> p = subprocess.run(cmd, capture_output=True, text=True)
>>> p
CompletedProcess(args=('date', '-u', '+%A'), returncode=0,
                 stdout='Wednesday\n', stderr='')
>>> p.stdout
'Wednesday\n'

>>> subprocess.check_output(cmd, text=True)
'Wednesday\n'

With check_output(), you'll get an exception if something goes wrong with the command being executed. With run(), you'll get that information from stderr and returncode as part of the CompletedProcess object.

>>> cmd = ('ls', 'xyz.txt')

>>> subprocess.run(cmd, capture_output=True, text=True)
CompletedProcess(args=('ls', 'xyz.txt'), returncode=2, stdout='',
         stderr="ls: cannot access 'xyz.txt': No such file or directory\n")

>>> subprocess.check_output(cmd, text=True)
ls: cannot access 'xyz.txt': No such file or directory
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.8/subprocess.py", line 415, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/usr/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '('ls', 'xyz.txt')' returned
                               non-zero exit status 2.

You can also use the legacy functions subprocess.getstatusoutput() and subprocess.getoutput(), but they lack features and do not provide secure options. See docs.python: subprocess Legacy Shell Invocation Functions for details.
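
If you prefer exceptions with run() as well, pass check=True and handle subprocess.CalledProcessError. The sketch below uses a hypothetical failing command (a Python one-liner launched via sys.executable) so that it doesn't depend on any particular external tool being installed.

```python
import subprocess
import sys

# hypothetical failing command: writes to stderr and exits with status 3
cmd = (sys.executable, '-c',
       'import sys; print("oops", file=sys.stderr); sys.exit(3)')

try:
    subprocess.run(cmd, capture_output=True, text=True, check=True)
except subprocess.CalledProcessError as err:
    # captured streams are available on the exception object
    failure = (err.returncode, err.stderr.strip())
    print(failure)
```

The same try-except structure applies to check_output(), which raises this exception for non-zero exit statuses.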

See also my 100 Page Python Intro ebook.

Vim tip 9: named registers

2022-05-24T00:00:00+00:00

In Normal mode, you can use the named registers a-z (lowercase letters) to save some content for future use. You can also append more content to those registers at a later stage by using the corresponding uppercase letters A-Z.

"ayy copy the current line to the "a register
"bdip delete the current paragraph, contents will also be saved to the "b register
"Ayj append the current line and the line below to the "a register
- "ayy followed by "Ayj will result in total three lines in the "a register
"ap paste content from the "a register
"eyiw copy word under the cursor to the "e register

You can use :reg (short for :registers) to view the contents of the registers. Specifying one or more characters (next to each other as a single string) will display contents only for those registers.

The named registers are also used for saving macros. You can record an empty macro to clear the contents, for example qbq clears the "b register.

Debug woes 3: matching uppercase letters

2022-05-13T00:00:00+00:00

So, I was going through GNU bash manual: Shell Parameter Expansion and trying out examples to check if I was understanding the features well.

When it came to case conversion, it was a bit confusing to learn that you can only use a glob that matches a single character. Here are the examples for lowercase to uppercase conversion that I used:

$ fruit='apple'

# all characters to uppercase
$ echo "${fruit^^}"
APPLE

# convert any character that matches [g-z] to uppercase
$ echo "${fruit^^[g-z]}"
aPPLe

# this won't work since 'sky-' is not a single character
$ c='sky-rose'
$ echo "${c^^*-}"
sky-rose

To convert uppercase to lowercase, you just need to use , instead of ^. Sounds simple right? It really is. But, I got stuck while trying to modify the above examples:

$ fruit='APPLE'

# worked as expected
$ echo "${fruit,,}"
apple

# expected 'ApplE' but got 'APPLE'
# can you spot the mistake?
$ echo "${fruit,,[g-z]}"
APPLE

I usually go through documentation and stackexchange sites when I'm stuck. After going through some threads, I came across this unix.stackexchange example:

$ str="HELLO"
$ printf '%s\n' "${str,,[HEO]}"
heLLo

Okay, I thought, this seems similar to what I wanted. Need to check out if this works on my machine. Before I even finished typing the example, my brain's light bulb turned on. I should have used G-Z instead of lowercase range.

$ echo "${fruit,,[G-Z]}"
ApplE

CLI tip 10: version sort

2022-05-13T00:00:00+00:00

You can use sort -V for sorting numerical input that is mixed with other characters. It also helps when you want to treat digits after a decimal point as whole numbers, for example if 1.10 should be greater than 1.2.

$ printf '1.5\n1.10\n1.2' | sort -n
1.10
1.2
1.5
$ printf '1.5\n1.10\n1.2' | sort -V
1.2
1.5
1.10

$ cat versions.txt
file2
cmd5.2
file10
cmd1.6
file5
cmd5.10
$ sort -V versions.txt
cmd1.6
cmd5.2
cmd5.10
file2
file5
file10

Here's an example of dealing with numbers reported by the time command (assuming all the entries have the same format).

$ cat timings.txt
5m35.363s
3m20.058s
4m11.130s
3m42.833s
4m3.083s

$ sort -V timings.txt
3m20.058s
3m42.833s
4m3.083s
4m11.130s
5m35.363s

See GNU coreutils manual: Version sort ordering for more details. Also, note that the ls command uses lowercase -v for this task.
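
For the simple inputs shown above, a similar ordering can be approximated in Python with a "natural sort" key. This is only a sketch and not a reimplementation of the GNU version sort rules (suffix handling, for example, is ignored); the function name natural_key is made up for illustration.

```python
import re

def natural_key(text):
    """Split text into alternating non-digit and digit chunks; digit runs compare as integers."""
    parts = re.split(r'(\d+)', text)
    # with a capturing group, odd indices hold the digit runs
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]

print(sorted(['1.5', '1.10', '1.2'], key=natural_key))

versions = ['file2', 'cmd5.2', 'file10', 'cmd1.6', 'file5', 'cmd5.10']
print(sorted(versions, key=natural_key))
```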

See sort command chapter from my Command line text processing with GNU Coreutils ebook for more details.

Python tip 10: removeprefix and removesuffix string methods

2022-05-11T00:00:00+00:00

Python supports plenty of string methods that reduce the need for regular expressions. The removeprefix() and removesuffix() string methods were added in Python 3.9. See PEP 616 for more details.

These methods help to delete an exact substring from the start and end of the input string respectively. Here are some examples:

# remove 'sp' if it matches at the start of the input string
>>> 'spare'.removeprefix('sp')
'are'
# 'par' is present in the input, but not at the start
>>> 'spare'.removeprefix('par')
'spare'

# remove 'me' if it matches at the end of the input string
# only one occurrence of the match will be removed
>>> 'this meme'.removesuffix('me')
'this me'
# characters have to be matched exactly in the same order
>>> 'this meme'.removesuffix('em')
'this meme'

These remove methods will delete the given substring only once from the start or end of the string. On the other hand, the strip methods treat the argument as a set of characters to be matched any number of times in any order until a non-matching character is found. Here are some examples:

>>> 'these memes'.removesuffix('esm')
'these memes'
>>> 'these memes'.rstrip('esm')
'these '

>>> 'effective'.removeprefix('ef')
'fective'
>>> 'effective'.lstrip('ef')
'ctive'
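
If you are stuck with a Python version older than 3.9, plain functions along the lines of the equivalent code described in PEP 616 can be used instead. The sketch below uses standalone functions since methods cannot be added to the built-in str type.

```python
def removeprefix(s, prefix):
    """Return s without the leading prefix, if present."""
    if s.startswith(prefix):
        return s[len(prefix):]
    return s

def removesuffix(s, suffix):
    """Return s without the trailing suffix, if present."""
    # the 'suffix and' check is needed because s[:-0] would give an empty string
    if suffix and s.endswith(suffix):
        return s[:-len(suffix)]
    return s

print(removeprefix('spare', 'sp'))
print(removesuffix('this meme', 'me'))
```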

See also my 100 Page Python Intro ebook.

Python 3.11: possessive quantifiers and atomic grouping added to re module

2022-05-07T00:00:00+00:00

Quoting from What's New In Python 3.11:

Atomic grouping ((?>...)) and possessive quantifiers (*+, ++, ?+, {m,n}+) are now supported in regular expressions. (Contributed by Jeffrey C. Jacobs and Serhiy Storchaka in bpo-433030.)

Poster created using Canva

If you are not familiar with regular expressions, see my Understanding Python re(gex)? ebook to get started.

Backtracking🔗

Greedy quantifiers match as much as possible, provided the overall regex is satisfied. For example, :.* will match : followed by rest of the input line. However, if you change the pattern to :.*apple, the .* portion cannot simply consume the rest of the input line. The regex engine will have to find the largest portion such that apple is also part of the match (provided the input has such a string, of course).

>>> import re

>>> ip = 'fig:mango:pineapple:guava:apples:orange'

>>> re.search(r':.*', ip)[0]
':mango:pineapple:guava:apples:orange'

>>> re.search(r':.*apple', ip)[0]
':mango:pineapple:guava:apple'

For the :.*apple case, the Python regular expression engine actually does consume all the characters on seeing .*. Then realizing that the overall match failed, it gives back one character from the end of line and checks again. This process is repeated until a match is found or failure is confirmed. In regular expression parlance, this is called backtracking.

This kind of exploration to satisfy the overall regex also applies to non-greedy quantifiers. .*? will start with zero characters followed by one, two, three and so on until a match is found.

>>> ip = 'fig:mango:pineapple:guava:apples:orange'

>>> re.search(r':.*?', ip)[0]
':'

>>> re.search(r':.*?apple', ip)[0]
':mango:pineapple'

Note that some regex engines like re2 do not use backtracking.

Possessive quantifiers🔗

Until Python 3.10, you had to use alternatives like the third-party regex module for possessive quantifiers. The re module supports possessive quantifiers from Python 3.11 onwards.

The difference between greedy and possessive quantifiers is that possessive quantifiers will not backtrack to find a match. In other words, a possessive quantifier will always consume every character that matches the pattern to which it is applied. Syntax wise, you need to append + to a greedy quantifier to make it possessive, similar to adding ? for the non-greedy case.

Unlike greedy or non-greedy quantifiers, :.*+apple will never match, because .*+ will consume the rest of the line, leaving no way to match apple.

$ python3.11 -q
>>> import re

>>> ip = 'fig:mango:pineapple:guava:apples:orange'

>>> re.search(r':.*+', ip)[0]
':mango:pineapple:guava:apples:orange'

>>> bool(re.search(r':.*+apple', ip))
False

Here's a more practical example. Suppose you want to match integer numbers greater than or equal to 100 where these numbers can optionally have leading zeros.

>>> numbers = '42 314 001 12 00984'

# this solution fails because 0* and \d{3,} can both match leading zeros
# and greedy quantifiers will give up characters to help overall regex succeed
>>> re.findall(r'0*\d{3,}', numbers)
['314', '001', '00984']

# here 0*+ will not give back leading zeros after they are consumed
>>> re.findall(r'0*+\d{3,}', numbers)
['314', '00984']

# workaround if possessive quantifiers are not supported
>>> re.findall(r'0*[1-9]\d{2,}', numbers)
['314', '00984']

>>> lines = ['#comment', 'c = "#"', '\t #comment', 'abc', '', ' \t ']

# this solution fails because \s* can backtrack
# and [^#] can match a whitespace character as well
>>> [e for e in lines if re.match(r'\s*[^#]', e)]
['c = "#"', '\t #comment', 'abc', ' \t ']

# this works because \s*+ will not give back any whitespace characters
>>> [e for e in lines if re.match(r'\s*+[^#]', e)]
['c = "#"', 'abc']

# workaround if possessive quantifiers are not supported
>>> [e for e in lines if re.match(r'\s*[^#\s]', e)]
['c = "#"', 'abc']

Atomic grouping🔗

(?>pat) is an atomic group, where pat is the pattern you want to safeguard from further backtracking by isolating it from other parts of the regex.

Here's an example with greedy quantifier:

>>> numbers = '42 314 001 12 00984'

# 0* is greedy and the (?>) grouping prevents backtracking
# same as: re.findall(r'0*+\d{3,}', numbers)
>>> re.findall(r'(?>0*)\d{3,}', numbers)
['314', '00984']

Here's an example with non-greedy quantifier:

>>> ip = 'fig::mango::pineapple::guava::apples::orange'

# this matches from the first '::' to the first occurrence of '::apple'
>>> re.search(r'::.*?::apple', ip)[0]
'::mango::pineapple::guava::apple'

# '(?>::.*?::)' will match only from '::' to the very next '::'
# '::mango::' fails because 'apple' isn't found afterwards
# similarly '::pineapple::' fails
# '::guava::' succeeds because it is followed by 'apple'
>>> re.search(r'(?>::.*?::)apple', ip)[0]
'::guava::apple'
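
If you are on a Python version older than 3.11, neither possessive quantifiers nor atomic groups are available in the re module. A commonly used emulation is a lookahead with a capture group followed by a backreference, i.e. (?=(pat))\1, which works because a lookahead is not re-entered once it has succeeded. Here's a minimal sketch using the same numbers sample (the helper name atomic_findall is made up for this illustration):

```python
import re

numbers = '42 314 001 12 00984'

def atomic_findall(pattern, text):
    """Return the full matches. finditer is used because findall
    would return only the capture group contents."""
    return [m[0] for m in re.finditer(pattern, text)]

# (?=(0*))\1 behaves like (?>0*): the zeros captured by the lookahead
# are consumed by \1 and cannot be given back to \d{3,}
emulated = atomic_findall(r'(?=(0*))\1\d{3,}', numbers)
print(emulated)
# ['314', '00984']

# plain greedy version for comparison, which also matches '001'
greedy = atomic_findall(r'0*\d{3,}', numbers)
print(greedy)
# ['314', '001', '00984']
```

Note that this trick adds a capture group, so you'll have to account for that if the pattern has other groups or backreferences.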

The third-party regex module has a regex.REVERSE flag (inline form is (?r)) to match from right to left, making it better suited than atomic grouping for certain cases.

>>> import regex

>>> ip = 'fig::mango::pineapple::guava::apples::orange'
>>> regex.search(r'(?r)::.*?::apple', ip)[0]
'::guava::apple'

# this won't be possible with just atomic grouping
>>> ip = 'and this book is good and those are okay and that movie is bad'
>>> regex.search(r'(?r)th.*?\bis bad', ip)[0]
'that movie is bad'

Catastrophic Backtracking🔗

Backtracking can become significantly time consuming for certain corner cases, which is why some regex engines do not use it, at the cost of not supporting features like lookarounds. If your application accepts user-defined regex, you might need to protect against such catastrophic patterns. From wikipedia: ReDoS:

A regular expression denial of service (ReDoS) is an algorithmic complexity attack that produces a denial-of-service by providing a regular expression and/or an input that takes a long time to evaluate. The attack exploits the fact that many regular expression implementations have super-linear worst-case complexity; on certain regex-input pairs, the time taken can grow polynomially or exponentially in relation to the input size. An attacker can thus cause a program to spend substantial time by providing a specially crafted regular expression and/or input. The program will then slow down or become unresponsive.

Here's an example:

>>> from timeit import timeit

>>> greedy = re.compile(r'(a+|\w+)*:')
>>> possessive = re.compile(r'(a+|\w+)*+:')

# string that'll match the above patterns
>>> s1 = 'aaaaaaaaaaaaaaaa:123'
# string that does NOT match the above patterns
>>> s2 = 'aaaaaaaaaaaaaaaa-123'

# no issues when input string has a match
>>> timeit('greedy.search(s1)', number=10000, globals=globals())
0.016464739997900324
>>> timeit('possessive.search(s1)', number=10000, globals=globals())
0.016358205997676123

# if input doesn't match, greedy version suffers from catastrophic backtracking
# note that 'number' parameter is reduced to 10 since it takes a long time
>>> timeit('greedy.search(s2)', number=10, globals=globals())
53.71723825200024
>>> timeit('possessive.search(s2)', number=10, globals=globals())
0.00019008600065717474
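
Apart from possessive quantifiers, another mitigation is to restructure the pattern so that the alternatives inside the quantified group don't overlap. For this particular pattern, \w*: is one such rewrite. Here's a minimal sketch that checks both patterns give the same result. The sample strings are made up and kept short on purpose, so that even the nested pattern finishes quickly:

```python
import re

nested = re.compile(r'(a+|\w+)*:')
simple = re.compile(r'\w*:')

# short, made-up samples (some match, some don't)
samples = ['aaaa:123', 'aaaa-123', 'x_1:y', ':', 'no colon', 'a b:c']

def matched_text(pattern, text):
    """Return the matched portion, or None if there's no match."""
    m = pattern.search(text)
    return m[0] if m else None

for s in samples:
    print(repr(s), matched_text(nested, s), matched_text(simple, s))

# True if both patterns agree on every sample
same = all(matched_text(nested, s) == matched_text(simple, s)
           for s in samples)
print(same)
```

Agreement on a handful of samples isn't a proof of equivalence, but it is a cheap sanity check before swapping one pattern for another.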

(a+|\w+)*: is a silly regex pattern, since it can be rewritten as \w*: which will not suffer from catastrophic backtracking. But this example shows how quantifiers applied to a group with multiple alternatives using quantifiers can lead to explosive results. More such patterns and mitigation strategies can be found in the following links:

Vim tip 8: join lines

2022-05-04T00:00:00+00:00

In Normal mode, you can join lines using J and gJ commands. These differ in how the end-of-line character and indentation at the start of lines being joined are handled.

J joins the current line and the next line
- the deleted <EOL> character is replaced with a space (unless there are trailing spaces or the next line starts with a ) character)
- indentation from the lines being joined is removed, except for the current line
3J joins the current line and next two lines with one space in between the lines
gJ joins the current line and the next line
- <EOL> character is deleted (space character won't be added)
- indentation won't be removed

joinspaces, cpoptions and formatoptions settings will affect the behavior of these commands. See :h J and scroll down for more details.

Video demo:

CLI tip 9: awk paragraph mode

2022-04-27T00:00:00+00:00

awk provides a handy shortcut to process input content paragraph wise. When RS is set to an empty string, one or more consecutive empty lines are used as the input record separator. Consider the below sample file:

$ cat para.txt
hi there
how are you

2 apples
12 bananas


blue sky
yellow sun
brown earth

Here are some simple examples to filter paragraphs based on some criteria:

# paragraphs containing 'sun'
$ awk -v RS= '/sun/' para.txt
blue sky
yellow sun
brown earth

# paragraphs containing any digit character
$ awk -v RS= '/[0-9]/' para.txt
2 apples
12 bananas

# print the first paragraph
$ awk -v RS= 'NR==1' para.txt
hi there
how are you
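
If you prefer Python, here's a rough sketch of the same idea. It assumes that paragraphs are separated by truly empty lines (not lines containing only whitespace), and the function name paragraphs is made up for this illustration:

```python
import re

text = '''hi there
how are you

2 apples
12 bananas


blue sky
yellow sun
brown earth
'''

def paragraphs(content):
    """Split on runs of empty lines, ignoring leading/trailing newlines."""
    return re.split(r'\n{2,}', content.strip('\n'))

paras = paragraphs(text)

# paragraphs containing 'sun'
with_sun = [p for p in paras if 'sun' in p]
print(with_sun[0])

# paragraphs containing any digit character
with_digit = [p for p in paras if re.search(r'[0-9]', p)]
print(with_digit[0])

# the first paragraph
print(paras[0])
```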

See Paragraph mode section from my GNU awk ebook for more examples and corner cases.

Video demo:

See my CLI text processing with GNU awk ebook if you are interested in learning about the GNU awk command in more detail.

Python tip 9: applying set-like operations for dictionaries

2022-04-16T00:00:00+00:00

You can merge two dictionaries using the | operator (similar to union of sets). If a key is found in both the dictionaries, the insertion order of the first dictionary will be maintained, but the value of the second dictionary will be used. In other words, keys are updated to the new value during the merge.

>>> marks_1 = {'Rahul': 86, 'Ravi': 92, 'Rohit': 75}
>>> marks_2 = {'Jo': 89, 'Rohit': 78, 'Joe': 75, 'Ravi': 100}

# use unpacking, i.e. {**d1, **d2} for Python 3.8 and earlier
>>> marks_1 | marks_2
{'Rahul': 86, 'Ravi': 100, 'Rohit': 78, 'Jo': 89, 'Joe': 75}

Use the update() method if you want to modify a dictionary in place instead of getting a new one.

>>> marks_1.update(marks_2)
>>> marks_1
{'Rahul': 86, 'Ravi': 100, 'Rohit': 78, 'Jo': 89, 'Joe': 75}

The keys() and items() dictionary methods return set-like view objects, with insertion order maintained (values() views are not set-like, since the values need not be unique). You get a set object as output when you apply set operators on these objects.

>>> marks_1 = {'Rahul': 86, 'Ravi': 92, 'Rohit': 75}
>>> marks_2 = {'Jo': 89, 'Rohit': 78, 'Joe': 75, 'Ravi': 100}

# union of keys
>>> marks_1.keys() | marks_2.keys()
{'Rohit', 'Rahul', 'Ravi', 'Jo', 'Joe'}

# common keys
>>> marks_1.keys() & marks_2.keys()
{'Ravi', 'Rohit'}

# difference: keys not present in the other dict
>>> marks_1.keys() - marks_2.keys()
{'Rahul'}
>>> marks_2.keys() - marks_1.keys()
{'Jo', 'Joe'}

# symmetric difference: union of above two differences
>>> marks_1.keys() ^ marks_2.keys()
{'Jo', 'Joe', 'Rahul'}
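
Here's a short sketch showing which dictionary views support set operators. The data is a slightly modified version of the above dictionaries (Rohit has the same marks in both), made up so that there is a common key-value pair:

```python
marks_1 = {'Rahul': 86, 'Ravi': 92, 'Rohit': 75}
marks_2 = {'Jo': 89, 'Rohit': 75, 'Joe': 75, 'Ravi': 100}

# items() views support set operators (the values here are hashable)
same_entries = marks_1.items() & marks_2.items()
print(same_entries)
# {('Rohit', 75)}

# values() views do not support set operators
try:
    marks_1.values() & marks_2.values()
    values_supported = True
except TypeError:
    values_supported = False
print(values_supported)
# False

# convert explicitly if you want set operations on the values
common_values = set(marks_1.values()) & set(marks_2.values())
print(common_values)
# {75}
```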

Video demo:

See also my 100 Page Python Intro ebook.

Vim tip 7: changing case in Normal mode

2022-04-12T00:00:00+00:00

You can use the following commands to change the case of characters:

~ invert the case of the character under the cursor (i.e. lowercase becomes UPPERCASE and vice versa)
g~ followed by motion inverts the case of those characters
- for example: g~e, g~$, g~iw, etc
gu followed by motion changes those characters to lowercase
- for example: gue, gu$, guiw, etc
gU followed by motion changes those characters to UPPERCASE
- for example: gUe, gU$, gUiw, etc

You can also provide a count prefix to these commands. For example, 3~ will invert the case of the current character and two characters to the right.

Video demo:

CLI tip 8: extract from start of file until matching line

2022-04-06T00:00:00+00:00

The GNU sed command has a couple of handy commands to extract text from the start of input until a matching line is found. The q and Q commands are similar, except in how they handle the matching line.

The q command will exit sed immediately, after printing the current pattern space if applicable.

# quit after a line containing 'st' is found
$ printf 'apple\nsea\neast\ndust' | sed '/st/q'
apple
sea
east

The Q command is similar to q but won't print the matching line.

# matching line won't be printed in this case
$ printf 'apple\nsea\neast\ndust' | sed '/st/Q'
apple
sea
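
For comparison, here's a rough Python sketch of the same logic. The function name until_match and its include_match parameter are made up for this illustration, with include_match=True behaving like q and False behaving like Q:

```python
import re

def until_match(lines, pattern, include_match=True):
    """Yield lines from the start until one of them matches pattern."""
    for line in lines:
        if re.search(pattern, line):
            if include_match:
                yield line
            return
        yield line

lines = ['apple', 'sea', 'east', 'dust']

like_q = list(until_match(lines, r'st'))
print(like_q)
# ['apple', 'sea', 'east']

like_Q = list(until_match(lines, r'st', include_match=False))
print(like_Q)
# ['apple', 'sea']
```

Since this is a generator, the remaining input isn't processed after the first match, which mirrors sed quitting early.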

tac+sed+tac will help you get lines starting from the last occurrence of the search string till the end of the input.

$ printf 'apple\nsea\neast\ndust\n' | tac | sed '/ea/q' | tac
east
dust

Be careful if you want to use q or Q commands with multiple files, as sed will stop even if there are other files left to be processed. You can use mixed address ranges as a workaround. See also unix.stackexchange: applying q to multiple files.

Video demo:

See my CLI text processing with GNU sed ebook if you are interested in learning about the GNU sed command in more detail.

Vim Reference Guide: two week sales report

2022-03-31T00:00:00+00:00

I've previously written about events and strategies that led to increased ebook sales during the last quarter of 2021.

I'm very pleased to report that I continue to see better than expected sales. I released my 12th ebook Vim Reference Guide on March 15th. Here's how the sales looked on Gumroad during the first two weeks:

I used to offer my ebooks for free on release. For the past couple of releases, I have also added heavily discounted ebook bundles, which seem to be the major factor in the increased paid sales I'm seeing. Luck certainly plays a role too: reaching the front page of Hacker News and the top of subreddits cannot always be counted upon. Here are some of the ways I promoted my latest ebook:

Announcement post on Gumroad and sending an email to existing readers
Show HN post on Hacker News, got lucky to be placed in top 10
Pinned tweet
Posting on /r/commandline/ and /r/linux/
- You might wonder why not /r/vim? Somebody else posted before I could, and unfortunately it got downvoted. I'll probably make my own post after I release the next version
Promo video on youtube
Mentioned in two of my learnbyexample weekly newsletter issues
And of course, I wrote a release post on this blog and also mentioned it on my GitHub Readme

Apart from Gumroad, 500+ readers downloaded the guide from Leanpub and I got a few paid sales as well. I wrote about pros and cons of Gumroad/Leanpub here.

PS: Make sure to read the rules and be a regular user before self-promoting your content on the social media platforms mentioned above.

Python tip 8: dict.fromkeys() method

2022-03-30T00:00:00+00:00

A lesser known way to create a dictionary is to use the fromkeys() method that accepts an iterable and an optional value (default is None). The same value will be assigned to all the keys, so be careful if you want to use a mutable object.

>>> colors = ('red', 'blue', 'green')

>>> dict.fromkeys(colors)
{'red': None, 'blue': None, 'green': None}

>>> dict.fromkeys(colors, 255)
{'red': 255, 'blue': 255, 'green': 255}
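
Here's a minimal sketch of the mutable value pitfall mentioned above (the variable names are made up for this illustration). Every key refers to the same list object, so a dictionary comprehension is one way to get a separate list for each key:

```python
colors = ('red', 'blue', 'green')

# the single list object passed here is shared by all the keys
shared = dict.fromkeys(colors, [])
shared['red'].append(1)
print(shared)
# {'red': [1], 'blue': [1], 'green': [1]}

# a dictionary comprehension creates a new list for each key
separate = {color: [] for color in colors}
separate['red'].append(1)
print(separate)
# {'red': [1], 'blue': [], 'green': []}
```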

When you iterate over a dictionary object, you'll get only the keys. For example:

>>> fruits = {'banana': 12, 'papaya': 5, 'mango': 10}

>>> for fruit in fruits:
...     print(fruit)
... 
banana
papaya
mango

Recent Python versions ensure that the insertion order is maintained for a dictionary. So, you can remove duplicate items from a list while maintaining the order by building a dictionary using the fromkeys() method and converting it back to a list.

>>> nums = [1, 4, 6, 22, 3, 5, 4, 3, 6, 2, 1, 51, 3, 1]

# remove duplicates, if you don't care about the element order
>>> list(set(nums))
[1, 2, 3, 4, 5, 6, 51, 22]

# remove duplicates, if you want to maintain the element order
>>> list(dict.fromkeys(nums))
[1, 4, 6, 22, 3, 5, 2, 51]

Video demo:

See also my 100 Page Python Intro ebook.

Vim tip 6: search word nearest to the cursor

2022-03-24T00:00:00+00:00

Vim provides handy commands to match words under (or near to) the cursor. You can choose to match whole or part of a longer word. If a match is found, the cursor will move to the next match in the chosen direction.

* searches the word nearest to the cursor in the forward direction (matches only the whole word)
- Shift followed by left mouse click can also be used if mouse is enabled
g* searches the word nearest to the cursor in the forward direction (matches as part of another word as well)
- for example, if you apply this command on the word the, you'll also get matches for them, lather, etc
# searches the word nearest to the cursor in the backward direction (matches only the whole word)
g# searches the word nearest to the cursor in the backward direction (matches as part of another word as well)

From :h word:

A word consists of a sequence of letters, digits and underscores, or a sequence of other non-blank characters, separated with white space (spaces, tabs, <EOL>). This can be changed with the iskeyword option.

Sequence of non-blank characters will be used only at the end of the line. Otherwise, the above commands will use the next word found on that line which is made up of letters, digits and underscores.

You can also provide a count prefix to these commands. For example, 2* will take you to the second match in the forward direction.

Video demo:

CLI tip 7: limiting number of filtered lines

2022-03-16T00:00:00+00:00

grep supports the -m option to specify the maximum number of matching lines in the output.

# all input lines containing 'a'
$ printf 'goal\nrate\neat\npit\n' | grep 'a'
goal
rate
eat

# maximum of 2 matching lines
$ printf 'goal\nrate\neat\npit\n' | grep -m2 'a'
goal
rate
$ printf 'goal\nrate\neat\npit\n' | grep -m2 'pi'
pit

# example with -v option
$ printf 'goal\nrate\neat\npit\n' | grep -v 'e'
goal
pit
$ printf 'goal\nrate\neat\npit\n' | grep -v -m1 'e'
goal

With multiple file input, the restriction is applied for each file separately.

$ cat table.txt 
brown bread mat cake 42
blue cake mug shirt -7
yellow banana window shoes 3.14
$ printf 'goal\nrate\neat\npit\n' > ip.txt

$ grep -m1 'i' table.txt ip.txt
table.txt:blue cake mug shirt -7
ip.txt:pit

# use 'cat' if you want to operate on combined input
$ cat table.txt ip.txt | grep -m1 'i'
blue cake mug shirt -7
$ cat table.txt ip.txt | grep -m1 'go'
goal

Video demo:

See my CLI text processing with GNU grep and ripgrep ebook if you are interested in learning about GNU grep and ripgrep commands in more detail.

Python tip 7: creating a deepcopy of collections

2022-03-09T00:00:00+00:00

From the documentation of the built-in copy module:

Assignment statements in Python do not copy objects, they create bindings between a target and an object. For collections that are mutable or contain mutable items, a copy is sometimes needed so one can change one copy without changing the other.

The shared binding is helpful for cases like in-place modification of lists within a user defined function:

>>> def rotate(ip):
...     ip.insert(0, ip.pop())
... 
>>> nums = [321, 1, 1, 0, 5.3, 2]
>>> rotate(nums)
>>> nums
[2, 321, 1, 1, 0, 5.3]

You can use the copy.deepcopy() function if you wish to recursively create new copies of all the elements of a mutable object:

>>> import copy

>>> nums_2d = [[1, 3, 2, 10], [1.2, -0.2, 0, 2], [100, 200]]
>>> nums_2d_deepcopy = copy.deepcopy(nums_2d)

>>> nums_2d_deepcopy[0][0] = 'yay'

>>> nums_2d_deepcopy
[['yay', 3, 2, 10], [1.2, -0.2, 0, 2], [100, 200]]
>>> nums_2d
[[1, 3, 2, 10], [1.2, -0.2, 0, 2], [100, 200]]
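
To see why the recursive part matters, here's a small sketch contrasting copy.copy() (a shallow copy) with copy.deepcopy() on the same data. The variable names shallow and deep are made up for this illustration:

```python
import copy

nums_2d = [[1, 3, 2, 10], [1.2, -0.2, 0, 2], [100, 200]]

shallow = copy.copy(nums_2d)    # new outer list, same inner lists
deep = copy.deepcopy(nums_2d)   # inner lists are copied too

shallow[0][0] = 'oops'          # also visible via nums_2d
deep[1][0] = 'yay'              # nums_2d is unaffected

print(nums_2d)
# [['oops', 3, 2, 10], [1.2, -0.2, 0, 2], [100, 200]]

print(shallow[0] is nums_2d[0], deep[0] is nums_2d[0])
# True False
```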

Video demo:

See Mutability chapter from my 100 Page Python Intro ebook for more details on this topic.

Vim tip 5: jumping back and forth in Normal mode

2022-03-02T00:00:00+00:00

Find yourself working on a large file? Or perhaps handling multiple buffers? Vim makes it easy to navigate previous locations:

Ctrl+o navigate to the previous location in the jump list (think o as old)
Ctrl+i navigate to the next location in the jump list (i and o are usually next to each other)
g; go to the previous change location
g, go to the newer change location
gi place the cursor at the same position where it was left last time in the Insert mode

Use :jumps to view the jump list. See :h jump-motions for more details.

Video demo:

CLI tip 6: filtering lines based on multiple conditions

2022-02-23T00:00:00+00:00

Usually, grep is used for filtering lines based on a regexp pattern. Matching multiple patterns as a conditional OR is easy to apply using alternation. For example, grep -E 'bread|banana' matches lines containing either bread or banana.

However, conditional AND requires multiple grep commands or the use of lookarounds if the PCRE feature is available. A simpler alternative is to use awk.

$ cat table.txt
brown bread mat cake 42
blue cake mug shirt -7
yellow banana window shoes 3.14

# line containing 'cake' but not 'at'
# same as: grep 'cake' table.txt | grep -v 'at'
# with PCRE: grep -P '^(?!.*at).*cake' table.txt
$ awk '/cake/ && !/at/' table.txt
blue cake mug shirt -7

# first field containing 'low' or the last field is less than 0
# not easy to construct a grep solution here due to field/numeric comparison
$ awk '$1 ~ /low/ || $NF<0' table.txt
blue cake mug shirt -7
yellow banana window shoes 3.14
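
If you'd rather use Python, here's a rough sketch of the same two filters. It assumes whitespace separated fields with a numeric last field (converted using float()), and the function name low_or_negative is made up for this illustration:

```python
table = '''brown bread mat cake 42
blue cake mug shirt -7
yellow banana window shoes 3.14'''

rows = table.splitlines()

# line containing 'cake' but not 'at'
cake_not_at = [r for r in rows if 'cake' in r and 'at' not in r]
print(cake_not_at)
# ['blue cake mug shirt -7']

def low_or_negative(row):
    """First field contains 'low' or the last field is less than 0."""
    fields = row.split()
    return 'low' in fields[0] or float(fields[-1]) < 0

filtered = [r for r in rows if low_or_negative(r)]
print(filtered)
# ['blue cake mug shirt -7', 'yellow banana window shoes 3.14']
```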

Video demo:

See my CLI text processing with GNU awk ebook if you are interested in learning about the awk command in more detail.

Python tip 6: inplace file editing

2022-02-15T00:00:00+00:00

The built-in fileinput module has nice features for file processing, especially handy for multiple files and inplace editing. Here's an example:

# inplace.py
import fileinput

with fileinput.input(inplace=True) as f:
    for line in f:
        # do some processing
        op = line.replace('search', 'replace')
        # print() the text you want to write back to the input files
        print(op, end='')

By default, files passed as CLI arguments will be processed:

$ python3 inplace.py *.md

If you already know which inputs have to be processed, use the files argument. Use the backup argument if you want to make copies of original files in case something goes wrong. See my blog post In-place file editing with fileinput module for more details and examples.
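
Here's a minimal sketch of the files and backup arguments in action. The file name notes.md and its contents are made up, and a temporary directory is used so that the demo doesn't touch any of your real files:

```python
import fileinput
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / 'notes.md'
    sample.write_text('search this\nkeep this\n')

    # files: process known inputs instead of CLI arguments
    # backup: keep the original content as notes.md.bak
    with fileinput.input(files=[str(sample)], inplace=True,
                         backup='.bak') as f:
        for line in f:
            print(line.replace('search', 'replace'), end='')

    edited = sample.read_text()
    original = Path(str(sample) + '.bak').read_text()

print(edited, end='')
print(original, end='')
```

The backup file is created by appending the given extension to the original file name, so you can compare or restore the contents if the processing goes wrong.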

Video demo:

See my 100 Page Python Intro ebook for a short, introductory guide for the Python programming language.

Vim tip 4: reposition current line in Normal mode

2022-02-09T00:00:00+00:00

You're likely to be familiar with commands to scroll through the contents like Ctrl followed by d or f or u or b.

Did you know that Vim also has handy options to keep the cursor on the current line while moving the contents around?

zz reposition the current line to the middle of the visible window
- useful to see context around lines that are nearer to the top/bottom of the visible window
zt reposition the current line to the top of the visible window
zb reposition the current line to the bottom of the visible window

See :h 'scrolloff' option if you want to always show context around the current line.

Video demo:

PyDev of the Week

2022-02-02T00:00:00+00:00

Last month I had the wonderful opportunity to be part of PyDev of the Week series organized by Michael Driscoll.

It was a pleasure to walk down memory lane for this interview: https://www.blog.pythonlibrary.org/2022/01/31/pydev-of-the-week-sundeep-agarwal/

I got to discuss my education, career, writing ebooks and more. Hope you enjoy the interview!

CLI tip 5: aligning columns

2022-02-02T00:00:00+00:00

The column command is a nifty tool to align input data column wise. By default, whitespace is used as the input delimiter. Space character is used to align the output columns, so whitespace characters like tab will get converted to spaces.

$ printf 'one two three\nfour five six\n'
one two three
four five six

$ printf 'one two three\nfour five six\n' | column -t
one   two   three
four  five  six

You can use the -s option to customize the input delimiter. Note that the output delimiter will still be made up of spaces only.

$ cat scores.csv
Name,Maths,Physics,Chemistry
Ith,100,100,100
Cy,97,98,95
Lin,78,83,80

$ column -s, -t scores.csv
Name  Maths  Physics  Chemistry
Ith   100    100      100
Cy    97     98       95
Lin   78     83       80

$ printf '1:-:2:-:3\napple:-:banana:-:cherry\n' | column -s:-: -t
1      2       3
apple  banana  cherry

Input should have a newline at the end, otherwise you'll get an error:

$ printf '1 2 3\na   b   c' | column -t
column: line too long
1  2  3

Video demo:

See my Linux Command Line Computing ebook and man column for more details.

Python tip 5: random choice and sample

2022-01-25T00:00:00+00:00

Here are a couple of commonly used functions from the built-in random module:

choice() helps you get a random element
sample() helps you get a list of a specific count of random elements

>>> import random
>>> nums = [1, 4, 5, 2, 51, 3, 6, 22]

>>> random.choice(nums)
3

>>> random.sample(nums, k=4)
[51, 2, 3, 1]

>>> random.sample(range(1000), k=5)
[490, 26, 9, 745, 919]

Both of these work on any sequence object. Passing a set object to sample() was deprecated in Python 3.9 and is no longer supported from Python 3.11 onwards, so convert a set to a list or tuple first.
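
If your data is in a set, one option is to convert it to a sequence before sampling. Here's a minimal sketch that uses sorted() for the conversion. The seed value is arbitrary and is there only to make the demo repeatable:

```python
import random

random.seed(42)  # fixed seed only to make this demo repeatable

unique_nums = {1, 4, 5, 2, 51, 3, 6, 22}

# sorted() returns a list, which is a sequence that sample() accepts
picked = random.sample(sorted(unique_nums), k=3)
print(picked)

# choice() needs a sequence as well
one = random.choice(sorted(unique_nums))
print(one)
```

Using sorted() instead of list() gives a predictable element order, which helps if you want the same seed to produce the same result across runs.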

Video demo:

See my 100 Page Python Intro ebook for a short, introductory guide for the Python programming language.

Brag post: Hacker News Front Page entries

2022-01-21T00:00:00+00:00

In case you haven't yet read this nice post "Get your work recognized: write a brag document" by Julia Evans, please do that first.

I definitely found it nice to collect which of my content has reached the Hacker News front page over the past 4 years. As I wrote in my book writing experience post, the response I got for my GNU awk one-liners collection was one of the stepping stones towards my career as a technical author.

Here's the list so far, ordered by oldest first:

Learn to use Awk with hundreds of examples — 478 points, Oct 2017, 116 comments
Show HN: I wrote a book on GNU grep and ripgrep — 182 points, June 2019, 53 comments
Show HN: I wrote a book on Python regular expressions — 193 points, Aug 2019, 50 comments
Show HN: An eBook with hundreds of GNU Awk one-liners — 539 points, April 2020, 48 comments
Show HN: Ruby One-Liners Cookbook — 191 points, Sept 2020, 36 comments
Perl One-Liners Cookbook — 126 points, Nov 2020, 47 comments
Show HN: "100 Page Python Intro" eBook — 107 points, Feb 2021, 26 comments
Paying my bills with 'free' ebooks — 85 points, Mar 2021, 22 comments
Show HN: "Command line text processing with GNU Coreutils" eBook — 117 points, Oct 2021, 20 comments
Show HN: Improve your Python regex skills with 75 interactive exercises — 175 points, Nov 2021, 12 comments
Vim prank: alias vim='vim -y' — 341 points, Jan 2022, 259 comments
Vim Reference Guide — 244 points, Mar 2022, 110 comments
Show HN: Interactive exercises for Linux CLI text processing commands — 69 points, Dec 2022, 7 comments
CLI text processing with GNU awk — 419 points, Aug 2023, 129 comments

As is the case with other social media platforms, being an active participant on Hacker News definitely helps. Apart from commenting on other topics, I also post links to projects and resources that I felt were useful. The number of such links reaching front page outnumbers my own content links.

Removing duplicates irrespective of field order

2022-01-19T00:00:00+00:00

I posted a coding challenge in the tenth issue of learnbyexample weekly. I discuss the problem and various solutions in this blog post.

Problem statement🔗

Retain only the first copy of duplicate lines irrespective of the order of the fields. Input order should be maintained. Assume space as the field separator with exactly two fields on each line. For example, hehe haha and haha hehe will be considered as duplicates.

$ cat twos.txt
hehe haha
door floor
haha hehe
6;8 3-4
true blue
hehe bebe
floor door
3-4 6;8
tru eblue
haha hehe

Expected output for the above sample:

hehe haha
door floor
6;8 3-4
true blue
hehe bebe
tru eblue

Python solution🔗

Here's one possible solution for this problem:

filename = 'twos.txt'
keys = set()

with open(filename) as f:
    for line in f:
        fields = line.split()
        key1 = f'{fields[0]} {fields[1]}'
        key2 = f'{fields[1]} {fields[0]}'
        if not (key1 in keys or key2 in keys):
            print(line, end='')
            keys.add(key1)

The main trick in the above solution is to check the input field order as well as the reversed order against elements in a set. A subtle point to note is that the split() string method also removes whitespace from the start and end of the input line. If you have to use another field delimiter (for example, a comma), you'll have to remove the line ending before splitting the input.

And here's a generic solution for any number of fields, which also makes the solution look simpler:

filename = 'twos.txt'
keys = set()

with open(filename) as f:
    for line in f:
        fields = line.split()
        sorted_key = ' '.join(sorted(fields))
        if sorted_key not in keys:
            print(line, end='')
            keys.add(sorted_key)

In case you are wondering why space is used to join the field contents, it is necessary to avoid false matches. tru eblue shouldn't be considered as a duplicate of true blue or blue true. Space is a safe character to use since it is the field separator.
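
If you'd rather not think about a safe joining character at all, one alternative is to use a tuple of the sorted fields as the key, since tuples are hashable. Here's a minimal sketch, with the sample lines given as a list instead of reading from twos.txt, and a made-up function name dedupe_any_order:

```python
lines = ['hehe haha', 'door floor', 'haha hehe', '6;8 3-4', 'true blue',
         'hehe bebe', 'floor door', '3-4 6;8', 'tru eblue', 'haha hehe']

def dedupe_any_order(rows):
    """Keep the first copy of rows that have the same fields in any order."""
    seen = set()
    result = []
    for row in rows:
        # ('blue', 'true') and ('eblue', 'tru') are distinct keys
        key = tuple(sorted(row.split()))
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result

unique = dedupe_any_order(lines)
print('\n'.join(unique))
```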

See my 100 Page Python Intro ebook if you already know programming basics but are new to Python.

GNU awk one-liner🔗

Here's a solution for CLI enthusiasts:

$ awk '!(($1,$2) in seen || ($2,$1) in seen); {seen[$1,$2]}' twos.txt
hehe haha
door floor
6;8 3-4
true blue
hehe bebe
tru eblue

The above solution is similar to the first Python solution with a notable difference. The fields are joined using \034 (a non-printing character), which is usually not present in text files.

A solution using the field separator instead of \034 would look like:

awk '!(($1 FS $2) in seen || ($2 FS $1) in seen); {seen[$1 FS $2]}'

See my CLI text processing with GNU awk ebook if you are interested in such one-liners.

Vim tip 3: autocomplete words and lines in Insert mode

2022-01-18T00:00:00+00:00

Autocomplete word

Ctrl+p autocomplete word based on matching words in the backward direction
Ctrl+n autocomplete word based on matching words in the forward direction

If more than one word matches, they are displayed using a popup menu. You can use ↑/↓ arrow keys or Ctrl+p/Ctrl+n to move through this list.

With multiple matches, you'll notice that the first match is automatically inserted and moving through the list doesn't change the text that was inserted. You'll have to press Ctrl+y or Enter key to choose a different completion text. If you were satisfied with the first match, typing any character will make the popup menu disappear and insert whatever character you had typed. Or press Esc to select the first match and go to Normal mode.

Autocomplete line

Ctrl+x followed by Ctrl+l autocomplete line based on matching lines in the backward direction

If more than one line matches, they are displayed using a popup menu. You can use ↑/↓ arrow keys or Ctrl+p/Ctrl+n to move through this list. You can also use Ctrl+l to move up the list.

Autocomplete assist

Ctrl+e cancels autocomplete
- you'll retain the text you had typed before invoking autocomplete
Ctrl+y or Enter change the autocompletion text to the currently selected item from the popup menu

See :h ins-completion for more details and other autocomplete features.

Video demo:

Regexp gotcha 1: grouping common portions

2022-01-14T00:00:00+00:00

Similar to a(b+c)d = abd+acd in maths, you get a(b|c)d = abd|acd in regular expressions. However, you'll have to be careful if quantifiers are involved.

For example, (a*|b*) isn't the same as (a|b)*. Can you reason out why? Here's a railroad diagram to help you out:

Credit: debuggex.com

The difference is that (a*|b*) only matches same letter sequences like a, bb, aaaaaa, etc. But (a|b)* can match mixed sequences like ababbba too. You can also simplify (a|b)* to [ab]* since it is just single character alternation in this particular example.

Here's an illustration using Python:

>>> import re

>>> test = ['aa', 'abbaba', 'aaabbb', 'bbbbb', 'abc']

>>> [s for s in test if re.fullmatch(r'(a*|b*)', s)]
['aa', 'bbbbb']

>>> [s for s in test if re.fullmatch(r'(a|b)*', s)]
['aa', 'abbaba', 'aaabbb', 'bbbbb']

Want to learn regular expressions from the basics with plenty of examples and exercises? I've written regexp ebooks for Python, JavaScript, Ruby and CLI tools.

CLI tip 4: serialize file contents to a single line

2022-01-12T00:00:00+00:00

The -s option is one of the useful, but lesser known features of the paste command. It helps you to serialize input file contents to a single output line.

$ cat colors.txt
blue
white
orange

$ paste -sd, colors.txt
blue,white,orange

If multiple files are passed, serialization of each file is displayed on separate output lines.

$ paste -sd: <(seq 3) <(seq 5 9)
1:2:3
5:6:7:8:9

The advantage of using paste instead of other options like tr, awk, etc is that you do not have to worry about trailing delimiters, newlines, etc. For example:

# issue 1: trailing comma
# issue 2: no newline at the end
$ <colors.txt tr '\n' ','
blue,white,orange,

# correcting the above two issues
$ <colors.txt tr '\n' ',' | sed 's/,$/\n/'
blue,white,orange

Here's an equivalent awk solution for single file input. While slower and more complicated compared to the paste solution, you get more flexibility since awk is a programming language. For example, it is pretty easy to use a multicharacter output delimiter.

$ awk -v ORS= 'NR>1{print ","} 1; END{print "\n"}' colors.txt
blue,white,orange

$ awk -v ORS= 'NR>1{print " : "} 1; END{print "\n"}' colors.txt
blue : white : orange

Video demo:

See paste command chapter from my Command line text processing with GNU Coreutils ebook for more details.

See my CLI text processing with GNU awk ebook if you are interested in learning about the awk command.

Automating Excel with Python - book review

2022-01-11T00:00:00+00:00

In this post, I review Automating Excel with Python by Michael Driscoll. From the introduction chapter of this book:

The purpose of this book is to help you learn how to use Python to work with Excel. You will be using a package called OpenPyXL to create, read, and edit Excel documents with Python. While the focus of this book will be on OpenPyXL, you will also learn about other Python packages that you can use to interact with Excel using the Python programming language.

Book details🔗

Book cover

Amazon — Paperback, Kindle
Gumroad — PDF, EPUB, Mobi
Leanpub — PDF, EPUB, Mobi
GitHub — code examples and sample spreadsheets used in the book
Goodreads — book reviews

Review🔗

My very first job assignment (at a semiconductor company) required me to use spreadsheets for tabulating results of various experiments, adding charts, etc. I used to manually copy-paste the results generated from a Perl script. There were multiple sheets and my work was complicated enough to require multiple months of refinement, feature modifications, etc. Not sure if a library like OpenPyXL existed back then, but I think I should've at least asked/searched ways to automate the spreadsheet process.

Going through this book felt like someone wrote a book just for that project, albeit 13 years late. Here's a rough list of features that would've helped me:

Creating xlsx files with multiple sheets
Adding data
Formatting cells based on a known equation
Creating charts

Instructions and examples were clear and easy to follow. Snapshots were also shown for all the examples, so you can check if you've followed along as expected. While the book is best suited if you have MS Excel, most of the examples worked for me on LibreOffice Calc. Only the charts had major differences — some types weren't supported and x/y axis label/data were problematic as shown below:

Bar Chart in Excel (snapshot from the book)

Bar Chart in LibreOffice Calc (what I got on my machine)

Apart from the openpyxl module, the author also briefly covered how you can use pandas, xlsxwriter and gspread (for working with Google sheets). Some features were presented at the end as Appendix chapters.

Table of Contents🔗

Introduction
Chapter 1 - Setting Up Your Machine
Chapter 2 - Reading Spreadsheets with OpenPyXL
Chapter 3 - Creating a Spreadsheet with OpenPyXL
Chapter 4 - Styling Cells
Chapter 5 - Conditional Formatting
Chapter 6 - Creating Charts
Chapter 7 - Chart Types
Chapter 8 - Converting CSV to Excel
Chapter 9 - Using Pandas with Excel
Chapter 10 - Python and Google Sheets
Chapter 11 - XlsxWriter
Appendix A - Cell Comments
Appendix B - Print Settings Basics
Appendix C - Formulas

Feedback and Reviews🔗

All in all, I would highly recommend this book for those wanting to use Python for automating spreadsheets. I'd request you to post reviews after going through the book (they help us indie authors a lot). And please do contact the author to let him know your feedback or if you have any clarifications.

Happy learning :)

Vim prank: alias vim='vim -y'

2022-01-07T00:00:00+00:00

Poster created using Canva

While going through :h vim-arguments for my Vim Reference Guide ebook, I came across the -y option:

Easy mode. Implied for evim and eview. Starts with 'insertmode' set and behaves like a click-and-type editor. This sources the script $VIMRUNTIME/evim.vim. Mappings are set up to work like most click-and-type editors, see evim-keys. The GUI is started when available.

It was so weird to use. Copy and paste works with Ctrl+c and Ctrl+v respectively. Text can be selected with mouse and typing new text overwrites this selected portion. Esc key doesn't work (gasp!), so I couldn't quit until I used the window buttons. Later I tried and found that Ctrl+o works, which would then allow you to use :q as usual.

So, if you want to prank a Vim user:

alias vim='vim -y'

I didn't expect such a good response on /r/vim/ and twitter for this "easy" feature. So, I decided to write this mini blog post as well. Also, I got to know a few more ways to escape this madness from the /r/vim/ sub:

One hint: If you want to go to Normal mode to be able to type a sequence of commands, use CTRL-L. https://vimhelp.org/starting.txt.html#evim-keys

Use <c-\><c-n> See :h CTRL-\_CTRL-N

Update

So, this post reached the front page on Hacker News! There were plenty of interesting comments and I got to know about the novim-mode plugin (which aims to make Vim behave more like a 'normal' editor).

I also found an old discussion on /r/vim/ discussing ways to trick a Vim user.

Python tip 4: comparison chaining

2022-01-04T00:00:00+00:00

You can chain comparison operators arbitrarily. Apart from terser code, this also has the advantage of having to evaluate the middle expression only once.

>>> from math import factorial

# factorial function gets called twice for this example
>>> factorial(3) > 2 and factorial(3) < 10
True

# function needs to be called only once here
>>> 2 < factorial(3) < 10
True

# another example
>>> 'bat' < 'cat' < 'cater'
True
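Here's a small sketch to confirm the "evaluated only once" claim. The middle() function and the calls counter are made up for illustration and aren't part of the examples above.

```python
calls = 0

def middle():
    """Hypothetical helper that records how many times it gets called."""
    global calls
    calls += 1
    return 6

# combining two comparisons with 'and' evaluates the middle expression twice
calls = 0
result_and = middle() > 2 and middle() < 10
calls_and = calls

# the chained comparison evaluates the middle expression only once
calls = 0
result_chain = 2 < middle() < 10
calls_chain = calls

print(result_and, calls_and)
print(result_chain, calls_chain)
```

Both results are True, but calls_and ends up as 2 whereas calls_chain is 1.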

Video demo:

See my 100 Page Python Intro ebook for a short, introductory guide for the Python programming language.

2021 was a wild ride

2021-12-30T00:00:00+00:00

TL;DR: Started and ended the year well, with a depressing period in the middle. Published three programming ebooks, several blog posts, started a newsletter, improved Twitter readership, read 80+ novels, and so on. Had a good year in terms of ebook sales 😇

Books published🔗

100 Page Python Intro — short, introductory guide for the Python programming language. Started writing last year, published in February
Practice Python Projects — five beginner to intermediate level projects inspired by real world use cases. Started writing last year (before "100 Page Python Intro"!), published in July
Command line text processing with GNU Coreutils — learn 20+ specialized text processing tools provided by the GNU coreutils package. Published in October

I also spent time updating all my existing books from February to May.

Workshops🔗

This was the first and only workshop I conducted since the start of the pandemic in 2020. And this was possible only because it was online. The topic was Python scripting introduction for Biotech students. Publishing "100 Page Python Intro" was timely for this workshop.

This took up most of my time during March/April along with updating existing books.

Blog posts🔗

I've been consistently writing books for the past three years, but I find it difficult to come up with ideas for my programming blog. This is partly due to not wanting to repeat content from my books. Here are my favorite posts that I wrote this year:

I tried to be more consistent by posting short articles (see mini posts list), but lost interest. Starting a newsletter in November helped change my perspective about re-using content from my books. I've started posting tips and coding challenges that are short and easy to digest:

I was more consistent for my Escapist Reviews blog that I started late last year to review novels I read.

Book sales🔗

Had better sales compared to last year, which I really wasn't expecting. Especially when the average monthly sales were around $100 between May and September (my monthly expenses are around $150). This coincided with some health issues and the struggle to finish writing the "Practice Python Projects" book.

This led me to reading articles about better landing pages, building audience on social media, affiliates, etc. I still have a long way to go, but I feel these active efforts led to much improved sales in the last quarter of the year. I ended up deciding not to use affiliates though.

Here's my sales chart from Gumroad for this year (I had similar revenue from Leanpub):

There were plenty of reasons that led to the awesome last quarter sales. Here are some significant events I remember:

Joined hands with fellow Python authors for The Indie Python Extravaganza bundle (given away freely for a month)
- A Twitter discussion led to the giveaway idea, which resulted in creating this bundle
- Combined marketing efforts by all four of us gave significant paid sales too
Published "Command line text processing with GNU Coreutils"
- In addition to my usual practice of making a new book free, this time I offered All books bundle for $5 and a lot of users bought it
- Announcing the book on Reddit and Hacker News was well received
- I was beginning to improve my Twitter audience around that time, which helped a bit
- Got featured in Leanpub's monthly sales newsletter
- Jesse Smith on distrowatch.com wrote a lovely book review, which resulted in significant sales in December
Getting featured on Ruby weekly
Programming deals for the last week of November
- Helped a lot by commenting on Hacker News and getting featured in blog posts of fellow Python authors
Interactive GUI app for Python regex
- As part of 50 days of break from book writing, I worked on this Python app
- Made it to the front page of Hacker News yet again
"The Indie Python Extravaganza" bundle and some of my other books were featured in Leanpub's Boxing day sales
And I believe creating GitHub Readme helped as well

The biggest takeaway for me was to actively look for opportunities (small or big) instead of just relying on free offering during book launch (which is about once in four months).

During the 50 days break, the other significant project I started was learnbyexample weekly newsletter. It is still too early to point out any impact it will have on my book sales, but it certainly has been a pleasure so far to email an issue every Friday.

And as mentioned earlier, this led me to write programming blog posts consistently (tips and coding challenges).

Building Twitter audience🔗

I joined Twitter in 2015. My follower count was less than 400 in July. In December, I crossed 1100. This is far from being impressive (I know a few authors who added more than 15000 followers during that time period).

Being active on Twitter led me to awesome opportunities mentioned earlier in the Book sales section. The best tips I can give are to tweet consistently, interact with your readers and don't be afraid to participate in conversations initiated by top users. Oh, and reading articles/books about social media audience building would help too.

Follow me on Twitter for interesting tech nuggets 😉

Fictional reading🔗

I enjoy reading fantasy and science-fiction novels. I read 80+ SFF books this year and recently wrote a post listing my top 10 favorites.

I also got a chance to beta read The Siege of Skyhold and an ARC of Bastion. I find these a good way to give back to the writing community, having myself received plenty of support from strangers.

Goals for 2022🔗

Foremost goal is to continue taking care of physical/mental health. And I'd be more than happy if I manage yet another year with $250+ average monthly income.

Books:

I'm currently working on Vim Reference Guide ebook. Likely to publish in the first quarter
I started working on Command line text processing with Rust tools ebook even before "Command line text processing with GNU Coreutils", hope to publish in 2022
Have several more book topics in mind, but not sure if I'll start working on any of them. And it is possible that I'll come up with something else I fancy and work on it instead of already planned topics

Projects:

Interactive apps for exercises from other books, similar to the one I did for Python regex
Games for fun

Miscellaneous:

Continue to build an audience via Twitter, Newsletter, etc
Contribute to other open source projects

Here's wishing you a very happy, healthy and prosperous 2022 👍 😇

Vim tip 2: indent/unindent lines

2021-12-29T00:00:00+00:00

Normal mode

>> indent the current line
3>> indent the current line and two lines below (same as 2>j)
>k indent the current line and the line above (same as 1>k or >1k)
<< unindent the current line
5<< unindent the current line and four lines below (same as 4<j or <4j)
2<k unindent the current line and two lines above (same as <2k)
= auto indent code, use motion commands to indicate the portion to be indented
- =4j auto indents the current line and four lines below
- =ip auto indents the current paragraph

You can use any motion command with > and <. For example, >} indents till the end of the paragraph.

Visual mode

> indent the visually selected lines once
3> indent the visually selected lines three times
< unindent the visually selected lines once
= auto indent code

Consider the following unindented code:

for(i=1; i<5; i++)
{
for(j=i; j<10; j++)
{
statements
}
statements
}

Here's the result after applying vip= (you can also use =ip if you prefer Normal mode).

for(i=1; i<5; i++)
{
    for(j=i; j<10; j++)
    {
        statements
    }
    statements
}

Indentation depends on the shiftwidth setting.

See :h shift-left-right, :h = and :h 'shiftwidth' for documentation.

Video demo:

CLI tip 3: place backups in another directory with GNU sed

2021-12-21T00:00:00+00:00

You can use * to place backups of original files in another directory when using the -i option with GNU sed. Consider these two sample input files in the current directory:

$ cat f1.txt
good morning
that was good, just too good!
$ cat f2.txt 
goodie goodbye

Create a backups directory and use * under this directory as a placeholder for the filenames passed to the sed command.

$ mkdir backups
$ sed -i'backups/*' 's/good/nice/' f1.txt f2.txt
$ ls backups/
f1.txt  f2.txt

# modified content
$ cat f1.txt
nice morning
that was nice, just too good!
$ cat f2.txt
niceie goodbye

# backed-up original content
$ cat backups/f1.txt 
good morning
that was good, just too good!
$ cat backups/f2.txt
goodie goodbye

Since * expands to the name of the input files, you can also use this feature when you need to add a prefix for the backups.

$ sed -i'bkp.*' 's/green/yellow/' colors.txt

$ ls *colors*
bkp.colors.txt  colors.txt
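To make the * expansion concrete, here's a rough Python model of how the backup name is derived from the value passed to -i. This is only a sketch based on the behavior shown above: backup_name() is a hypothetical helper, and it assumes plain filenames without directory components.

```python
def backup_name(suffix, filename):
    """Hypothetical model of how GNU sed names the backup file.

    Assumption: 'filename' has no directory components.
    """
    if '*' in suffix:
        # every * is replaced with the name of the input file
        return suffix.replace('*', filename)
    # without a *, the value is simply appended as a suffix
    return filename + suffix

print(backup_name('backups/*', 'f1.txt'))
print(backup_name('bkp.*', 'colors.txt'))
print(backup_name('.orig', 'colors.txt'))
```

The first two calls mirror the examples above and give backups/f1.txt and bkp.colors.txt. The third one shows the plain suffix case, which gives colors.txt.orig.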

The * trick works with Perl as well, see In-place file editing chapter from my Perl One-Liners Guide ebook for examples.

Video demo:

See my CLI text processing with GNU sed ebook if you are interested in learning about the GNU sed command in more detail.

Counting nested braces

2021-12-15T00:00:00+00:00

I posted a coding challenge in the fifth issue of learnbyexample weekly. I discuss the problem and Python/Perl solutions in this blog post.

Problem statement🔗

Write a function that returns the maximum nested depth of curly braces for a given string input. For example:

'a*{b+c}' should return 1
'{{a+2}*{{b+{c*d}}+e*d}}' should return 4
unbalanced or wrongly ordered braces like '{{a}*b' and '}a+b{' should return -1

Python solution🔗

Here's one possible solution for this problem:

def max_nested_braces(expr):
    max_count = count = 0
    for char in expr:
        if char == '{':
            count += 1
            if count > max_count:
                max_count = count
        elif char == '}':
            if count == 0:
                return -1
            count -= 1

    if count != 0:
        return -1
    return max_count

In case you have trouble understanding the above code, you can use pythontutor to visualize the code execution step-by-step.

Here's an alternate solution using regular expressions:

import re

def max_nested_braces(expr):
    count = 0
    while True:
        expr, no_of_subs = re.subn(r'\{[^{}]*\}', '', expr)
        if no_of_subs == 0:
            break
        count += 1

    if re.search(r'[{}]', expr):
        return -1
    return count

And if you are a fan of assignment expressions:

import re

def max_nested_braces(expr):
    count = 0
    while (op := re.subn(r'\{[^{}]*\}', '', expr)) and op[1]:
        expr = op[0]
        count += 1

    if re.search(r'[{}]', expr):
        return -1
    return count

I verified these solutions using assert statements. See Testing chapter from my 100 Page Python Intro ebook for more details.
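Such a check could look something like the sketch below. The test cases are taken from the problem statement and the Perl sample input, and the function is the first solution repeated so that the snippet runs on its own.

```python
def max_nested_braces(expr):
    max_count = count = 0
    for char in expr:
        if char == '{':
            count += 1
            if count > max_count:
                max_count = count
        elif char == '}':
            if count == 0:
                return -1
            count -= 1

    if count != 0:
        return -1
    return max_count

# cases from the problem statement
assert max_nested_braces('a*{b+c}') == 1
assert max_nested_braces('{{a+2}*{{b+{c*d}}+e*d}}') == 4
assert max_nested_braces('{{a}*b') == -1
assert max_nested_braces('}a+b{') == -1

# a few extra cases
assert max_nested_braces('a*b') == 0
assert max_nested_braces('{a+2}*{b+c}') == 1
assert max_nested_braces('{{a+2}*{{b}+{c*d}}+e*d}}') == -1

print('all tests passed')
```

You can swap in either of the regular expression based solutions and the same asserts should still pass.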

See Working with matched portions chapter from my Understanding Python re(gex)? ebook for more details about the re.subn() function.

Perl one-liner🔗

Here's a solution for CLI enthusiasts:

$ cat ip.txt 
{a+2}*{b+c}
{{a+2}*{{b+{c*d}}+e*d}}
a*b{
{{a+2}*{{b}+{c*d}}+e*d}}

$ perl -lne '$c=0; $c++ while(s/\{[^{}]*\}//g);
             print /[{}]/ ? -1 : $c' ip.txt
1
4
-1
-1

See my Perl One-Liners Guide ebook if you are interested in learning to use Perl from the command-line.

If you are interested in awk and bash solutions, see this unix.stackexchange thread.

Python tip 3: expression and result with f-string

2021-12-14T00:00:00+00:00

In case you haven't yet discovered this awesome f-string feature, you can add = after an expression to get both the expression and the result in the output.

>>> num1 = 42
>>> num2 = 7

>>> f'{num1 + num2 = }'
'num1 + num2 = 49'
>>> f'{num1 + (num2 * 10) = }'
'num1 + (num2 * 10) = 112'

I use it often to quickly test a function:

>>> def isodd(n):
...     return bool(n % 2)
... 
>>> print(f'{isodd(42) = }')
isodd(42) = False
>>> print(f'{isodd(123) = }')
isodd(123) = True
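The = specifier can also be combined with a format specification or a conversion like !s. Here's a short sketch (needs Python 3.8+, the variable names are just for illustration):

```python
num1 = 42
num2 = 7

# format specification goes after the = as usual
ratio = f'{num1 / num2 = :.2f}'

name = 'cat'
# repr() is used by default when = is specified
default_repr = f'{name = }'
# use !s if you want the str() version instead
as_str = f'{name = !s}'

print(ratio)
print(default_repr)
print(as_str)
```

Note that the quotes around cat show up only in the default version, since repr() is applied unless you specify a format or a conversion.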

See docs.python: Format String Syntax, docs.python: Formatted string literals and fstring.help for documentation and examples.

Video demo:

See my 100 Page Python Intro ebook for a short, introductory guide for the Python programming language.

Vim tip 1: increment/decrement numbers

2021-12-08T00:00:00+00:00

Did you know that you can easily increment or decrement a number in Vim?

Ctrl+a will increment the number under the cursor or the first occurrence of a number to the right of the cursor
Ctrl+x will decrement the number under the cursor or the first occurrence of a number to the right of the cursor

You can also provide a count prefix:

3 followed by Ctrl+a will add 3
1000 followed by Ctrl+x will subtract 1000

Numbers prefixed with 0, 0x and 0b will be treated as octal, hexadecimal and binary respectively. You can also use uppercase for x and b. What if you want numbers prefixed with 0 to be treated as decimal? You can use the nrformats setting as shown below:

set nrformats-=octal

Decimal numbers prefixed with - will be treated as negative numbers. For example, using Ctrl+a on -100 will give you -99. While this is handy, this trips me up often when dealing with date formats like 2021-12-08.

See :h ctrl-a, :h ctrl-x and :h nrformats for documentation.

Video demo:

Improve your Python regex skills with 75 interactive exercises

2021-12-01T00:00:00+00:00

(2023-03-20) Update: This TUI app covers many more exercises compared to the GUI app discussed below.

Still confused about Python regular expressions? Grow your confidence with Understanding Python re(gex)? ebook (FREE this month!) and an interactive GUI app.

Inspired by Advent of Code, I'll also be posting 3 challenges per day on twitter for 25 days.

Free ebook🔗

My post about the interactive GUI app made it to the Hacker News front page. To celebrate, you can get PDF/EPUB versions of my Understanding Python re(gex)? ebook for free using either of the below links. The offer is valid till 31-Dec-2021.

Or, you can use the web version if you prefer reading the book online.

Interactive GUI app🔗

Based on the Understanding Python re(gex)? book contents as well as the exercises, I made an interactive GUI app with 75 questions on re.search, re.sub, re.split and re.findall that'll test your understanding of anchors, alternation, grouping, escaping metacharacters, dot metacharacter, quantifiers, character class, lookarounds, flags, etc.

Here's some screenshots:

And here's a brief demo:

25 Days Of Regex🔗

If 75 exercises seem daunting to you, consider doing 3 exercises per day. Allocate some time every day to read the book and complete 3 challenges.

I'd also be posting 3 challenges per day on twitter, where you'll be able to get help from me and fellow programmers.

Happy learning :)

CLI tip 2: counting number of matches

2021-11-30T00:00:00+00:00

Use grep -c to count the number of input lines containing a given pattern.

# number of input lines containing 'a'
$ printf 'goal\nrate\neat\npit' | grep -c 'a'
3

# number of input lines containing all the vowels
$ grep -icP '^(?=.*a)(?=.*e)(?=.*i)(?=.*o).*u' /usr/share/dict/words
640

# number of input lines NOT containing 'at'
$ printf 'goal\nrate\neat\npit' | grep -vc 'at'
2

With multiple file input, count is displayed for each file separately. Use cat if you need a combined count.

# separate count for each input file
$ grep -c 'a' names.txt purchases.txt 
names.txt:2
purchases.txt:6

# total count for all the input files
$ cat names.txt purchases.txt | grep -c 'a'
8

If total number of matches is required, use the -o option to display only the matching portions (one per line) and then use wc to get the count.

# -c gives count of matching lines only
$ printf 'goal\nrate\neat\npit' | grep -c '[aeiou]'
4

# use -o to get each match on a separate line
$ printf 'goal\nrate\neat\npit' | grep -o '[aeiou]' | wc -l
7

Note that if you use ripgrep, you can simply use -co or --count-matches instead of piping to the wc command.

# this behavior is different compared to GNU grep
$ printf 'goal\nrate\neat\npit' | rg -co '[aeiou]'
7
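If it helps to see the difference between counting matching lines and counting matches outside of grep, here's a rough Python equivalent using the re module. This is only an illustration with the same sample input, not a replacement for the commands above.

```python
import re

lines = ['goal', 'rate', 'eat', 'pit']
pattern = re.compile(r'[aeiou]')

# similar to grep -c: number of lines with at least one match
matching_lines = sum(1 for line in lines if pattern.search(line))

# similar to grep -o | wc -l: total number of matches
total_matches = sum(len(pattern.findall(line)) for line in lines)

print(matching_lines)
print(total_matches)
```

As with the grep examples, you get 4 for the line count and 7 for the total number of matches.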

Video demo:

See my CLI text processing with GNU grep and ripgrep ebook if you are interested in learning about GNU grep and ripgrep commands in more detail.

Programming deals

2021-11-26T00:00:00+00:00

Hello!

Here are some exciting programming deals for my own ebooks as well as sales details from other creators.

Offers for my ebooks (valid till Nov 30)

Practice Python Projects — FREE (normal price $10)
Learn by example Python bundle — $2 (normal price $12)
All 11 Books Bundle — $5 (normal price $22)
Giveaway contest on twitter — a chance to get a single ebook for free

Indie creators

Python Problem-Solving Bootcamp — 40% OFF today, 30% OFF tomorrow and so on (boost your Python problem-solving skills)
Python books by Michael Driscoll — $10 OFF on any book using coupon code "black21" (Python 101/201, Image/PDF/Excel processing, etc)
- see also author's blog post for links to other Python sales
Python Morsels — 50% OFF until Nov 30 (skill-building service with short videos and hands-on bite-sized Python exercises)

Miscellaneous

The Leanpub Monthly Sale for November 2021
Humble Book Bundle: Code Like a Pro by Manning Publications — various sale options starting from $1
Various programming deals discussion on Hacker News
DevUtils.app — 30% OFF this week (Powerful developer tools for your everyday tasks, Native macOS app)
How to Journal to Live your Best Life? — $1.99 for 30 customers, $3.99 for the next 30 customers and so on (not strictly related to programming, applicable for life events, career, etc)

Happy learning :)

Numeric Palindrome

2021-11-25T00:00:00+00:00

I posted a coding challenge in the second issue of learnbyexample weekly. I discuss the problem and Python/Perl solutions in this blog post.

Problem statement🔗

Find numbers from 1 to 10000 (inclusive) that read the same in reversed form in both binary and decimal formats. For example, 33 in decimal is 100001 in binary and both of these are palindromic.

Python solution🔗

Here's one possible solution for this problem:

for n in range(1, 10001):
    dec_s = f'{n}'
    bin_s = f'{n:b}'
    if dec_s == dec_s[::-1] and bin_s == bin_s[::-1]:
        print(n)

Extending the above solution to include more comparisons is easy with built-in features:

for n in range(1, 10001):
    dec_s = f'{n}'
    bin_s = f'{n:b}'
    oct_s = f'{n:o}'
    if all(s == s[::-1] for s in (dec_s, bin_s, oct_s)):
        print(n)

As an exercise, extend this program further to include hexadecimal number comparison as well. Can you find the first number greater than ten that is palindromic in all four numeric formats?

Perl one-liner🔗

Here's a solution for CLI enthusiasts:

$ perl -le 'for (1..10000) { $bn = sprintf("%b", $_);
                print if ($_ eq reverse) && ($bn eq reverse $bn) }'
1
3
5
7
9
33
99
313
585
717
7447
9009

See my Perl One-Liners Guide ebook if you are interested in learning to use Perl from the command-line.

Python tip 2: membership operator

2021-11-25T00:00:00+00:00

The in membership operator checks if a given value is part of a collection of values. Here's an example with range() function:

>>> num = 5

# checks if num is present among the integers 3 or 4 or 5
>>> num in range(3, 6)
True

Instead of a series of == comparisons combined with the or boolean operator, you can utilize the in operator.

>>> pet = 'cat'

# instead of doing this
>>> pet == 'bat' or pet == 'cat' or pet == 'dog'
True

# use the membership operator
>>> pet in ('bat', 'cat', 'dog')
True

When applied to strings, the in operator performs substring comparison.

>>> fruit = 'mango'

>>> 'an' in fruit
True
>>> 'at' in fruit
False

To invert the membership test, use the not in operator.

>>> pet = 'parrot'

>>> pet in ('bat', 'cat', 'dog')
False
>>> pet not in ('bat', 'cat', 'dog')
True
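The in operator works with other collections as well. Here's a quick sketch with a dict and a set (the sample data is made up for illustration). For a dict, the membership test checks the keys, not the values.

```python
ages = {'cat': 3, 'dog': 5}

# membership test on a dict checks the keys
key_found = 'cat' in ages
# values aren't considered unless you ask for them explicitly
value_as_key = 3 in ages
value_found = 3 in ages.values()

# sets are a good fit when you only need membership tests
vowels = set('aeiou')
has_e = 'e' in vowels
has_z = 'z' in vowels

print(key_found, value_as_key, value_found)
print(has_e, has_z)
```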

Video demo:

See docs.python: Membership test operations for documentation. See also my 100 Page Python Intro ebook.

CLI tip 1: remove metadata from images

2021-11-18T00:00:00+00:00

Want to remove metadata (DateTime, Model, Orientation, ShutterSpeedValue, etc) from your images? You can use mogrify or convert tools provided by ImageMagick.

GUI image viewer applications will usually allow you to see some of the image metadata. You can also use the identify command line tool to get all the metadata:

# remove 'head' to get the entire list of metadata
$ identify -verbose insect.jpg | grep 'exif' | head
    exif:ApertureValue: 113/32
    exif:ColorSpace: 1
    exif:ComponentsConfiguration: 1, 2, 3, 0
    exif:CompressedBitsPerPixel: 3/1
    exif:CustomRendered: 0
    exif:DateTime: 2016:12:03 11:26:10
    exif:DateTimeDigitized: 2016:12:03 11:26:10
    exif:DateTimeOriginal: 2016:12:03 11:26:10
    exif:DigitalZoomRatio: 4000/4000
    exif:ExifOffset: 240

And here's how you can remove such metadata from images:

# to create a new image with metadata removed
$ convert -strip insect.jpg op.jpg

# to modify the input image itself
$ mogrify -strip insect.jpg

You can also pass multiple images to mogrify:

$ mogrify -strip *.jpg

Note that the image size after metadata removal may vary because of recompression.

Video demo:

Further Reading:

ImageMagick — create, edit, compose, or convert digital images
How can I read and remove meta (exif) data from my photos using the command line?
How to strip metadata from image files
What does mogrify mean?
wikipedia: Exif

Python tip 1: tuple argument for startswith/endswith methods

2021-11-16T00:00:00+00:00

You probably know about the startswith() and endswith() string methods.

>>> sentence = 'This is a sample string'

>>> sentence.startswith('This')
True
>>> sentence.startswith('is')
False

>>> sentence.endswith('ing')
True
>>> sentence.endswith('ly')
False

But did you know that you can also pass a tuple of strings?

>>> words = ['refuse', 'impossible', 'fire', 'present', 'read', 'shim']
>>> prefix = ('im', 're', 'use')

>>> [w for w in words if w.startswith(prefix)]
['refuse', 'impossible', 'read']

>>> [w for w in words if w.endswith(prefix)]
['refuse', 'fire', 'shim']
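One thing to watch out for: the argument has to be a tuple (or a single string). Passing a list results in a TypeError, so convert it with tuple() first. Here's a small sketch, where the list_accepted variable is just for illustration:

```python
words = ['refuse', 'impossible', 'fire', 'present', 'read', 'shim']
prefixes = ['im', 're', 'use']

# a list isn't accepted as the argument
try:
    'refuse'.startswith(prefixes)
    list_accepted = True
except TypeError:
    list_accepted = False

# convert the list to a tuple before passing it
matched = [w for w in words if w.startswith(tuple(prefixes))]

print(list_accepted)
print(matched)
```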

Video demo:

See also my 100 Page Python Intro ebook.

Announcing learnbyexample weekly newsletter

2021-11-13T00:00:00+00:00

Hello!

I'm excited to announce learnbyexample weekly newsletter, scheduled to be delivered every Friday.

This free newsletter will help you discover awesome programming resources. I'll primarily focus on resources related to Python, Linux, CLI tools, Regular Expressions and Vim. Sometimes, I'll also include other programming resources.

You can expect 5-15 links, usually categorized into the following sections:

Article of the week
Resources
Free programming books, courses and deals
Tip of the week
Tools
Curiosity Corner

Here are some of the resource links from the first issue:

How To Learn Stuff Quickly by Josh W. Comeau
Python pathlib Cookbook — 57+ Examples to Master It
Complete Guide to CSS Flex and Grid by Shruti Balasa (25% OFF for a week)
Carbon — Create and share beautiful images of your source code

After subscribing to learnbyexample weekly, you'll get a confirmation email followed by another email with the latest issue contents. You can also view the past issues from your Gumroad account.

Hope you find the newsletter useful. Let me know your feedback via email ([email protected]) or twitter.

Happy learning :)

The Indie Python Extravaganza

2021-10-01T00:00:00+00:00

Hello!

You never know where a conversation between indie authors will lead to. A tweet about Leanpub Python book sales brought up giveaways that we indie authors tend to do. Long story short, the four of us ended up deciding to create The Indie Python Extravaganza bundle.

And guess what?! You can use this pytober coupon link to get the bundle for FREE (the offer is valid till 31-Oct-2021).

Bundle contents🔗

A collection of books that will help you to improve your knowledge of the Python programming language one page at a time. Join four indie authors in a journey from the basics of Python to the structure of production-ready systems, going through the core features of the language, some intermediate projects and a deep dive into regular expressions.

Coupon link

In this bundle, Mike will teach you the basics of Python with Python 101. Sundeep will then take the lead and help you to put your knowledge into practice with Practice Python Projects. Learn what NOT to do when writing your Python programs with Rodrigo in his Pydon'ts book! If you need to learn regular expressions, Sundeep again has your back with his Python re(gex)? book, and when you are ready to start working on production code, you'll have Clean Architectures in Python to help you!

Authors🔗

Leonardo Giordani: Blog, Twitter
Michael Driscoll: Blog, Twitter
Rodrigo Girão Serrão: Blog, Twitter
Sundeep Agarwal: Blog, Twitter

How can you help?🔗

Share the bundle link with your friends and colleagues interested in learning Python.

Your feedback on the book contents would be appreciated even more.

Happy learning 😇

Practice Python Projects book announcement

2021-07-30T00:00:00+00:00

Hello!

My "Practice Python Projects" ebook presents five beginner-to-intermediate level projects inspired by real world use cases:

To test your understanding and to make it more interesting, you'll also be presented with exercises at the end of each project. Resources for further exploration are also mentioned throughout the book.

Ebook links🔗

You can buy the PDF/EPUB versions of the ebook using these links:

You can also get them as part of the Learn by example Python bundle using these links:

Videos🔗

Check out my programming tips covering Python, command line tools and Vim:

Testimonials🔗

Your Practice Python Projects book is really helping me to reinforce my knowledge and mastery of Python as I'm learning.

— feedback on twitter

Web version🔗

You can also read the book online here: https://learnbyexample.github.io/practice_python_projects/preface.html.

GitHub repo🔗

Visit https://github.com/learnbyexample/practice_python_projects for programs, example files, markdown source and other details about the book.

See also my blog post on how to customize pandoc for generating beautiful PDF/EPUB versions from GitHub style markdown.

Feedback🔗

You can reach me via:

Issue Manager: https://github.com/learnbyexample/practice_python_projects/issues
E-mail: echo 'bGVhcm5ieWV4YW1wbGUubmV0QGdtYWlsLmNvbQo=' | base64 --decode
Twitter: https://twitter.com/learn_byexample

Happy learning :)

Escaping madness to get literal field separators in awk

2021-07-02T00:00:00+00:00

I'm building a tool called rcut that allows you to use cut-like syntax with features like regexp-based delimiters. The solution uses awk inside a bash script.

Latest feature creep is fixed string field splitting. I thought it would be a simple enough solution to add.

I was wrong.

How many escapes for a single backslash?🔗

For reference, these are the versions I have on my machine:

$ gawk --version
GNU Awk 5.1.0, API: 3.0

$ mawk -W version
mawk 1.3.4 20200120

mawk and gawk differ when it comes to escaping backslashes. You'll later see the rule that'll work correctly for both implementations.

$ echo 'apple\bake\cake' | mawk -F'e\' '{print $2}'
bak

$ echo 'apple\bake\cake' | gawk -F'e\' '{print $2}'
gawk: fatal: invalid regexp: Trailing backslash: /e\/
$ echo 'apple\bake\cake' | gawk -F'e\\' '{print $2}'
gawk: fatal: invalid regexp: Trailing backslash: /e\/
$ echo 'apple\bake\cake' | gawk -F'e\\\' '{print $2}'
bak

The value assigned to FS is treated as a string and then converted to a regexp. \ is a metacharacter in both strings and regexps. So, \\ in a string means a single backslash and \\\\ means a double backslash. A double backslash in a regexp matches a single backslash.

Conclusion: For a consistent behavior across both mawk and gawk and irrespective of trailing backslash errors, you need to use 4 backslashes for every backslash.

# both 2 and 4 backslashes here get treated as a single backslash
# hence the empty fields in the output
$ echo '1\\2\\3' | mawk -F'\\' -v OFS=, '{$1=$1} 1'
1,,2,,3
$ echo '1\\2\\3' | mawk -F'\\\\' -v OFS=, '{$1=$1} 1'
1,,2,,3
$ echo '1\\2\\3' | gawk -F'\\' -v OFS=, '{$1=$1} 1'
1,,2,,3
$ echo '1\\2\\3' | gawk -F'\\\\' -v OFS=, '{$1=$1} 1'
1,,2,,3

# 5-8 backslashes give expected results
$ echo '1\\2\\3' | mawk -F'\\\\\' -v OFS=, '{$1=$1} 1'
1,2,3
$ echo '1\\2\\3' | mawk -F'\\\\\\' -v OFS=, '{$1=$1} 1'
1,2,3
$ echo '1\\2\\3' | mawk -F'\\\\\\\' -v OFS=, '{$1=$1} 1'
1,2,3
$ echo '1\\2\\3' | mawk -F'\\\\\\\\' -v OFS=, '{$1=$1} 1'
1,2,3

# 5-6 backslashes give error, 7-8 backslashes give expected results
$ echo '1\\2\\3' | gawk -F'\\\\\' -v OFS=, '{$1=$1} 1'
gawk: fatal: invalid regexp: Trailing backslash: /\\\/
$ echo '1\\2\\3' | gawk -F'\\\\\\' -v OFS=, '{$1=$1} 1'
gawk: fatal: invalid regexp: Trailing backslash: /\\\/
$ echo '1\\2\\3' | gawk -F'\\\\\\\' -v OFS=, '{$1=$1} 1'
1,2,3
$ echo '1\\2\\3' | gawk -F'\\\\\\\\' -v OFS=, '{$1=$1} 1'
1,2,3
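The same two layers of escaping (string first, then regexp) show up in Python when a regexp is written as a normal string instead of a raw string. The sketch below is only an analogy to show why the count ends up as 4 backslashes per literal backslash. It says nothing about the differences between mawk and gawk.

```python
import re

# raw string: two literal backslashes between the digits
data = r'1\\2\\3'

# 4 backslashes in the source -> 2 in the string -> regexp for ONE backslash
single = re.split('\\\\', data)

# 8 backslashes in the source -> 4 in the string -> regexp for TWO backslashes
double = re.split('\\\\\\\\', data)

print(','.join(single))
print(','.join(double))
```

Similar to the awk examples, the first version gives empty fields (1,,2,,3) and the second one gives 1,2,3.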

As an alternate method, you can use the codepoint of the backslash character. This removes one level of escaping. See ASCII code table for codepoint reference.

Conclusion: You need \x5c\x5c for every backslash.

$ echo 'apple\bake\cake' | mawk -F'e\x5c\x5c' '{print $2}'
bak
$ echo 'apple\bake\cake' | gawk -F'e\x5c\x5c' '{print $2}'
bak

$ echo '1\\2\\3' | mawk -F'\x5c\x5c\x5c\x5c' -v OFS=, '{$1=$1} 1'
1,2,3
$ echo '1\\2\\3' | gawk -F'\x5c\x5c\x5c\x5c' -v OFS=, '{$1=$1} 1'
1,2,3

Using awk to generate an escaped string🔗

Suppose you want to use \. literally for field splitting. Here are some ways to do it that work for both mawk and gawk:

$ echo 'x\2\.y\.z' | gawk -F'\\\\\\.' -v OFS=, '{$1=$1} 1'
x\2,y,z
$ echo 'x\2\.y\.z' | gawk -F'\\\\[.]' -v OFS=, '{$1=$1} 1'
x\2,y,z
$ echo 'x\2\.y\.z' | gawk -F'\x5c\x5c[.]' -v OFS=, '{$1=$1} 1'
x\2,y,z

Now, the task is to generate one of the above strings passed to the -F option from \. as input. Using sed is better, but for rcut, I didn't want to add another external tool.

Case 1: backslash madness🔗

You need to convert \ to 4 backslashes and escape regexp metacharacters with 2 backslashes. Note that you cannot escape all characters except \ with 2 backslashes, for example \\t will become a tab character! Also, you need to escape \ first and then escape the other metacharacters.

Ready for the solution? I'm not even going to try explaining this, found it by experimenting.

# replacement string for the first gsub has 16 backslashes
# replacement string for the second gsub has 8 backslashes
$ echo 'a.b\c^d' | gawk '{gsub(/\\/, "\\\\\\\\\\\\\\\\");
                          gsub(/[{[(^$*?+.|]/, "\\\\\\\\&")} 1'
a\\.b\\\\c\\^d

gawk manual: Gory details might help you understand the above solution.

Case 2: character class🔗

One characteristic of character classes is that you can enclose any character except \ and ^ to match it literally. The \ character is special both inside and outside a character class, and [^] is invalid since ^ is special if used as the first character.

$ echo 'a.b\c^d' | gawk '{gsub(/\\/, "\\\\\\\\\\\\\\\\");
                          gsub(/[^^\\]/, "[&]");
                          gsub(/\^/, "\\\\^")} 1'
[a][.][b]\\\\[c]\\^[d]

Case 3: codepoint to represent backslash🔗

Finally, my preferred solutions, which use the codepoint instead of escaping backslashes.

# case 1 alternate
$ echo 'a.b\c^d' | gawk '{gsub(/\\/, "\\x5c\\x5c");
                          gsub(/[{[(^$*?+.|]/, "\\x5c&")} 1'
a\x5c.b\x5c\x5cc\x5c^d

# case 2 alternate
$ echo 'a.b\c^d' | gawk '{gsub(/[^^\\]/, "[&]");
                          gsub(/\\/, "\\x5c\\x5c");
                          gsub(/\^/, "\\x5c^")} 1'
[a][.][b]\x5c\x5c[c]\x5c^[d]

Sanity check🔗

I probably lost my sanity trying to come up with a solution and again while writing this post. I did try a few sanity checks for the solutions presented here, but there's a chance I messed up or missed some corner case. If you spot an issue, do let me know.

Debug woes 2: unexpected array in replacement string

2021-06-17T00:00:00+00:00

So, I was editing a markdown file in Vim and I wanted to convert some lines to links. The regexp pattern ended up needing a non-greedy quantifier, but it didn't work. I thought I got Vim's rather weird \{-} syntax wrong and switched to using Perl from the command line instead of checking the documentation to see if I had actually made that mistake.

Turns out I made other mistakes in the regexp, but I didn't want to switch back to Vim. I was still scratching my head though, since I wasn't getting the expected output. Thankfully, compared to the previous debug misery, I was able to guess this issue soon enough.

Here's a simplified issue, how I debugged it and the corrected usage:

# sample input
$ cat ip.txt 
* blah blah 12 xyz 34 abcd 56 foobaz
- blah 100 apple 200 fig

# where I got stuck
# what happened to $1 and $2?
$ perl -lpe 's/^(. )(.*?\d+) (.+)/$1[$2](#$3)/' ip.txt
(#xyz 34 abcd 56 foobaz)
(#apple 200 fig)

# what I did to debug
# step 1: only $1 in the replacement
$ perl -lpe 's/^(. )(.*?\d+) (.+)/$1/' ip.txt
* 
- 
# step 2: $1 and $2 in the replacement
# only empty lines as output - bingo!
$ perl -lpe 's/^(. )(.*?\d+) (.+)/$1[$2]/' ip.txt


# $1[$2] treated as array syntax in Perl
# so, need to escape [ since array isn't intended here
$ perl -lpe 's/^(. )(.*?\d+) (.+)/$1\[$2](#$3)/' ip.txt
* [blah blah 12](#xyz 34 abcd 56 foobaz)
- [blah 100](#apple 200 fig)

Dreaming solutions

2021-06-10T00:00:00+00:00

This SO question was interesting and had various approaches to solve it. Here's a sample to explain the problem to be solved:

$ cat ip.txt
caller_number=034082394234324, clear_number=33335345435,  direction=1,
caller_number=83479234234,     clear_number=34836424733, direction=2,
caller_number=83479234234,     clear_number=64237384533, direction=2,

$ cat list.txt
642
3333
534234235

$ cat op.txt
caller_number=83479234234,     clear_number=64237384533, direction=2,

Any data present in list.txt has to be matched immediately after clear_number= and the input line should also have direction=2,. In the sample above, the first line matches 3333 but not the second criterion. The second line fails even though it has 642 since it is not immediately after clear_number=. The list.txt file can have 10K-50K lines and ip.txt is around 10GB.

Here's a slightly modified answer based on existing solutions on that thread. Since the data present in list.txt has to be partially matched after clear_number=, a single direct comparison with the keys saved in arr is not possible. This solution loops over all the keys for every input line that matches the direction=2, criteria (breaks the loop if a match is found early).

FNR==NR{ arr["=" $0]; next }

$3=="direction=2,"{
    for(i in arr)
        if(index($2,i)){
            print
            next
        }
}

To run the solutions, use mawk -f script.awk list.txt ip.txt

In my dreams that night, I realized that the solution can be improved drastically by looping over the digits after clear_number= instead of looping over keys saved in arr. Matching a key is O(1), so the time saving is huge since the inner loop is now a maximum of 12 (length of digits after clear_number=) instead of looping a maximum of 10K-50K times! With a 35M sample input file and 12K keys that I created for testing, I found this solution to be about 200 times faster.

FNR==NR{ arr[$0]; next }

$3=="direction=2,"{
    val=substr($2,14)
    for(i=1; i<length(val); i++)
        if(substr(val,1,i) in arr){
            print
            next
        }
}
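To try the prefix-lookup idea end to end, here is a small self-contained sketch. It recreates the sample files from this post in a temporary directory and wraps the same awk logic in a shell function. The function name prefix_filter and the temporary paths are illustrative choices only, and plain awk is assumed to behave like mawk for this script.

```shell
# recreate the sample data from the post in a temporary directory
tmp=$(mktemp -d)
cat > "$tmp/ip.txt" <<'EOF'
caller_number=034082394234324, clear_number=33335345435,  direction=1,
caller_number=83479234234,     clear_number=34836424733, direction=2,
caller_number=83479234234,     clear_number=64237384533, direction=2,
EOF
printf '%s\n' 642 3333 534234235 > "$tmp/list.txt"

# prefix_filter KEYS_FILE INPUT_FILE (hypothetical helper name)
# tries every prefix of the digits after clear_number= as an array key
prefix_filter() {
    awk 'FNR==NR{ arr[$0]; next }
         $3=="direction=2,"{
             val=substr($2,14)
             for(i=1; i<length(val); i++)
                 if(substr(val,1,i) in arr){ print; next }
         }' "$1" "$2"
}

result=$(prefix_filter "$tmp/list.txt" "$tmp/ip.txt")
printf '%s\n' "$result"
rm -r "$tmp"
```

Only the third input line is printed, since 642 is a prefix of 64237384533 and that line also has direction=2,.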

GNU BRE/ERE cheatsheet and differences between grep, sed and awk

2021-05-31T00:00:00+00:00

Poster created using Canva

This post covers Basic Regular Expressions (BRE) and Extended Regular Expressions (ERE) syntax supported by GNU grep, sed and awk. You'll also learn the differences between these tools — for example, awk doesn't support backreferences within regexp definition (i.e. the search portion).

BRE and ERE🔗

From GNU grep manual:

In basic regular expressions the meta-characters ?, +, {, |, (, and ) lose their special meaning; instead use the backslashed versions \?, \+, \{, \|, \(, and \).

grep and sed support BRE by default and enable ERE when the -E option is used. awk supports only ERE. Assume ERE for descriptions in this post unless otherwise mentioned.

This post is intended as a reference for BRE/ERE flavor of regular expressions. For a more detailed explanation with examples and exercises, see these chapters from my ebooks:

Anchors🔗

Pattern	Description
`^`	restricts the match to the start of the string
`$`	restricts the match to the end of the string
`\<`	restricts the match to the start of word
`\>`	restricts the match to the end of word

The -x cli option in grep is equivalent to ^pattern$.

Word characters include alphabets, digits and underscore. Here are some alternate ways to specify word anchors:

Pattern	Description
`\b`	restricts the match to the start/end of words, applicable for `grep` and `sed`
`\y`	restricts the match to the start/end of words, applicable for `awk` (`\b` means backspace)
`\B`	matches wherever `\b` (or `\y`) doesn't match

grep also supports -w cli option. It is equivalent to (?<!\w)pattern(?!\w). The three different ways to specify word anchors are not exactly equivalent though, see Word boundary differences section from my book for details and examples.

Alternation and Grouping🔗

Pattern	Description
`pat1|pat2|pat3`	match `pat1` or `pat2` or `pat3`
	use `\|` in BRE mode
`()`	group pattern(s), `a(b|c)d` is same as `abd|acd`
	use `\(\)` in BRE mode

The alternative patterns can have their own independent anchors. Alternative which matches earliest in the input gets precedence. Longest matching portion wins if multiple alternatives start from the same location (irrespective of the order of alternatives). In case of a tie with same lengths, leftmost alternative wins (see stackoverflow: Non greedy matching in sed for a practical use case).

Escaping metacharacters🔗

Pattern	Description
`\`	prefix metacharacters with `\` to match them literally
`\\`	to match `\` literally

With grep and sed, switching between ERE and BRE can reduce the number of escapes needed for some cases. For fixed string matching, grep has -F option and awk has string comparison operators (whole string) and the index function (partial string).
sed requires both ( and ) characters to be escaped (in ERE mode), whereas grep and awk don't require ) to be escaped.
sed requires { to be escaped (in ERE mode) even if it isn't part of a valid quantifier syntax, whereas grep and awk don't require escaping. For example, you'd need \{a} in sed whereas {a} is enough for the other two.
In BRE mode, grep and sed don't require ^ and $ to be escaped if they are used away from their customary positions.

Dot metacharacter and Quantifiers🔗

Pattern	Description
`.`	match any character, including the newline character
`?`	match `0` or `1` times
	use `\?` in BRE mode
`*`	match `0` or more times
`+`	match `1` or more times
	use `\+` in BRE mode
`{m,n}`	match `m` to `n` times
`{m,}`	match at least `m` times
`{,n}`	match up to `n` times (including `0` times)
`{n}`	match exactly `n` times
	use `\{\}` in BRE mode
`pat1.*pat2`	any number of characters between `pat1` and `pat2`
`pat1.*pat2|pat2.*pat1`	match both `pat1` and `pat2` in any order

Precedence rule is longest match wins, which is mostly similar but not exactly same as greedy quantifiers. For example, with foo123312baz as input string, o[123]+(12baz)? will match o123312baz with these tools, whereas it will match o123312 with greedy quantifiers.

Character class🔗

Pattern	Description
`[set123]`	match any of these characters once
`[^set123]`	match except any of these characters once
`[3-7AM-X]`	range of characters from `3` to `7`, `A`, another range from `M` to `X`
`[.`	open collating symbol
`.]`	close collating symbol
`[=`	open equivalence class
`=]`	close equivalence class

Specific placement will help to match character class metacharacters literally.

Pattern	Description
`[a-z-]`	`-` should be first/last character to match literally
`[+^]`	`^` shouldn't be first character
`[]=]`	`]` should be first character (second if `^` is used to invert the set)

\ isn't special within character class in grep.
\ can be used to escape character class metacharacters in awk.

Some commonly used character sets have predefined escape sequences:

Pattern	Description
`\w`	similar to `[a-zA-Z0-9_]` for matching word characters
`\s`	similar to `[ \t\n\r\f\v]` for matching whitespace characters
`\W`	match non-word characters
`\S`	match non-whitespace characters

Undefined escape sequences will be treated as the character it escapes. For example, \e will match e (not \ and e).
- in addition, awk gives a "not a known regexp operator" warning.
The above escape sequences cannot be used inside character classes and behavior varies between the tools.
- For example, using [\w] will match \ or w characters in grep and sed whereas it will match only w in awk.
These escape sequences are also locale aware, for example αλεπού and \u2028 (line separator) will be considered as word and whitespace characters respectively in appropriate locales.
These tools do not support \d and \D, commonly featured in other regexp implementations for digits and non-digits.

Escape sequences🔗

This section is applicable only for sed and awk unless otherwise specified and can be used within character classes too. See also ASCII Codes Table Standard characters.

Escape sequence	Description
`\a`	alert
`\b`	backspace in `awk`, word boundary in `grep` and `sed`
	`\b` inside a character class in `sed` will act as a backspace
`\f`	formfeed
`\n`	newline
`\r`	carriage return
`\t`	horizontal tab
`\v`	vertical tab
`\cx`	CONTROL-x in `sed`

You can also represent ASCII characters using their codepoint values.

Escape sequence	Description
`\xNN`	hexadecimal digits
`\NNN`	octal digits in `awk`
`\oNNN`	octal digits in `sed`
`\dNNN`	decimal digits in `sed`

In search section, a metacharacter specified by escape sequences will still act as the metacharacter. For example, /\x5eco/ will match co only at the start of the string.
In replacement section,
- escape sequences in sed produce the literal character. For example, s/.*/"\x26"/ will have "&" as the replacement value.
- escape sequences in awk are still treated as metacharacters. For example, sub(/.*/, "[&]") and sub(/.*/, "[\x26]") are equivalent.

Ways to use escape sequences with grep:

ANSI-C Quoting — for example, $'a\tb' will match a and b with a tab in between.
-P option, see my chapter on Perl Compatible Regular Expressions for more details.

Named character sets🔗

The below table lists named sets and their equivalent character class in ASCII encoding. These can be used inside character classes only. For example, [[:digit:]] is same as [0-9] and [[:alnum:]_] is equivalent to \w.

Named set	Description
`[:digit:]`	`[0-9]`
`[:lower:]`	`[a-z]`
`[:upper:]`	`[A-Z]`
`[:alpha:]`	`[a-zA-Z]`
`[:alnum:]`	`[0-9a-zA-Z]`
`[:xdigit:]`	`[0-9a-fA-F]`
`[:cntrl:]`	control characters — first 32 ASCII characters and 127th (DEL)
`[:punct:]`	all the punctuation characters
`[:graph:]`	`[:alnum:]` and `[:punct:]`
`[:print:]`	`[:alnum:]`, `[:punct:]` and space
`[:blank:]`	space and tab characters
`[:space:]`	whitespace characters, same as `\s`

From grep manual:

Their interpretation depends on the LC_CTYPE locale; for example, [[:alnum:]] means the character class of numbers and letters in the current locale.

Backreferences🔗

Pattern	Description
`\N`	backreference, gives matched portion of Nth capture group
	possible values: `\1`, `\2` up to `\9`
`&`	represents entire matched string in the replacement section
`\0`	equivalent to `&` in `sed`

Notes for awk:

backreferences can be used only in replacement section, not allowed in search section.
sub and gsub functions allow only the & backreference.
gensub function allows \N form of backreference as well.
- but need to use \\0, \\1, \\2 etc since they are specified using string syntax.

sed flags🔗

This section discusses flags (also known as modifiers) that change the regexp behavior. When used with regexp addressing:

Flag	Description
`I`	match case insensitively

When used with substitution command:

Flag	Description
`i` or `I`	match case insensitively
`g`	replace all occurrences instead of just the first match
`N`	a number will cause only the Nth match to be replaced
`Ng`	replace from Nth match to the end
`m` or `M`	multiline mode
	`.` will not match the newline character
	`^` and `$` will match every line's start and end locations (line separator is `\n` by default and NUL when `-z` option is used)
`` \` ``	always match the start of string irrespective of multiline mode
`\'`	always match the end of string irrespective of multiline mode

Flags are not supported by grep or awk. But these equivalent/alternative options can be used:

-i cli option in grep and setting IGNORECASE to non-zero value in awk will match case insensitively.
tolower or toupper functions can be used in awk to convert input to single case.
you can also use character classes for small strings, for example [cC][aA][tT] will match cat case insensitively.
sub function in awk replaces only the first matching occurrence and gsub function is equivalent to using the g flag.
third argument of gensub function in awk supports replacing only the Nth match as well as the g flag.

The behavior of sed and awk differs for Nth match if the pattern can match empty string:

$ echo 'a,,c,d,,f' | sed 's/[^,]*/b/2'
a,b,c,d,,f
$ echo 'a,,c,d,,f' | sed 's/[^,]*/e/5'
a,,c,d,e,f

$ echo 'a,,c,d,,f' | awk '{print gensub(/[^,]*/, "b", 2)}'
ab,,c,d,,f
$ echo 'a,,c,d,,f' | awk '{print gensub(/[^,]*/, "e", 5)}'
a,,ce,d,,f

sed case conversion🔗

Escape sequence	Description
`\E`	indicates end of case conversion in replacement section
`\l`	convert next character to lowercase
`\u`	convert next character to uppercase
`\L`	convert following characters to lowercase, stops if `\U` or `\E` is found
`\U`	convert following characters to uppercase, stops if `\L` or `\E` is found
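A short sketch of these sequences in action, assuming GNU sed (they are GNU extensions); the sample strings and variable names are made up for illustration.

```shell
# \u changes only the first character of each matched word
title=$(printf '%s\n' 'hello world' | sed -E 's/\w+/\u&/g')

# \U applies until \E ends it, \L then lowercases the rest of the replacement
mixed=$(printf '%s\n' 'Hello World' | sed -E 's/(\w+) (\w+)/\U\1\E \L\2/')

printf '%s\n' "$title" "$mixed"
# Hello World
# HELLO world
```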

sed delimiters🔗

/ is idiomatically used as the delimiter.
Any character except \ and newline character can also be used. For example: s#/home/learnbyexample/#~/# is same as s/\/home\/learnbyexample\//~\//.
For regexp addressing, the first delimiter has to be escaped. For example: \;/foo/bar/;p is same as /foo\/bar\//p.
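Both forms can be tried quickly as shown below. This is just a sketch with made-up input lines and variable names.

```shell
# substitution with # as the delimiter, so / needs no escaping
home=$(printf '%s\n' '/home/learnbyexample/reports' | sed 's#/home/learnbyexample/#~/#')

# regexp addressing with ; as the delimiter, note the leading backslash
hits=$(printf '%s\n' 'a/foo/bar/b' 'foo bar' | sed -n '\;/foo/bar/;p')

printf '%s\n' "$home" "$hits"
# ~/reports
# a/foo/bar/b
```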

Debug woes 1: multiple substitutions on the same line

2021-05-29T00:00:00+00:00

While answering this SO question, I ran into a debug misery. It took me an embarrassing amount of time and experiments to understand why.

Here's a simplified version of the problem. Can you spot the issue?

$ cat ip.txt
a b c d
i j k l

# Change only first two occurrences of spaces with tabs
$ perl -pe '$c=2; s/ /\t/ while $c--' ip.txt
a       b       c d
i       j       k l

# Wanted to generalize the solution to match one-or-more whitespaces
# But it doesn't work!!!
$ perl -pe '$c=2; s/\s+/\t/ while $c--' ip.txt
a       b c d
i       j k l


The substitution works from start of the line for every iteration of the while loop. Tab is one of the whitespace characters, so after the first substitution, the tab gets matched for rest of the iterations.

Perl one-liner articles

2021-05-26T00:00:00+00:00

One piece of feedback I got for my Perl one-liners ebook was to showcase examples where Perl shines compared to other text processing tools.

Soon after, I got an invite to publish an article for the perldotcom site. So, I wrote a two-part post highlighting use cases where Perl's rich regular expression engine, built-in functions, extensive ecosystem and portability help:

Thanks to the editors for suggestions and improvements.

Paying my bills with 'free' ebooks

2021-03-03T00:00:00+00:00

TL;DR: Small victories are more precious when you have nothing. Instead of burning through my savings, I'm now adding to it. The relief is priceless.

It is worth it (for me)🔗

The section title is my response to this article Writing a book: is it worth it? that I saw on Hacker News.

For my unique circumstances, the decision to write ebooks has brought me financial stability, improved my mental health and gives me a sense of satisfaction. This could've come from any of my previous attempts to earn money, but ebooks is what worked out for me.

Photo by Bram Naus on Unsplash

How it all started?🔗

I left my job in 2014 for various reasons. I didn't have any plans for the future, just knew that I couldn't work as an employee any more.

After enjoying my break, I had to try something to start earning again. I wrote an Android gaming app, fantasized about earning loads of money with an awesome work planner/communicator software that never left my imagination, tried a small stint with a team making an educational app, etc. I failed due to various reasons: didn't try hard enough, quit early, didn't fit my skills, wasn't good at design/marketing and so on. The educational app, for example, went on to become a success. Or perhaps, having saved enough to live out a few years without working meant I wasn't under enough pressure to earn.

Among these failures, college workshops were the sole breadwinner (and a long way from supporting my modest living costs). My bachelor's degree was in electronics and communications and I had worked in a semiconductor company. So I knew enough to teach students pursuing similar courses the basics of the Linux command line, Vim, Perl, Bash scripting, etc. As reference materials, I used to provide ppt slides (when I still had a job). Now that I had loads of free time, I started expanding my knowledge. Came to know about sites like Stackoverflow/Stackexchange/Reddit/etc. With newer and better materials to learn from, I created PDFs (using LibreOffice, which was pretty much the only option I knew about).

Another loss maker was getting a domain/host to share these learning materials. Web development was too much for me and the (ugly) site didn't get any love. In hindsight, one of the better turning points was learning about GitHub in 2016. I loved markdown's nice output with syntax highlighting (and realized I was using it poorly in Reddit) and GitHub's social aspect (stars, issues, etc) — plus I can use Vim! I manually converted my materials from LibreOffice to markdown (again, I didn't know that tools like pandoc could've helped me). Just like any other skill, I was learning and getting better with every iteration. That was the year I learned Python (thanks to Al Sweigart's free coupon for "Automate the Boring Stuff with Python" video course) and started conducting workshops for Python instead of Perl.

Being active on Stackoverflow and Reddit, I finally became proficient at CLI one-liners (late by 8 years, since it would have significantly helped in my role as a design and test engineer). I came across articles/books on regular expressions and one-liners. I thought — I can do that too, plus I was really liking them. Thus began my epic Command Line Text Processing repo, another big turning point in my journey as an author.

Encouraging signs🔗

Over the course of ten months, I managed to complete the holy trinity of grep, sed and awk one-liners. I promoted these tutorials on Reddit, Google+, LinkedIn and other social sites I knew at that time. The repo got hundreds of stars and more importantly, I got critical feedback. I was ecstatic, even if I was continuing to burn through my savings.

Then, I got to know about Hacker News (I think it was someone bragging about reaching the front page). It took me a while to get used to Reddit, and HN was similarly alien to me. I posted a few links as a test and then I was brave enough to submit my awk one-liners post. I was refreshing HN anxiously for about half an hour or so. It got one vote and then other submissions pushed it away from the new posts tab. Disappointed, I moved on. After some time, I was checking traffic on my GitHub repo as usual, a habit I had picked up (all kinds of points, karma, likes, etc were so enticing). I noticed a HUGE spike in traffic and star count, the likes of which I had never seen before/since. The last time I had felt that proud of my work was during my job. This comment made a big impression on me:

These are the best stories on HN and why i subscribed here in the first place. I have often seen awk used so many times on SO but I've always put it up for something later to learn. Finally today I have some basic understanding of awk and this is really great stuff! I did get by with Perl but this is definitely more handy and the example approach to teaching it makes is super easy to understand!

After the euphoria had died down (about a week I guess), I was thinking about all the various kinds of posts I could make. And I was thinking how to use the repo popularity to bring in money. Long story short, I ended up adding donate buttons to my repos. This was before GitHub sponsors was announced. I wanted my materials to be freely available, so I wasn't even thinking about creating paid only options. Despite adding more tutorials, getting featured in rubyweekly and other newsletters, social sites, etc, all I got was a single recurring donation (which ended prematurely when that platform switched payment set up).

Another turning point came when a friend of mine was authoring a book and referred me for the reviewer role. Around that time, I had been converting Think Python to Think Ruby and simultaneously working on a separate Ruby tutorial. During the book review process, I was given a list of topics and asked if I was interested in writing a book (they were impressed by my existing repos). The topics were either beyond my knowledge or out of scope, and they weren't interested in the repos I had already put up.

My friends were always suggesting that I write a book and my reply had consistently been that I wasn't good enough to write one (the imposter syndrome hasn't left me even now). The book review experience, existing repos, my tryst with Think Ruby, dwindling savings, etc changed my mindset enough to try. By then I was already familiar with Leanpub, so I knew self-publishing was an option. I picked a niche topic (Ruby Regexp), learned enough pandoc to produce a PDF and published it even before the book review ended. It helped that I already had material as part of the Ruby tutorial I was working on. I still had to work a lot, since the tutorial descriptions were all bullet points.

I got only a few sales, but I had landed another review (video course for the same book) and was getting paid. So, I converted 'Ruby Regexp' to Python re(gex)?. I made it free for a few days and posted on Reddit, HN and other social sites. The HN submission didn't get any traction, but fortunately the Reddit submission on /r/Python/ was a big hit: thousands of free downloads and a few paid ones, enough to cover 2 months of my expenses. I should mention now that I live alone, on the outskirts of an Indian city, and my modest lifestyle costs about $150 per month. What works for me won't necessarily suit others.

A dip followed by sustainable momentum🔗

Encouraged by the second release, I changed my focus from updating my GitHub repos to writing books. All those repos were now fodder for book conversion. I picked up grep first and included ripgrep as well to keep it in line with the trend. Got decent sales from free promotions. The HN submission tanked at first, but got good attention when I posted again after a revision. Then I published a new version of 'Python re(gex)?' with significant changes and this HN submission got good views too. But note that these HN hits weren't anywhere close to what my awk one-liners post had received.

Writing sed took a lot out of me. Probably I was getting jaded again, juggling between workshops and ebooks. Then I had a medical issue. I didn't even try promoting the sed book on HN. I managed to learn enough JS to write JavaScript regexp. Wasn't anywhere close to what I got from the Python book.

Photo by David Traña on Unsplash

So, despite reasonable reception during free promotions, my ebooks still weren't good enough to consistently pay my bills. Combined with workshops, I was just about making ends meet. I was losing interest and the medical issue was continuing. Still, without anything else to do, I finally started a book on awk one-liners. Things started getting better for a few months and then the pandemic hit.

Given the recent medical scare, pandemic fears and the trend of giveaways, I decided to open source my book contents. And, I made all my ebooks free to download indefinitely. Made a single bundle of all the 5 books I had published until then to make it easier to download in one shot. The reception was better than expected. Shortly after (last week of March), I published the awk book early by cutting corners like excluding exercises. All books bundle now had 6 entries. Again, the reception was much better than expected. I hadn't made so many paid sales during a month ever before.

Encouraged by the success, I made another important decision. Instead of starting another book, I took up the task of updating all my books. I allotted a month or two for this task, but it took me more than 4 months in the end. It wasn't that I had a lot of new features to add. The feedback I had received over the past year and my own improving writing skills meant that I just couldn't help updating the books to the best of my abilities. Somehow, lockdown and fear of the pandemic ended up improving my workflow.

Workshops weren't going to come my way anytime soon, but ebook sales for about 6 months averaged $200+ per month. For the first time since leaving my job, I was saving money!!! During this period all my books were free to download, in addition to the markdown source being available from GitHub repos. I even managed to create EPUB and web versions for my ebooks. The web version generated using mdBook was much better than my attempts with wordpress all those years ago, but to be fair I hadn't known enough about formatting for coding books.

After finishing this marathon revision task, I reverted PDF/EPUB versions to be a paid option again. Since then, I've managed to write three more books. I did Perl and Ruby one-liners (as part of the ongoing conversion of the CLI text processing repo) despite knowing sales wouldn't be good enough to keep up the momentum. Then I wrote a Python intro book for those already familiar with programming basics. Published last month, sales are much lower than I expected. Given Python is now 30 years old and there's no shortage of Python books for beginners, I shouldn't be surprised though. I'm probably grumpy because it took a month more than expected even though I already had decent material from my workshops. Anyway, my main motivation was to improve my Python knowledge and it did serve that purpose. As a bonus, I just got started with workshops again, conducted online (a first for me). The book is already proving useful as a handy reference for me as well as the students.

Feedback and Criticism🔗

Here's some of the feedback I've received over the past two years.

Grammatical mistakes. Missing a, an and the articles were particularly jarring for the readers. If you couldn't tell from reading this article (heh) that English isn't my native language, I'll consider that I've improved a lot.
Some readers stated that they didn't bother checking out my books because the covers are so bad. I finally got a cover done by an artist for the Python intro book.
For the regexp books, a few readers said my introductions were light on content. So during the marathon book updates I did last year, I managed to add more details. I feel there's still plenty of room for improvement.
My comprehension is kinda average and it works better whenever I manage to create code snippets to prove or disprove my understanding. So, my books are heavily example oriented. I've received feedback that there are too many examples, explanations aren't sufficient, etc. I'm trying to improve on this count, but I doubt I can change my natural writing style.
A few readers wanted more exercises, which I was happy to oblige. It took me a while to accept that I should provide solutions as well.

I did get some negative feedback (the kind I consider wasn't constructive in nature). One such comment affected me a lot, despite the encouraging sales for the second book. Over time, I've adapted but I'm still afraid of seeing one whenever I promote my books.

Self publishing experience🔗

I don't have personal experience with traditional publishing (other than the two review opportunities). After the initial success of the 'Python re(gex)?' book, I was happy to stay self-published. When there was a dip, I did consider that it would be nice to have the backing of a traditional publisher and a chance to improve the contents of my books.

What I like about self-publishing:

I can give away free copies whenever I want, change pricing, share the source code, put up free web version of the books, etc.
- I'm aware of a few publishers allowing authors to put up free web copies, but it isn't universal.
I can push updates easily and inform the readers as well.
No deadlines, other than self-imposed ones. This is both good and bad. The good thing is that I can take my time. The bad thing is that the reduced pressure leads to a longer schedule. I spend a lot of time on social media, reading fiction, watching entertainment, etc. The lockdown marathon did improve my average working hours, but there's still a lot of room for improvement.
I am not restricted by guidelines set by a publisher regarding chapter structure, images, exercises, etc.

What I feel would improve with traditional publishing:

Cover image
PDF/EPUB quality
Content quality, especially grammar
Audience reach

Not sure how my earnings would be affected. On the one hand, I get a minimum of 80% on book sales. On the other hand, I'd probably reach a wider audience with traditional publishing. I did receive a few offers when my promotional posts were trending. One of the offers (for the 'Python re(gex)?' ebook) had a joining bonus and an initial advance, but the two combined were less than what I had already earned. If they had extended the offer to my other books as well, it would've been a much more tempting deal.

Currently, I'm happy with the status quo. Always-free web versions and free PDF/EPUB promotional sales kinda solve the donation problem I had before I started selling ebooks: I get paid and readers have a way to get the materials for free. I'm also inspired by FOSS products I use and authors like Al Sweigart and Allen B. Downey who give away quality learning resources for free.

That said, I wish I could improve my marketing skills. Or perhaps someone will like my books so much that their review attracts significant attention and my sales increase as a result. I've also considered trying out affiliates, but haven't even created a list of people to contact yet. I don't have analytics set up on Leanpub, my blog, web versions of my books, etc. Based on the analytics available by default on GitHub and Gumroad, I do see a few links from schools and universities. I wish they would contact me, so that I can help if needed and improve my book contents based on their experiences.

Leanpub vs Gumroad🔗

I started with Leanpub since I had seen a few posts from self published authors using this platform. By the time I had published the second book I got to know about Gumroad and was attracted by the pricing/payout structure. From then on, I have published on both platforms.

Here's what I like most about both these platforms:

I can change pricing (including free option) and book contents any number of times
I can allow users to pay more than the product price, which is how I get paid during free promotional sales
I can inform readers whenever I update my books
I can create bundles
They handle collection of VAT (and other such fees)
Their payout options work for me in India

Here are some differences and my opinions on some of their features:

Gumroad's pricing structure is better. If you have a following like Julia Evans, pricing would make a huge difference
Gumroad gives analytics for free
Gumroad's email notification is opt-out compared to opt-in for Leanpub. Opt-in is better for readers, but in my experience less than 10% sign up and thus miss out when I want to send them book updates
Leanpub payout delay is 45-75 days, Gumroad is 7-14 days (or instant in some cases)
Leanpub's bundle feature is better since it doesn't require a new cover and files are automatically picked based on the links provided. In Gumroad, it is essentially a new product, but it does allow manually picking files from existing ones. Also, Leanpub allows bundling with another author (which I have used and which has given me decent sales)
Leanpub's product page and UI are better. The sliding scale (along with information on the author's share) to pick a price is clearer than Gumroad's manual price entry. And I don't like that Gumroad places the minimum price information away from the box where the user enters a price. On Leanpub, all of these are shown together, which reduces confusion
Leanpub's product page has always ranked higher in search results in my experience
Leanpub's 45-day Guarantee and Sample chapters as part of the product page make it easier for readers to take a risk
Leanpub has weekly/monthly sale newsletters in which you could get featured. This has brought me significant earnings in the past few months. If you enable an option, Gumroad would promote your product too (for 10% extra fee) but this has given me very few sales compared to Leanpub's newsletter

Pricing UI on Leanpub

Pricing UI on Gumroad

pandoc and mdbook🔗

I picked pandoc to generate PDF from GitHub style markdown, as it seemed the most popular tool for this purpose. The default output is good enough, but I wanted to customize a lot of things. With help from documentation and various Stackoverflow/Stackexchange threads, I was able to generate an output to my liking. I didn't know about templates though, otherwise I could have researched them and re-used solutions from others. I wrote a blog post about my learnings, visit Customizing pandoc to generate beautiful pdf and epub from markdown if you are interested.

Some readers wanted EPUB versions too. I thought it made sense for reading from mobile, but my own experience with this format on desktop was quite disappointing. Only later did I learn that I wasn't using a proper EPUB reader for technical books. Which is why I didn't realize that the default output from pandoc for EPUB was also good enough. During the revision marathon, I finally created EPUB versions too. I'd say I am still a beginner, but I did learn enough CSS and LaTeX to customize EPUB and PDF generation with pandoc.

pandoc has its own enhanced version of markdown, which has a lot of nifty features for ebooks. But I chose to stick with GitHub style markdown. And it came in handy when I wanted to re-use book material for blog posts, generating web versions of the book with mdbook and so on. After I had decided to open source my books, I also wanted to make a web version that feels like a book instead of just the single page markdown source from the GitHub repos. I would've probably used Gitbook if they hadn't moved away from the legacy version. I came across mdbook as an alternative to Gitbook and I'm glad I did.

Future plans🔗

I have certainly improved a lot as a writer since I published my first book in late 2018. But after 9 books, I'm finding it a lot more difficult to motivate myself to keep writing. See also HN discussion: Writing a book, still the same pain 15 years later for another example.

I have plans to publish at least one more book in 2021 and revise my existing books (not comprehensive, but a few items have cropped up). I hope the current momentum can extend enough to cover my expenses for this year at least. Beyond that, I think I will write more books, but I'll have to mix it up with other things (such as video courses, interactive courses, freelancing, etc) to keep myself motivated. I just hope that this time I will be able to pick an alternative quickly.

Resources🔗

I've been asked a few times about my experiences as an author (especially self publishing) and the resources I've used. That was my primary intention in writing this blog post. I thought I'd add a bit of background as well, not the multi-section essay I ended up with. Anyway, here are some links that I've bookmarked related to book writing.

Authors sharing their experiences

Writing skills

Tools and Miscellaneous

Customizing pandoc to generate beautiful pdf and epub from markdown — my own blog post, includes resource links for similar articles and tools other than pandoc
List of awesome design tools
launch-cheatsheet

A parting advice🔗

Don't quit easily!

Multiline fixed string search and replace with CLI tools

2020-11-27T00:00:00+00:00

This post shows how you can use the ripgrep, perl and sd commands to perform multiline fixed string search and replace operations from the command line. Solutions with GNU sed are also discussed, along with their limitations.

Fixed string matching🔗

The below sample input file will be used in the examples in this post.

$ cat ip.txt
This is a multiline
sample input with lots
of special characters
like . () * [] $ {}
| ^ + ? \ and ' and so on.
This post shows how
you can do fixed
-string multiline
search with cli tools.

ripgrep🔗

ripgrep supports the -U option to allow multiline matching. The -F option turns off regexp matching, i.e. the search string is treated literally. In the bash shell (and likely most other shells), you can press enter key to insert literal newline character for quoted values. When you do so, the next line starts with the secondary prompt PS2, which is usually > and a space character. This isn't shown in the examples below to make it easier to copy-paste the commands.

$ rg -UF 'like . () * [] $ {}
| ^ + ? \ and' ip.txt
4:like . () * [] $ {}
5:| ^ + ? \ and ' and so on.

# use the -l option to get only the filename instead of matching lines
$ rg -lUF 'like . () * [] $ {}
| ^ + ? \ and' ip.txt
ip.txt

You'll have an issue if your search string itself contains single quote characters. Avoid using double quotes as a workaround, as that has its own set of special characters. You can work around by concatenating multiple strings next to each other, along with escaped single quote characters as needed.

# the -N option disables line number prefix
$ rg -NUF 'like . () * [] $ {}
| ^ + ? \ and '\'' and' ip.txt
like . () * [] $ {}
| ^ + ? \ and ' and so on.

If your search string starts with the - character, you'll have to use -- before the search argument.

$ rg -NUF -- '-string multiline
search' ip.txt
-string multiline
search with cli tools.

perl🔗

You can use the -0777 option with perl to slurp the entire input as a single string. Another advantage with perl is that you can use files to pass the search and replace strings. Thus, you don't have to worry about any character that may clash with shell metacharacters. See my Perl One-Liners Guide if you are not familiar with using perl from the command line.

$ cat search_1.txt
like . () * [] $ {}
| ^ + ? \ and ' and so on.

# display filename if the given search string matches
$ perl -0777 -nE '!$#ARGV ? $s=$_ :
                  /\Q$s/ && say $ARGV' search_1.txt ip.txt
ip.txt

However, you'll have to make sure the file doesn't end with a newline if you are providing partial lines for searching, or take care of it within the perl script.

$ cat search_2.txt
-string multiline
search

# no output because there's a newline at the end of search_2.txt file
$ perl -0777 -nE '!$#ARGV ? $s=$_ :
                  /\Q$s/ && say $ARGV' search_2.txt ip.txt

# this will remove newline from the end of file before assigning to $s
$ perl -0777 -nE '!$#ARGV ? $s=s/\n\z//r :
                  /\Q$s/ && say $ARGV' search_2.txt ip.txt
ip.txt

By default, ripgrep gives entire matching lines. To get rest of the line with perl, you'll have to explicitly add a pattern around the search string.

# $& variable has the entire matching portion
$ perl -0777 -nE '!$#ARGV ? $s=s/\n\z//r :
                  /\Q$s/ && say $&' search_2.txt ip.txt
-string multiline
search

# use 'say $& while /.*\Q$s\E.*/g' if there are multiple matches
$ perl -0777 -nE '!$#ARGV ? $s=s/\n\z//r :
                  /.*\Q$s\E.*/ && say $&' search_2.txt ip.txt
-string multiline
search with cli tools.

Fixed string substitution🔗

ripgrep🔗

ripgrep also supports replacing the matched string with something else using the -r option. By default, you'll see only matched lines in the output. Use the --passthru option to display all the input lines, even if they do not match the given search string. See my blog post for more details about the -r option and various ways you can use it for substitution requirements.

$ rg --passthru -NUF 'like . () * [] $ {}
| ^ + ? \ and' -r '====
----
====' ip.txt
This is a multiline
sample input with lots
of special characters
====
----
==== ' and so on.
This post shows how
you can do fixed
-string multiline
search with cli tools.

Apart from having to work around single quotes, you'll have to use $$ instead of $ as it is used for backreferences in the replacement section.

$ echo 'sample input' | rg --passthru -F 'in' -r '$a'
sample put
$ echo 'sample input' | rg --passthru -F 'in' -r '$$a'
sample $aput

perl🔗

With perl, you can use files for both the search and replace strings. And, you can easily choose to replace the first or all occurrences, unlike ripgrep where it always replaces all the matches.

$ cat replace.txt
---------------------
$& = $1 + $2 / 3 \ 4
=====================

$ perl -0777 -ne '$#ARGV==1 ? $s=$_ : $#ARGV==0 ? $r=$_ :
                  print s/\Q$s/$r/gr' search_1.txt replace.txt ip.txt
This is a multiline
sample input with lots
of special characters
---------------------
$& = $1 + $2 / 3 \ 4
=====================
This post shows how
you can do fixed
-string multiline
search with cli tools.

As seen before, you'll have to remove newline from the search string for partial line matching.

# use $r=s/\n\z//r to avoid trailing newline from replace.txt
$ perl -0777 -ne '$#ARGV==1 ? $s=s/\n\z//r : $#ARGV==0 ? $r=$_ :
                  print s/\Q$s/$r/gr' search_2.txt replace.txt ip.txt
This is a multiline
sample input with lots
of special characters
like . () * [] $ {}
| ^ + ? \ and ' and so on.
This post shows how
you can do fixed
---------------------
$& = $1 + $2 / 3 \ 4
=====================
 with cli tools.

sd🔗

sd supports a fixed string option and Rust regexp based substitution. Unlike ripgrep, the -s option for fixed string will apply to both the search and replacement sections. sd does in-place editing for file inputs by default, you can use -p to preview results on the terminal. Multiline matching is automatically performed by default.

$ echo 'sample input' | sd -s 'in' '$a'
sample $aput

$ sd -ps 'like . () * [] $ {}
| ^ + ? \ and' '====
----
====' ip.txt
This is a multiline
sample input with lots
of special characters
====
----
==== ' and so on.
This post shows how
you can do fixed
-string multiline
search with cli tools.

Saving file contents to a variable🔗

Trailing newlines and ASCII NUL characters will be lost if you wish to save contents of a file as bash variables using the var=$(< filename) command. See stackoverflow: pitfalls of reading file into shell variable for more details.

$ printf '\na\0b\n123\n\n\n\n\n\n\n\n' > t1
$ a=$(< t1)
bash: warning: command substitution: ignored null byte in input

# NUL character is lost after the assignment
# all the trailing newlines are lost as well
$ printf '%b' "$a" | cat -A
$
ab$
123
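
If the trailing newlines matter, a commonly used workaround is to append a sentinel character inside the command substitution and strip it afterwards. The sketch below assumes bash; the file and variable names are arbitrary. It does not help with ASCII NUL characters, which still cannot be stored in a bash variable.

```shell
# temporary file with three trailing newlines
f=$(mktemp)
printf 'a\nb\n\n\n' > "$f"

# plain command substitution drops all the trailing newlines
plain=$(< "$f")

# append a sentinel character so that the newlines are no longer trailing,
# then strip the sentinel with parameter expansion
kept=$(cat "$f"; printf x)
kept=${kept%x}

# number of characters saved in each variable, prints: 3 6
echo "${#plain} ${#kept}"
```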

ripgrep🔗

If your search string doesn't have multiple trailing newlines or ASCII NUL characters, then you can save file contents to variables and then pass them to ripgrep. Single trailing newline will not normally cause an issue for searching operations as ripgrep will append a newline while displaying results anyway. If you want to make sure input file also contains the trailing newline, then you can manually concatenate a newline character to the search string.

$ s=$(< search_1.txt)
# use "$s"$'\n' if you want to match trailing newline as well
$ rg -NUF "$s" ip.txt
like . () * [] $ {}
| ^ + ? \ and ' and so on.

# use -- if the search string starts with a - character
$ s=$(< search_2.txt)
$ rg -NUF -- "$s" ip.txt
-string multiline
search with cli tools.

For substitution operations, you'll have to preprocess the replacement file to replace $ with $$.

$ s=$(< search_1.txt)
$ r=$(sed 's/\$/$$/g' replace.txt)

# here, removal of trailing newline doesn't cause an issue,
# as it evens out between the search and replace strings
$ rg --passthru -NUF "$s" -r "$r" ip.txt
This is a multiline
sample input with lots
of special characters
---------------------
$& = $1 + $2 / 3 \ 4
=====================
This post shows how
you can do fixed
-string multiline
search with cli tools.

Here, partial line has to be matched. So, $() assignment works well for the search string. If the trailing newline of the replacement string isn't needed, then $() assignment again is good enough. Otherwise, you can modify the replacement string as -r "$r"$'\n'

$ s=$(< search_2.txt)
$ r=$(sed 's/\$/$$/g' replace.txt)

$ rg --passthru -NUF -r "$r" -- "$s" ip.txt
This is a multiline
sample input with lots
of special characters
like . () * [] $ {}
| ^ + ? \ and ' and so on.
This post shows how
you can do fixed
---------------------
$& = $1 + $2 / 3 \ 4
===================== with cli tools.

sd🔗

As mentioned before, the -s option for sd applies to both the search and replacement sections. So, the usage is a lot simpler compared to ripgrep.

# -- is needed here because replace.txt starts with a - character
$ sd -ps -- "$(< search_1.txt)" "$(< replace.txt)" ip.txt
This is a multiline
sample input with lots
of special characters
---------------------
$& = $1 + $2 / 3 \ 4
=====================
This post shows how
you can do fixed
-string multiline
search with cli tools.

GNU sed🔗

To follow a similar approach with GNU sed, you'll have to preprocess the strings to escape metacharacters. Assuming input doesn't have ASCII NUL characters, you can use -z option to slurp the entire input as a single string.

Here's an example for multiline search.

# escape all BRE metacharacters
# replace literal newlines with \n
$ s=$(sed -z 's#[[^$*.\/]#\\&#g; s/\n/\\n/g' search_1.txt)

# since newlines are replaced with \n,
# trailing newlines will be preserved here
$ echo "$s"
like \. () \* \[] \$ {}\n| \^ + ? \\ and ' and so on\.\n

# display filename if input matches the given multiline search string
# tr is used to change the NUL character after filename to newline
$ sed -nz '/'"$s"'/F' ip.txt | tr '\0' '\n'
ip.txt

And here's an example for multiline substitution.

# last newline is removed here to allow partial line matching
$ s=$(sed -z 's#[[^$*.\/]#\\&#g; s/\n$//; s/\n/\\n/g' search_2.txt)

# escape all replacement section metacharacters
# and prefix \ character to literal newlines, except the last line
$ r=$(sed 's:[\\/&]:\\&:g; $!s/$/\\/' replace.txt)
$ echo "$r"
---------------------\
$\& = $1 + $2 \/ 3 \\ 4\
=====================

# if you need the trailing newline from replace.txt,
# use sed -z 's/'"$s"'/'"$r"'\n/g'
$ sed -z 's/'"$s"'/'"$r"'/g' ip.txt
This is a multiline
sample input with lots
of special characters
like . () * [] $ {}
| ^ + ? \ and ' and so on.
This post shows how
you can do fixed
---------------------
$& = $1 + $2 / 3 \ 4
===================== with cli tools.
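
The two escaping steps can be wrapped in a shell function. This is only a sketch under a few assumptions: GNU sed is available, the input has no ASCII NUL characters, and the strings are passed as arguments instead of files. The sed_fixed_sub name is hypothetical.

```shell
# hypothetical helper: replace every literal occurrence of $1 with $2 in file $3
sed_fixed_sub() {
    local s r
    # escape BRE metacharacters and change literal newlines to \n
    s=$(printf '%s' "$1" | sed -z 's#[[^$*.\/]#\\&#g; s/\n/\\n/g')
    # escape replacement section metacharacters
    # and prefix \ to literal newlines, except for the last line
    r=$(printf '%s\n' "$2" | sed 's:[\\/&]:\\&:g; $!s/$/\\/')
    sed -z 's/'"$s"'/'"$r"'/g' "$3"
}

# temporary sample file, name is arbitrary
f=$(mktemp)
printf 'x a.b\nc*d y\n' > "$f"

# both the search and replacement strings contain special characters
sed_fixed_sub $'a.b\nc*d' $'1/2\n&' "$f"
```

Because printf '%s' doesn't add a newline to the search string, partial line matching works without the extra s/\n$// step.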

Linux CLI ebooks🔗

Check out my ebooks if you are interested in learning more about Linux CLI basics, coreutils, text processing tools like GNU grep, GNU sed, GNU awk and perl.

Emulating regexp lookarounds in GNU sed

2020-10-31T00:00:00+00:00

This stackoverflow Q&A got me thinking about various ways to construct a solution in GNU sed if lookarounds are needed.

Only single line (with newline as the line separator) processing is presented here. Equivalent lookaround syntax with grep -P or perl is also shown for comparison. Cases where multiple lines and/or ASCII NUL characters are present in the pattern space are left as an exercise.

Filtering🔗

Here, you only need to decide whether the input line has to be matched or not. sed supports grouping commands inside {} that should be executed only if a filtering condition is matched. The condition could be negated by adding a ! character. In this way, you can emulate chaining of multiple positive and/or negative lookaround conditions.

$ cat items.txt
1,2,3,4
apple=50 ;per kg
a,b,c,d
;foo xyz3

# lines containing a digit character followed by a ; character anywhere after
# lookaround isn't needed here
# same as: grep '[0-9].*;' or grep -P '\d(?=.*;)'
$ sed -n '/[0-9].*;/p' items.txt
apple=50 ;per kg

# lines containing both digit and ; characters in any order
# same as: grep -P '^(?=.*;).*\d'
$ sed -n '/;/{ /[0-9]/p }' items.txt
apple=50 ;per kg
;foo xyz3

# lines containing both digit and ; characters
# but not if the line also contains character a
# same as: grep -P '^(?!.*a)(?=.*;).*\d'
$ sed -n '/a/!{ /;/{ /[0-9]/p } }' items.txt
;foo xyz3

For some cases, multiple condition check like the previous examples is not enough. For example, filter a line if it contains par as long as cart isn't present later in the line. Presence of cart earlier in the line shouldn't affect the outcome. In such cases, you can first change the input line to add a newline character wherever cart is present and then construct a condition such that it depends on the newline character instead of cart. If a match is found, delete all the newline characters and then print the line.

$ s='par carted spare cart park city\na parking cart\n'

# same as: grep -P 'par(?!.*cart)'
$ printf '%b' "$s" | sed -n 's/cart/\n&/g; /par[^\n]*$/{ s/\n//g; p }'
par carted spare cart park city

Newline is a safe character to choose for default line by line processing, as sed removes it from the pattern space. If you are processing a pattern space that contains newline character (for example: -z option, N command, etc), then you can still perform this trick as long as you know a character that is guaranteed to be absent from the input data.

Substitution🔗

In the previous section, you saw how to modify input line with newline character to make it easier to construct a lookaround condition. This trick comes in handy for substitution as well. However, for search and replace cases, you also need to emulate zero-width nature of lookarounds. To achieve this, you can make use of t command to construct a loop that performs substitution as long as a match is found. See my chapter on Control structures for more details about branching commands in GNU sed.

Here's an example of looping. Aim is to delete fin from the given input recursively.

# manual repetition, assuming count is known
$ echo 'coffining' | sed 's/fin//'
cofing
$ echo 'coffining' | sed 's/fin//; s///'
cog

# :loop marks the 's' command with label 'loop'
# tloop will jump to label 'loop' as long as the substitution succeeds
$ echo 'coffining' | sed ':loop s/fin//; tloop'
cog

Negative lookarounds🔗

Some cases can be solved by performing substitution only if a condition is first satisfied. For this example, first select lines that don't start with a ; character. Then, for such lines, remove everything from the first space or comma character. Note that {} grouping is optional here.

# same as: perl -ne 'print if s/^(?!;).*?\K[ ,].*//'
$ sed -n '/^;/! s/[ ,].*//p' items.txt
1
apple=50
a

For this example, foo has to be changed to [baz] only if it is not followed by a digit character. Note that foo at the end of string also satisfies this assertion. foofoo has two matches as the assertion is zero-width in nature, i.e. it doesn't consume characters. Here, the first step is inserting a newline character between foo and a digit character. Then change all foo to [baz] as long as it is at the end of string or if it isn't followed by a newline character. Once the loop ends, remove all the newline characters.

$ s='hey food! foo42 foot5 foofoo'

# same as: perl -pe 's/foo(?!\d)/[baz]/g'
$ echo "$s" | sed -E 's/(foo)([0-9])/\1\n\2/g;
                      :a s/foo([^\n]|$)/[baz]\1/; ta;
                      s/\n//g'
hey [baz]d! foo42 [baz]t5 [baz][baz]
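
To see why a loop is used instead of the g flag, here's a small sketch comparing the two on the foofoo case (the function names are made up for illustration, GNU sed is assumed). With the g flag, the character that emulates the assertion is consumed by the first match, so the second foo is missed.

```shell
# g flag version: the character matched after foo is consumed
with_g() {
    sed -E 's/(foo)([0-9])/\1\n\2/g; s/foo([^\n]|$)/[baz]\1/g; s/\n//g'
}

# loop version: every iteration starts matching from the beginning again
with_loop() {
    sed -E 's/(foo)([0-9])/\1\n\2/g; :a s/foo([^\n]|$)/[baz]\1/; ta; s/\n//g'
}

echo 'foofoo' | with_g      # [baz]foo
echo 'foofoo' | with_loop   # [baz][baz]
```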

Change foo to [baz] only if it is not preceded by _ character. foo at the start of string is matched as well.

$ s='foo _foo 42foofoo'

# same as: perl -pe 's/(?<!_)foo/[baz]/g'
$ echo "$s" | sed -E 's/(_)(foo)/\1\n\2/g;
                      :a s/(^|[^\n])foo/\1[baz]/; ta;
                      s/\n//g'
[baz] _foo 42[baz][baz]

Replace par with [xyz] as long as the s character is not present later in the input. This assumes that the assertion doesn't conflict with the search pattern. For example, s will not conflict with par, but a condition on r would, since r is part of par.

$ s='par spare part party'

# same as: perl -pe 's/par(?!.*s)/[xyz]/g'
$ echo "$s" | sed -E 's/s/&\n/g;
                      :a s/par([^\n]*)$/[xyz]\1/; ta;
                      s/\n//g'
par s[xyz]e [xyz]t [xyz]ty

Replace all empty fields with NA for csv input (assuming no embedded comma, newline characters, etc).

$ s=',1,,,two,3,,,'

# same as: perl -lpe 's/(?<![^,])(?![^,])/NA/g'
$ echo "$s" | sed -E ':a s/,,/,NA,/g; ta; s/^,/NA,/; s/,$/,NA/'
NA,1,NA,NA,two,3,NA,NA,NA

Replace if go is not there between at and par.

$ s='fox,cat,dog,parrot,dot,park,bat,go,spare,sat-in-a-park'

# same as: perl -pe 's/at((?!go).)*par/[xyz]/g'
$ echo "$s" | sed 's/go/\n&/g; s/at[^\n]*par/[xyz]/g; s/\n//g'
fox,c[xyz]k,bat,go,spare,s[xyz]k

Positive lookarounds🔗

In this example, fields have to be surrounded with [] except the first and last fields for csv input (assuming no embedded comma, newline characters, etc). With positive lookaround emulation, the modified string may continue to satisfy the matching condition, resulting in infinite looping. In this example, the fields themselves may contain [] characters, so you cannot use them to prevent an infinite loop. The newline character trick comes in handy again.

$ s='1,t[w]o,[3],f[ou]r,5'

# same as: perl -pe 's/(?<=,)[^,]+(?=,)/[$&]/g'
$ echo "$s" | sed -E ':a s/,([^,\n]+),/,\n[\1],/g; ta; s/\n//g'
1,[t[w]o],[[3]],[f[ou]r],5

Add space at word boundaries, but not at the start or end of string. Also, don't add space if it is already present. Here, negated character class on space character is enough to emulate the assertion.

$ s='total= num1+35*42/num2'

# same as: perl -lpe 's/(?<=[^ ])\b(?=[^ ])/ /g'
$ echo "$s" | sed -E ':a s/([^ ])\b([^ ])/\1 \2/; ta;'
total = num1 + 35 * 42 / num2

Replace par with [xyz] as long as part occurs as a whole word later in the line. Here, the nature of the modified string itself prevents the possibility of infinite loop.

$ s='par spare part party'

# same as: perl -pe 's/par(?=.*\bpart\b)/[xyz]/g'
$ echo "$s" | sed -E ':a s/par(.*\bpart\b)/[xyz]\1/; ta'
[xyz] s[xyz]e part party

Summary🔗

Branching commands and some creative preprocessing of the input can be combined to emulate lookaround assertions in sed. Given that Unix utility sed is Turing complete, it's perhaps not a big surprise. Now, please excuse me, I'll be busy reaping points on stackoverflow/unix.stackexchange for this edge case ;)

Search and replace tricks with ripgrep

2020-09-16T00:00:00+00:00

ripgrep (command name rg) is a grep tool, but supports search and replace as well. rg is far from a like-for-like alternative to sed, but it has nifty features like multiline replacement, fixed string matching, PCRE2 support, etc. This post gives an overview of the syntax for substitution and highlights some of the cases where rg is a handy replacement for sed.

Global search and replace🔗

$ cat ip.txt
dark blue, light blue
light orange
blue sky

# by default, line number is displayed if output destination is stdout
# by default, only lines that matched the given pattern are displayed
# 'blue' is search pattern and -r 'red' is replacement string
$ rg 'blue' -r 'red' ip.txt
1:dark red, light red
3:red sky

# --passthru option is useful to print all lines, whether or not it matched
# -N will disable line number prefix
# this command is similar to: sed 's/blue/red/g' ip.txt
$ rg --passthru -N 'blue' -r 'red' ip.txt
dark red, light red
light orange
red sky

Matching Nth occurrence🔗

As seen in previous example, rg will search and replace all occurrences. So, you'll have to be creative with regexp to replace only a specific occurrence per input line.

$ s='see bat hot at but at go gate at sat at but at'

# replace first occurrence only
# same as: sed 's/\bat\b/[xyz]/'
$ echo "$s" | rg --passthru -N '\bat\b(.*)' -r '[xyz]$1'
see bat hot [xyz] but at go gate at sat at but at

# same as: sed 's/\bat\b/[xyz]/3'
# the number within {} is N-1 to replace Nth occurrence, for N>1
$ echo "$s" | rg --passthru -N '^((.*?\bat\b){2}.*?)\bat\b' -r '$1[xyz]'
see bat hot at but at go gate [xyz] sat at but at

# replace last but Nth occurrence, for N>=0
$ echo "$s" | rg --passthru -N '^(.*)\bat\b((.*\bat\b){3})' -r '$1[xyz]$2'
see bat hot at but [xyz] go gate at sat at but at

In-place workaround🔗

rg doesn't support in-place option, so you'll have to do it yourself.

# -N isn't needed here as output destination is a file
# same as: sed -i 's/blue/red/g' ip.txt
$ rg --passthru 'blue' -r 'red' ip.txt > tmp.txt && mv tmp.txt ip.txt

$ cat ip.txt
dark red, light red
light orange
red sky

If you have moreutils installed, then you could use sponge as well.

rg --passthru 'blue' -r 'red' ip.txt | sponge ip.txt
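
The temporary file approach can also be wrapped in a small shell function. This is only a sketch: inplace is a hypothetical name, and sed stands in for rg in the demo so that it runs even where rg isn't installed. The original file is overwritten only if the command reports success.

```shell
# hypothetical helper: inplace FILE COMMAND [ARGS...]
# runs 'COMMAND ARGS... FILE' and saves the output back to FILE on success
inplace() {
    local file=$1 tmp
    shift
    tmp=$(mktemp) || return 1
    if "$@" "$file" > "$tmp"; then
        # copy the contents back so that the file's permissions are retained
        cat "$tmp" > "$file"
        rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}

# demo with a temporary copy of the sample input
t=$(mktemp)
printf 'dark blue, light blue\nlight orange\nblue sky\n' > "$t"
inplace "$t" sed 's/blue/red/g'
cat "$t"

# the rg command from this section would be written as:
# inplace ip.txt rg --passthru 'blue' -r 'red'
```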

Rust regex and PCRE2🔗

By default, rg uses Rust regular expressions, which have many more features than the regular expressions supported by GNU sed. The main feature not supported is backreference within regexp definition (for performance reasons). See Rust regex documentation for regular expression syntax and features. rg supports Unicode by default.

# non-greedy quantifier is supported
$ s='food land bark sand band cue combat'
$ echo "$s" | rg --passthru 'foo.*?ba' -r '[xyz]'
[xyz]rk sand band cue combat

# unicode support
$ echo 'fox:αλεπού,eagle:αετός' | rg --passthru '\p{L}+' -r '($0)'
(fox):(αλεπού),(eagle):(αετός)

# set operator example, remove all punctuation characters except . ! and ?
$ para='"hi", there! how *are* you? all fine here.'
$ echo "$para" | rg --passthru '[[:punct:]--[.!?]]+' -r ''
hi there! how are you? all fine here.

The -P switch will enable PCRE2 flavor, which has even more tricks. You can also use --engine=auto to allow rg to automatically use PCRE2 when needed (for example: useful as an alias for rg command so that it gives performance of Rust engine by default and use PCRE2 only when needed).

# backreference within regexp definition
$ s='cocoa appleseed tool speechless'
$ echo "$s" | rg --passthru -wP '([a-z]*([a-z])\2[a-z]*){2}' -r '{$0}'
cocoa {appleseed} tool {speechless}

# replace all whole words except 'imp' and 'ant'
$ s='tiger imp goat eagle ant important'
$ echo "$s" | rg --passthru -P '\b(imp|ant)\b(*SKIP)(*F)|\w+' -r '[$0]'
[tiger] imp [goat] [eagle] ant [important]

# recursively match parentheses
$ eqn='(3+a)x * y((r-2)*(t+2)/6) + z(a(b(c(d(e)))))'
$ echo "$eqn" | rg --passthru -P '\((?:[^()]++|(?0))++\)' -r ''
x * y + z

# all lowercase letters and optional hyphen combo from start of string
$ s='apple-fig-mango guava grape'
$ echo "$s" | rg --passthru -P '\G([a-z]+)(-)?' -r '($1)$2'
(apple)-(fig)-(mango) guava grape

Extract and modify🔗

The -r option can be used when -o option is active too. The example shown below is not easy to do with sed.

$ s='0501 035 154 12 26 98234'

# numbers >= 100 and ignore leading zeros
$ echo "$s" | rg -woP '0*+(\d{3,})' -r '"$1"' | paste -sd,
"501","154","98234"

Fixed string matching🔗

Like grep, the -F option will allow fixed strings to be matched, a handy option that I feel every search and replace tool should provide.

$ printf '2.3/[4]*6\nfoo\n5.3-[4]*9\n' | rg --passthru -F '[4]*' -r '2'
2.3/26
foo
5.3-29

-F doesn't extend to replacement section though, so you need $$ instead of $ character to represent it literally.

$ echo 'a.*{2}-b' | rg --passthru -F '.*{2}' -r '+$x\tc'
a+\tc-b
$ echo 'a.*{2}-b' | rg --passthru -F '.*{2}' -r '+$$x\tc'
a+$x\tc-b

Multiline matching🔗

Another handy option is -U which enables multiline matching.

$ s='hi there\nhave a nice day\nbye'

# (?s) flag will allow . to match newline characters as well
$ printf '%b' "$s" | rg --passthru -U '(?s)the.*ice' -r ''
hi  day
bye

See my blog post for a detailed discussion on multiline fixed string search and replace operations from the command line.

Handling dos-style input🔗

rg provides support for dos-style files with --crlf option.

# same as: sed -E 's/\w+(\r?)$/xyz\1/'
# note that output will retain CR+LF as line ending
# similar to the sed solution, this will work for unix-style input too
$ printf 'hi there\r\ngood day\r\n' | rg --passthru --crlf '\w+$' -r 'xyz'
hi xyz
good xyz

Speed comparison with GNU sed🔗

Another advantage of rg is that it is likely to be faster than sed. See ripgrep benchmark with other grep implementations by the author for a methodical and detailed analysis and insights.

# for small files, initial processing time of rg is a large component
$ time echo 'aba' | sed 's/a/b/g' > f1
real	0m0.002s
$ time echo 'aba' | rg --passthru 'a' -r 'b' > f2
real	0m0.007s

# for larger files, rg is likely to be faster
# 6.2M sample ASCII file
$ wget 'https://norvig.com/big.txt'
$ time LC_ALL=C sed 's/\bcat\b/dog/g' big.txt > f1
real	0m0.060s
$ time rg --passthru '\bcat\b' -r 'dog' big.txt > f2
real	0m0.048s
$ diff -s f1 f2
Files f1 and f2 are identical

# nearly 8 times faster!!
$ time LC_ALL=C sed -E 's/\b(\w+)(\s+\1)+\b/\1/g' big.txt > f1
real	0m0.725s
$ time rg --no-unicode --passthru -wP '(\w+)(\s+\1)+' -r '$1' big.txt > f2
real	0m0.093s
$ diff -s f1 f2
Files f1 and f2 are identical

Other alternatives for sed🔗

rpl — search and replace tool, has interesting options like interactive mode and recursive mode
sd — simple search and replace, implemented in Rust
perl and ruby — programming languages with excellent command line support

I know Python basics, what next?

2020-07-25T00:00:00+00:00

Poster created using Canva

Next step🔗

Programmers often wonder what to do after learning the basics. Searching for what next on /r/learnpython will give you too many results. And here are some wonderful articles related to this topic:

I do not have a simple answer to this question. However, I'll list a few topics along with resources that might help you take the next step in your Python learning journey.

Exercises🔗

If you feel comfortable with programming basics and Python syntax, then exercises are a good way to test your knowledge. The resource you used to learn Python will typically have some sort of exercises, so those would be ideal as a first choice. I'd also suggest using the resources below to improve your skills. If you get stuck, reread the material related to those topics, search online, ask for clarifications, etc. In short, make an effort to solve it. It is okay to skip some troublesome problems (and come back to them later if you have the time), but you should be able to solve most of the beginner problems. Maintaining notes and cheatsheets will help too, especially for common mistakes.

Exercism, Hackinscience and Practicepython — these are all beginner friendly and difficulty levels are marked
Python Exercises — my interactive TUI app, suited for beginner to intermediate level Python learners
Python Programming Exercises, Gently Explained — includes gentle explanations of the problem, the prerequisite coding concepts you'll need to understand the solution, etc
Adventofcode, Codewars, Python Morsels — includes more challenging exercises for intermediate to advanced level users
Checkio, Codingame — gaming based challenges
/r/dailyprogrammer — interesting challenges

See also this article on solving programming exercises.

Projects🔗

Once you are comfortable with basics and syntax, the next step is projects. For example, I wrote a 10-line program that solved a common problem for me: adding body { text-align: justify } to epub files that are not justify aligned. I didn't know beforehand that this line would help. I found the solution online and then automated the process of unzipping the epub, adding the line and packing it again. A project like this will likely require you to look up documentation and go through some stackoverflow Q&A as well. And once you have written the solution and use it regularly, you'll likely encounter corner cases and features to be added. I feel this is a great way to learn and understand programming.

These days, I use a better EPUB reader that allows me to customize alignments. Here's another real world example. I'm on Linux and use the terminal for many things. I wanted a CLI tool to do simple calculations. There's the bc command, but it doesn't accept a direct string argument and you need to set scale and so on. So, I looked up how to write a CLI tool in Python and wrote one using the built-in argparse module that works for my particular use cases.

Here are some resources to help you get started on projects:

Projects with solutions — algorithms, data structures, networking, security, databases, etc
Project based learning — web applications, bots, data science, machine learning, etc
Pytudes by Peter Norvig — Python programs, usually short, of considerable difficulty
Books:
- The Big Book of Small Python Projects
- Tiny Python Projects
- Practical Python Projects
- Real world Python
- Practice Python Projects — my book on beginner to intermediate level projects
/r/learnpython: What do you automate with Python at home?
Projectbook — collection of over 100 software project ideas for people looking to learn a given language or technology

See also The Good Research Code Handbook to learn how to organize your code so that it is easy to understand and works reliably.

Debugging🔗

Knowing how to debug your programs is crucial and should ideally be taught right from the beginning instead of in a chapter at the end of the book. Think Python is an awesome example of such a resource.

Sites like Pythontutor allow you to visually debug a program: you can execute a program step by step and see the current value of variables. A similar feature is typically provided by IDEs like Pycharm and Thonny. Under the hood, such tools rely on Python's tracing hooks, the same machinery that the built-in pdb module builds on. See also Python debugging with pdb.

Debugging is often a frustrating experience. Taking a break helps (and sometimes I find the solution or spot a problem in my dreams). Try to reduce the code as much as possible so that you are left with the minimal code necessary to reproduce the issue. Talking about the problem to a friend/colleague/inanimate-objects/etc can help too (known as Rubber duck debugging). I have often found the issue while formulating a question to be asked on forums like stackoverflow/reddit, because writing down your problem brings more clarity than just having a vague idea in your mind. Here are some more articles on this challenging topic:

Here's a summarized snippet from a collection of interesting bug stories.

A jpeg parser choked whenever the CEO came into the room, because he always had a shirt with a square pattern on it, which triggered some special case of contrast and block boundary algorithms.

Testing🔗

Another crucial aspect in the programming journey is knowing how to write tests. In bigger projects, usually there are separate engineers (often in much larger number than code developers) to test the code. Even in those cases, writing a few sanity test cases yourself can help you develop faster knowing that the changes aren't breaking basic functionality.

There's no single consensus on test methodologies. There is Unit testing, Integration testing, Test-driven development and so on. Often, a combination of these is used. These days, machine learning is also being considered to reduce the testing time, see Testing Firefox more efficiently with machine learning for example.

When I start a project, I usually try to write the programs incrementally. Say I need to iterate over files from a directory. I will make sure that portion is working (usually with print statements), then add another feature — say file reading and test that and so on. This reduces the burden of testing a large program at once at the end. And depending upon the nature of the program, I'll add a few sanity tests at the end. For example, for my command_help project, I copy pasted a few test runs of the program with different options and arguments into a separate file and wrote a program to perform these tests programmatically whenever the source code is modified.

For non-trivial projects, you'll usually end up needing frameworks like the built-in unittest module or third-party modules like pytest. Here are some learning resources:

Getting started with testing in Python
Python testing style guide
TDD in Python with pytest
obeythetestinggoat — TDD for the Web, with Python, Selenium, Django, JavaScript and pals
Modern Test-Driven Development in Python — TDD guide, has a real world application example

Intermediate to Advanced Python resources🔗

Intermediate

Official Python docs — Python docs are a treasure trove of information
Pydon'ts — Write elegant Python code, make the best use of the core Python features
Calmcode — videos on testing, code style, args kwargs, data science, etc
Practical Python Programming — covers foundational aspects of Python programming with an emphasis on script writing, data manipulation, and program organization
Beyond the Basic Stuff with Python — Best Practices, Tools, and Techniques, OOP, Practice Projects
Python Distilled — this pragmatic guide provides a concise narrative related to fundamental programming topics such as data abstraction, control flow, program structure, functions, objects, and modules
Python in a Nutshell — use modern Python idiomatically, structure Python projects, how to debug

Algorithms and Design patterns

Problem solving with algorithms and data structures
GitHub: Collection of design patterns and idioms
Clean Architectures in Python — software design methodology

Advanced

Fluent Python — takes you through Python's core language features and libraries, and shows you how to make your code shorter, faster, and more readable at the same time
Serious Python — deployment, scalability, testing, and more
Practices of the Python Pro — learn to design professional-level, clean, easily maintainable software at scale, includes examples for software development best practices
Advanced Python Mastery — exercise-driven course on Advanced Python Programming that was battle-tested several hundred times on the corporate-training circuit for more than a decade

Handy cheatsheets🔗

Python Crash Course cheatsheet
Comprehensive Python cheatsheet
Scientific Python cheatsheet
Common beginner errors
Python regular expression cheatsheet — my blog post, includes examples as well

More Python resources🔗

Inspired by this post, I made a Python learning resources repository which is categorized (beginner, intermediate, advanced, domains like web/ML/data science, etc) and includes a handy search feature.

I hope these resources will help you take that crucial next step and continue your Python journey. Happy learning :)

JavaScript regular expressions cheatsheet and examples

2020-07-20T00:00:00+00:00

Above diagram created using Regulex

This blog post gives an overview of regular expression syntax and features supported by JavaScript. Examples have been tested on the Chrome/Chromium console and include features that may not be available in other browsers and platforms. This post is an excerpt from my Understanding JavaScript RegExp book.

Elements that define a regular expression🔗

Note	Description
MDN: Regular Expressions	MDN reference for JavaScript regular expressions
`/pat/`	a RegExp object
`const pet = /dog/`	save regexp in a variable for reuse, clarity, etc
`/pat/.test(s)`	check if the pattern is present anywhere in the input string
	returns `true` or `false`
`i`	flag to ignore case when matching alphabets
`g`	flag to match all occurrences
`new RegExp('pat', 'i')`	construct RegExp from a string
	optional second argument specifies flags
	use backtick strings with `${}` for interpolation
`source`	property to convert a RegExp object to a string
	helps to insert a RegExp inside another RegExp
`flags`	property to get flags of a RegExp object
`s.replace(/pat/, 'repl')`	method for search and replace
`s.search(/pat/)`	gives the starting location of the match or `-1`
`s.split(/pat/)`	split a string based on regexp
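
As a small illustration of the `source` and `flags` rows above, here is a sketch that builds a bigger regexp from two smaller ones. The variable names and the sample string are made up for this example.

```javascript
const word = /\bcat\b/
const num = /\d+/g

// source gives the pattern as a string, without the delimiters and flags
console.log(word.source)   // \bcat\b
console.log(num.flags)     // g

// insert both patterns inside a new RegExp using a backtick string
const either = new RegExp(`${word.source}|${num.source}`, 'g')
console.log(either)        // /\bcat\b|\d+/g

const found = 'cat 42 scat 7'.match(either)
console.log(found)         // ['cat', '42', '7']
```

Note that flags are not carried over by `source`, so they have to be passed again as the second argument.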

Anchors	Description
`^`	restricts the match to the start of string
`$`	restricts the match to the end of string
`m`	flag to match the start/end of line with `^` and `$` anchors
	`\r`, `\n`, `\u2028` and `\u2029` are line separators
	DOS-style files use `\r\n`, may need special attention
`\b`	restricts the match to the start and end of words
	word characters: alphabets, digits, underscore
`\B`	matches wherever `\b` doesn't match

^, $ and \ are metacharacters in the above table, as these characters have a special meaning. Prefix a \ character to remove the special meaning and match such characters literally. For example, \^ will match a ^ character instead of acting as an anchor.
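
The note about DOS-style line endings in the anchors table can be seen with the sketch below. The sample string is made up, and the workaround shown is just one possible approach for this particular input, not a general recipe.

```javascript
const dos = 'hi\r\nbye'

// both \r and \n are line separators, so with the 'm' flag
// $ matches before \r, before \n and at the end of the string
const marked = dos.replace(/$/gm, '.')
console.log(JSON.stringify(marked))    // "hi.\r.\nbye."

// one possible workaround: match the whole line ending (or end of string)
// without the 'm' flag and put it back after the added text
const handled = dos.replace(/\r?\n|$/g, m => '.' + m)
console.log(JSON.stringify(handled))   // "hi.\r\nbye."
```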

Feature	Description
`pat1|pat2|pat3`	multiple regexp combined as conditional OR
	each alternative can have independent anchors
`(pat)`	group pattern(s), also a capturing group
`a(b|c)d`	same as `abd|acd`
`(?:pat)`	non-capturing group
`(?<name>pat)`	named capture group
`.`	match any character except line separators
`s`	flag to match line separators as well
`[]`	character class, matches one character among many

Alternation precedence: pattern which matches earliest in the input gets higher priority. Tie-breaker is left-to-right if matches have the same starting location.
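
Here is a short sketch of the precedence rule described above. The sample strings and variable names are only for illustration.

```javascript
// the alternative that matches earliest in the input wins,
// irrespective of the order of alternatives in the regexp
const earliest = 'cats dog bee'.replace(/dog|cat/, 'X')
console.log(earliest)     // 'Xs dog bee'

// same starting location: the leftmost alternative wins,
// even if a later alternative could match a longer portion
const shortFirst = 'handful'.replace(/hand|handful/, 'X')
console.log(shortFirst)   // 'Xful'

// so, place the longer alternative first if that is what you need
const longFirst = 'handful'.replace(/handful|hand/, 'X')
console.log(longFirst)    // 'X'
```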

Greedy Quantifiers	Description
`?`	match `0` or `1` times
`*`	match `0` or more times
`+`	match `1` or more times
`{m,n}`	match `m` to `n` times
`{m,}`	match at least `m` times
`{n}`	match exactly `n` times
`pat1.*pat2`	any number of characters between `pat1` and `pat2`
`pat1.*pat2|pat2.*pat1`	match both `pat1` and `pat2` in any order

Greedy here means that the above quantifiers will match as much as possible while still allowing the overall regexp to match. Appending a ? to greedy quantifiers makes them non-greedy, i.e. match as minimally as possible. Quantifiers can be applied to literal characters, groups, backreferences and character classes.
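
The difference between greedy and non-greedy matching is easier to see with a sketch like the one below. The sample string and variable names are made up for illustration.

```javascript
const s = 'that is quite a fabricated tale'

// greedy: .* first grabs everything up to the end of the string and then
// gives back characters until the rest of the regexp (the last 'a') can match
const greedy = s.replace(/t.*a/, 'X')
console.log(greedy)      // 'Xle'

// non-greedy: .*? stops at the earliest 'a' after the first 't'
const nonGreedy = s.replace(/t.*?a/, 'X')
console.log(nonGreedy)   // 'Xt is quite a fabricated tale'

// a greedy quantifier still has to honor the overall regexp,
// here .* cannot go past the only 'q' in the input
const honored = s.replace(/t.*a.*q/, 'X')
console.log(honored)     // 'Xuite a fabricated tale'
```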

Character class	Description
`[ae;o]`	match any of these characters once
`[3-7]`	range of characters from `3` to `7`
`[^=b2]`	negated set, match other than `=` or `b` or `2`
`[a-z-]`	`-` should be the first/last or escaped using `\` to match literally
`[+^]`	`^` should not be the first character, or else should be escaped using `\`
`[\]\\]`	`]` and `\` should be escaped using `\`
	`[` doesn't need escaping, but `\[` can also be used
`\w`	similar to `[A-Za-z0-9_]` for matching word characters
`\d`	similar to `[0-9]` for matching digit characters
`\s`	similar to `[ \t\n\r\f\v]` for matching whitespace characters
	use `\W`, `\D`, and `\S` for their opposites respectively
`u`	flag to enable unicode matching
`v`	superset of `u` flag, enables additional features
`\p{}`	Unicode character sets
`\P{}`	negated Unicode character sets
	see MDN: Unicode character class escape for details
`\u{}`	specify Unicode characters using codepoints

Lookarounds	Description
lookarounds	create custom positive/negative assertions
	zero-width like anchors and not part of matching portions
`(?!pat)`	negative lookahead assertion
`(?<!pat)`	negative lookbehind assertion
`(?=pat)`	positive lookahead assertion
`(?<=pat)`	positive lookbehind assertion
	variable length lookbehind is allowed
`(?!pat1)(?=pat2)`	multiple assertions can be specified next to each other in any order
	as they mark a matching location without consuming characters
`((?!pat).)*`	negates a regexp pattern

Matched portion	Description
`m = s.match(/pat/)`	assuming the `g` flag isn't used and regexp succeeds,
	returns an array with the matched portion and 3 properties
	`index` property gives the starting location of the match
	`input` property gives the input string `s`
	`groups` property gives dictionary of named capture groups
`m[0]`	for the above case, gives the entire matched portion
`m[N]`	matched portion of the Nth capture group
`d`	flag to get the starting and ending locations of the matching portions via the `indices` property
`s.match(/pat/g)`	returns only the matched portions, no properties
`s.matchAll(/pat/g)`	returns an iterator containing details for each matched portion and its properties
Backreference	gives the matched portion of the Nth capture group
	use `$1`, `$2`, `$3`, etc in the replacement section
	`$&` gives the entire matched portion
	`` $` `` gives the string before the matched portion
	`$'` gives the string after the matched portion
	use `\1`, `\2`, `\3`, etc within the regexp definition
`$$`	insert `$` literally in the replacement section
`$0N`	same as `$N`, helps to separate the backreference from digits that follow
`\N\xhh`	helps to separate the backreference from digits that follow in the regexp definition
`(?<name>pat)`	named capture group
	use `\k<name>` for backreferencing in the regexp definition
	use `$<name>` for backreferencing in the replacement section

Regular expression examples🔗

test() method

> let sentence = 'This is a sample string'

> /is/.test(sentence)
< true
> /xyz/.test(sentence)
< false

> if (/ring/.test(sentence)) {
      console.log('mission success')
  }
< mission success

new RegExp() constructor

> new RegExp('dog', 'i')
< /dog/i

> new RegExp('123\\tabc')
< /123\tabc/

> let greeting = 'hi'
> new RegExp(`${greeting.toUpperCase()} there`)
< /HI there/

string and line anchors

// string anchors
> /^cat/.test('cater')
< true
> ['surrender', 'newer', 'door'].filter(w => /er$/.test(w))
< ['surrender', 'newer']

// use 'm' flag to match at the start/end of each line
> /^par$/m.test('spare\npar\nera\ndare')
< true

// escape metacharacters to match them literally
> /b\^2/.test('a^2 + b^2 - C*3')
< true

replace() method and word boundaries

> let items = 'catapults\nconcatenate\ncat'
> console.log(items.replace(/^/gm, '* '))
< * catapults
  * concatenate
  * cat

> let sample = 'par spar apparent spare part'
// replace 'par' only at the start of word
> sample.replace(/\bpar/g, 'X')
< 'X spar apparent spare Xt'
// replace 'par' at the end of word but not whole word 'par'
> sample.replace(/\Bpar\b/g, 'X')
< 'par sX apparent spare part'

alternations and grouping

// replace either 'cat' at the start of string or 'cat' at the end of word
> 'catapults concatenate cat scat'.replace(/^cat|cat\b/g, 'X')
< 'Xapults concatenate X sX'

// same as: /\bpark\b|\bpart\b/g
> 'park parked part party'.replace(/\bpar(k|t)\b/g, 'X')
< 'X parked X party'

MDN: Regular Expressions Guide provides the escapeRegExp() function, useful to automatically escape metacharacters.
See also XRegExp, which provides handy methods like XRegExp.escape() and XRegExp.union(). The union method has the additional functionality of allowing a mix of string and RegExp literals and also takes care of renumbering backreferences.

> function escapeRegExp(string) {
    return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
  }

> function unionRegExp(arr) {
    return arr.map(w => escapeRegExp(w)).join('|')
  }

> new RegExp(unionRegExp(['c^t', 'dog$', 'f|x']), 'g')
< /c\^t|dog\$|f\|x/g

dot metacharacter and quantifiers

// matches character '2', any character and then character '3'
> '42\t35'.replace(/2.3/, '8')
< '485'
// 's' flag will allow line separators to be matched as well
> 'Hi there\nHave a Nice Day'.replace(/the.*ice/s, 'X')
< 'Hi X Day'

// same as: /part|parrot|parent/g
> 'par part parrot parent'.replace(/par(en|ro)?t/g, 'X')
< 'par X X X'

> ['abc', 'ac', 'abbc', 'xabbbcz'].filter(w => /ab{1,4}c/.test(w))
< ['abc', 'abbc', 'xabbbcz']

match() method

// entire matched portion
> 'too soon a song snatch'.match(/so+n/)[0]
< 'soon'
// matched portion of the second capture group
> let purchase = 'coffee:100g tea:250g sugar:75g chocolate:50g'
> purchase.match(/:(.*?)g.*?:(.*?)g.*?chocolate:(.*?)g/)[2]
< '250'

// starting location of the matching portion
> 'cat and dog'.match(/dog/).index
< 8
// start and end+1 location of the matching portion
> 'awesome'.match(/so/d).indices[0]
< [3, 5]

// get all matching portions with 'g' flag
// no properties or group portions
> 'par spar apparent spare part'.match(/\bs?par[et]\b/g)
< ['spare', 'part']

// useful for debugging purposes as well
> 'green:3.14:teal::brown:oh!:blue'.match(/:.*?:/g)
< [':3.14:', '::', ':oh!:']

matchAll() method

// same as: match(/so*n/g)
> Array.from('song too soon snatch'.matchAll(/so*n/g), m => m[0])
< ['son', 'soon', 'sn']
// get the starting index for each match
> Array.from('song too soon snatch'.matchAll(/so*n/g), m => m.index)
< [0, 9, 14]

// get only the capture group portions as an array for each match
> Array.from('2023/04,1986/Mar,'.matchAll(/(.*?)\/(.*?),/g), m => m.slice(1))
< (2) [Array(2), Array(2)]
  0: (2) ['2023', '04']
  1: (2) ['1986', 'Mar']
  length: 2
  [[Prototype]]: Array(0)

function/dictionary in the replacement section

> function titleCase(m, g1, g2) {
        return g1.toUpperCase() + g2.toLowerCase()
  }
> 'aBc ac ADC aBbBC'.replace(/(a)(.*?c)/ig, titleCase)
< 'Abc Ac Adc Abbbc'

> '1 42 317'.replace(/\d+/g, m => m*2)
< '2 84 634'

> let swap = { 'cat': 'tiger', 'tiger': 'cat' }
> 'cat tiger dog tiger cat'.replace(/cat|tiger/g, k => swap[k])
< 'tiger cat dog cat tiger'

split() method

// split based on one or more digit characters
> 'Sample123string42with777numbers'.split(/\d+/)
< ['Sample', 'string', 'with', 'numbers']
// include the portion that caused the split as well
> 'Sample123string42with777numbers'.split(/(\d+)/)
< ['Sample', '123', 'string', '42', 'with', '777', 'numbers']

// split based on digit or whitespace characters
> '**1\f2\n3star\t7 77\r**'.split(/[\d\s]+/)
< ['**', 'star', '**']

// use non-capturing group if capturing is not needed
> '123handed42handy777handful500'.split(/hand(?:y|ful)?/)
< ['123', 'ed42', '777', '500']

backreferencing with normal/non-capturing/named capture groups

// remove any number of consecutive duplicate words separated by space
// use \W+ instead of space to cover cases like 'a;a<-;a'
> 'aa a a a 42 f_1 f_1 f_13.14'.replace(/\b(\w+)( \1)+\b/g, '$1')
< 'aa a 42 f_1 f_13.14'

// add something around the entire matched portion
> '52 apples and 31 mangoes'.replace(/\d+/g, '($&)')
< '(52) apples and (31) mangoes'

// duplicate the first field and add it as the last field
> 'fork,42,nice,3.14'.replace(/,.+/, '$&,$`')
< 'fork,42,nice,3.14,fork'

// use non-capturing groups when backreferencing isn't needed
> '1,2,3,4,5,6,7'.replace(/^((?:[^,]+,){3})([^,]+)/, '$1($2)')
< '1,2,3,(4),5,6,7'

// named capture groups, same as: replace(/(\w+),(\w+)/g, '$2,$1')
> 'good,bad 42,24 x,y'.replace(/(?<fw>\w+),(?<sw>\w+)/g, '$<sw>,$<fw>')
< 'bad,good 24,42 y,x'

examples for lookarounds

// change 'cat' only if it is not followed by a digit character
// note that the end of string satisfies the given assertion
// 'catcat' has two matches as the assertion doesn't consume characters
> 'hey cats! cat42 cat_5 catcat'.replace(/cat(?!\d)/g, 'dog')
< 'hey dogs! cat42 dog_5 dogdog'

// change whole word only if it is not preceded by : or --
> ':cart apple --rest ;tea'.replace(/(?<!:|--)\b\w+/g, 'X')
< ':cart X --rest ;X'

// extract digits only if it is preceded by - and followed by ; or :
> '42 apple-5, fig3; x-83, y-20: f12'.match(/(?<=-)\d+(?=[;:])/g)
< ['20']

// words containing all lowercase vowels in any order
> let words = ['sequoia', 'questionable', 'exhibit', 'equation']
> words.filter(w => /(?=.*a)(?=.*e)(?=.*i)(?=.*o).*u/.test(w))
< ['sequoia', 'questionable', 'equation']

// replace only the third occurrence of 'cat'
> 'cat scatter cater scat'.replace(/(?<=(cat.*?){2})cat/, 'X')
< 'cat scatter Xer scat'

// match if 'do' is not there between 'at' and 'par'
> /at((?!do).)*par/.test('fox,cat,dog,parrot')
< false

u and v flags

// extract all consecutive letters, use \P{L} to invert the set
> 'fox:αλεπού,eagle:αετός'.match(/\p{L}+/gu)
< ['fox', 'αλεπού', 'eagle', 'αετός']

// extract all consecutive Greek letters
> 'fox:αλεπού,eagle:αετός'.match(/\p{sc=Greek}+/gu)
< ['αλεπού', 'αετός']

// extract whole words not surrounded by punctuation marks
> 'tie. ink east;'.match(/(?<!\p{P})\b\w+\b(?!\p{P})/gu)
< ['ink']

// remove all punctuation characters except . ! and ?
> let para = '"Hi", there! How *are* you? All fine here.'
> para.replace(/[\p{P}--[.!?]]+/gv, '')
< 'Hi there! How are you? All fine here.'

Debugging and Visualization tools🔗

As your regexp gets complicated, it can get difficult to debug when you run into issues. Building your regexp step by step from scratch and testing against input strings will go a long way in correcting the problem. To aid in such a process, you could use various online regexp tools.

regex101 is a popular site to test your regexp. You'll have to first choose the flavor as JavaScript. Then you can add your regexp, input strings, choose flags and an optional replacement string. Matching portions will be highlighted and explanation is offered in separate panes. There's also a quick reference and other features like link sharing, code generator, quiz, cheatsheet, etc.

Another useful tool is jex: regulex which converts your regexp to a railroad diagram, thus providing a visual aid to understanding the pattern.

Understanding JavaScript RegExp book🔗

Visit my repo learn_js_regexp for details about the book I wrote on JavaScript regular expressions. The ebook uses plenty of examples to explain the concepts from the basics and includes exercises to test your understanding. The cheatsheet and examples presented in this post are based on the contents of this book.

Creating GUI Applications with wxPython - book review

2019-05-13T00:00:00+00:00

Photo Credit: Tranmautritam on Pexels

I've always wanted to create nice looking, useful GUI applications over the years. And I've given up most of the time as the programming seemed too difficult for me and GUI requires at least some level of design skills. I only managed to grit through one Android app for over a year as it was a dream game from school days and I had loads of free time having quit my job. At the end of it though, I had a spaghetti mess of several 1000+ line programs and a strong aversion to Java and object oriented programming. Part of the reason is that I didn't try to learn in a formal way and just started from a tutorial closest to the game I wanted to make.

Several years later, here I am, trying my hand with GUI again. I have several small to medium scale apps in mind to implement and hopefully I'll avoid previous mistakes, especially feature creep. When I saw this tweet from Mike Driscoll, I took up the offer. I got a free book in exchange for reviewing Creating GUI Applications with wxPython. The book is currently on sale till May 15. Having to review has served as an extra incentive to read the book regularly, and so far I'm quite satisfied to have done so.

I hadn't heard of wxPython before this book. When it comes to GUI in Python, I knew about tkinter (which comes by default with the standard library), Kivy, Pygame and PyQt5. This book starts with an introduction to wxPython and then dives into a project-based approach. I've finished half the chapters so far, covering four project concepts:

Image viewer
Database viewer and editor
Calculator
Archiver

The rest of the chapters cover these topics:

MP3 tag editor
Image application using NASA's API
PDF merger/splitter
File search
FTP application
XML editor
Distributing your application

There are also a couple of appendix chapters.

As mentioned in the book's introduction, you definitely need to be comfortable with Python classes before you start this book. The code used in the book is also available from the GitHub repo, but I highly recommend typing it manually.

The project-based nature also means that after chapter 3, you could probably skip chapters you are not interested in. For example, I didn't pay too much attention to the database chapters as I don't have much experience with databases. Each project is described and shown step by step. The projects could be run at different stages as well. Playing around with the GUI at those points helps in mapping code to output, as well as in experimenting with different settings.

All in all, I would highly recommend this book for those wanting to start coding GUI applications in Python. And please do contact the author to let him know your feedback or if you have any clarifications. Happy learning :)

Python for maths

2019-03-22T00:00:00+00:00

The above image was generated using matplotlib, courtesy of code provided by the Doing Math with Python book.

Last month, I had an opportunity to conduct a beginner Python workshop for maths department students in an arts and science college. It was a great experience and I had my first taste of how Python could be applied to mathematical problems. Presented here are a bunch of useful links that I gathered as resources for the students.

Documentation links🔗

Books and courses🔗

Python for beginners🔗

Automate the Boring Stuff with Python — teaches you programming concepts and then shows how to automate everyday problems
How to Think Like a Computer Scientist: Interactive Edition — inspired by Think Python
The Python Coding Book — friendly, relaxed programming book for beginners
Comprehensive Python cheatsheet
Pythontutor: Visualize code execution — also has example codes and ability to share sessions
What does debugging a program look like?
Problem solving skills

See my comprehensive Python learning resources for more.

numpy, scipy, matplotlib🔗

More resources🔗

A short and satisfying bug hunt

2019-03-06T00:00:00+00:00

The surprise🔗

So, a pleasant surprise awaited me last Sunday. As is my usual habit, I opened my github account after breakfast to see if I've got any sudden spurt in traffic. And as usual, things were normal. Except for the blue notification, which was rare. I hoped it wasn't a silly pull request and thankfully it was a new issue that was opened.

I gave the issue a cursory glance and wrongly guessed it was probably some line ending issue (the user was on Windows OS). As someone who has seen plenty of bugs in my previous job, I wasn't ruling out anything though. I first cloned the repo so as to recreate the working environment without possible interference from my local working copy. As the user had provided detailed information while opening the issue, I was able to quickly replicate it. Sure enough, I was seeing the same problem. I only wondered why it wasn't brought to my attention before. Either past users chose not to report it or the exercises weren't interesting enough for them to reach that far.

Creating minimal failing case🔗

As I had written the solution checker script about 2 years back, the script looked alien. Right from cloning the repo, I had to fight the urge to improve things. By the time I spotted the issue, all such fantasies were thrown out and replaced by a todo note to someday write an automated testing script to check that my script is indeed working properly for all the exercises.

To put it simply, the role of the solve script is to check if the previous command executed by the user solves the current exercise question. To do so, the script gets the previous command from history and compares the output of that command with a reference solution present in the exercise directory. Sounds simple, right? Yeah, I thought so too. I do remember testing a few cases before I first published it and no one had submitted an issue so far. So, why was it failing now?

As mentioned before, I thought it could be some weird line ending issue. But that was effectively ruled out as it was failing for me as well on Linux. Still, I did check for funny characters with cat -A. Nope, no issues there.

$ grep -o '^[^=]*' sample.txt
a[2]
foo_bar
appx_pi
greeting
food[4]
b[0][1]
$ source ../solve -s
---------------------------------------------
Mismatch for question 1:
Expected output is:
a[2]
foo_bar
appx_pi
greeting
food[4]
b[0][1]
---------------------------------------------

The expected output was the same as the output for the submitted solution. So, why was the script failing? I remembered passing the script through shellcheck but still checked it again. No progress. So, I started by trying to debug the most likely culprit from the terminal before trying to debug the whole script. Luckily, that turned out well.

$ cat sample.txt 
a[2]='sample string'
foo_bar=4232
appx_pi=3.14
greeting="Hi  there		have a nice   day"
food[4]="dosa"
b[0][1]=42

$ # say what??
$ [[ $(eval "command grep -o '^[^=]*' sample.txt") == \
>    $(eval "command grep -o '^[^=]*' sample.txt") ]] || echo 'Not fine'
Not fine

$ # after some attempts, I tried a command that won't have
$ # any [] characters in the output
$ # Eureka!
$ [[ $(eval "command grep 'bar' sample.txt") == \
>    $(eval "command grep 'bar' sample.txt") ]] || echo 'Not fine'
$ [[ foo == foo ]] && echo 'fine'
fine
$ [[ 'a[5]' == a[5] ]] || echo 'Not fine'
Not fine
$ [[ 'a[5]' == 'a[5]' ]] && echo 'fine'
fine

Having a minimal failing case from the terminal was a relief. I tried set -x but that didn't light a bulb either. Finally, it struck me that perhaps some characters in the output were causing the issue. Sure enough, when [] characters were not present, the comparison worked as expected.

I did think quoting could be the issue, but dismissed it at first since both sides of the comparison had the same command. Then my recent experience of reviewing the Command Line Fundamentals book came in handy. I remembered that if quotes aren't used on the RHS of the == operator inside [[ ]], it is treated as a glob pattern instead of a literal string. Phew.
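Here's a small sketch of that behavior in isolation. The variable names are only for illustration and are not taken from the solve script.

```bash
#!/usr/bin/env bash
# Illustrative only: shows how [[ ... == ... ]] treats the right-hand side.

s='a[5]'

# An unquoted RHS is a glob pattern: a[5] means "a followed by the
# character 5", so it matches the string 'a5' but not the literal 'a[5]'.
[[ $s == $s ]] && unquoted_same='match' || unquoted_same='no match'
[[ a5 == $s ]] && unquoted_a5='match' || unquoted_a5='no match'

# A quoted RHS is compared as a literal string.
[[ $s == "$s" ]] && quoted_same='match' || quoted_same='no match'

echo "unquoted, same string: $unquoted_same"
echo "unquoted, against a5:  $unquoted_a5"
echo "quoted, same string:   $quoted_same"
```

So a string containing [] characters doesn't even match itself when it appears unquoted on the RHS, which is exactly what the failing comparison was doing.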

TL;DR🔗

Always quote strings in bash unless you have a very good reason not to.

After adding double quotes around the command substitutions, the script worked as expected. I thanked the user for opening the issue, and then informed the author of the Command Line Fundamentals book as well.
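For reference, a quoted comparison could look something like the sketch below. The same_output function is a hypothetical helper written for this illustration, not the actual code from the solve script, and the use of eval here is an assumption based on the terminal experiments shown earlier.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not the actual solve script.
# Succeeds only if both commands print exactly the same output.
same_output() {
    local submitted=$1 reference=$2
    # Double quotes around both command substitutions force a literal
    # string comparison instead of glob matching on the right-hand side.
    [[ "$(eval "$submitted" 2>&1)" == "$(eval "$reference" 2>&1)" ]]
}

# Output containing [] characters now compares equal to itself.
if same_output "printf '%s\n' 'a[2]' 'b[0][1]'" "printf '%s\n' 'a[2]' 'b[0][1]'"; then
    result='Match'
else
    result='Mismatch'
fi
echo "$result"
```

Strictly speaking, only the RHS needs the quotes inside [[ ]], but quoting both sides keeps the intent obvious.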
