<p><a href="https://sharats.me/">The Sharat’s</a></p>
<p><strong>A Tale of Two Forwarded Headers</strong> (Shrikant Sharat Kandula, 2023-08-20)</p>
<p>This is the story of how I troubleshot the OAuth2 redirect URL in Appsmith, which contained <code>localhost</code> as the host instead of the actual domain name when hosted on Google Cloud Run. It’s a story of how <code>Forwarded</code> and <code>X-Forwarded-*</code> headers propagate through multiple reverse proxies, and how they can be confused.</p>
<h2 id="the-problem">The Problem<a class="headerlink" href="#the-problem" title="Permanent link">¶</a></h2>
<p>Appsmith is an internal tool builder that has a React-based frontend and a Java+Spring based backend server. This backend uses the <code>spring-security</code> module’s support for OAuth2 authentication, which enables logging in to Appsmith with Google.</p>
<p>Google Cloud Run is </p>
<blockquote>
<p>[…] a managed compute platform that lets you run containers directly on top of Google’s scalable infrastructure.</p>
</blockquote>
<p>In other words, Google Cloud Run is a <em>serverless</em> abstraction for running Docker containers.</p>
<p>When running Appsmith on Google Cloud Run and enabling Login with Google, the redirect URL used as part of the OAuth2 flow includes the host as <code>localhost</code> instead of the actual domain name. This causes the OAuth2 flow to fail due to a mismatch in the redirect URL.</p>
<h2 id="primary-behaviour">Primary Behaviour<a class="headerlink" href="#primary-behaviour" title="Permanent link">¶</a></h2>
<p>Let’s start an Appsmith container, with Google OAuth configured, and see what redirect URL gets generated in a controlled environment.</p>
<div class="hl"><pre class=content><code><span>docker<span class="w"> </span>run<span class="w"> </span>--name<span class="w"> </span>appsmith<span class="w"> </span>-p<span class="w"> </span><span class="m">8001</span>:80<span class="w"> </span>-v<span class="w"> </span>stacks:/appsmith-stacks<span class="w"> </span>-d<span class="w"> </span><span class="se">\</span>
</span><span><span class="w"> </span>-e<span class="w"> </span><span class="nv">APPSMITH_OAUTH2_GOOGLE_CLIENT_ID</span><span class="o">=</span>dummy<span class="w"> </span><span class="se">\</span>
</span><span><span class="w"> </span>-e<span class="w"> </span><span class="nv">APPSMITH_OAUTH2_GOOGLE_CLIENT_SECRET</span><span class="o">=</span>dummy<span class="w"> </span><span class="se">\</span>
</span><span><span class="w"> </span>appsmith/appsmith-ce:v1.9.29
</span></code></pre></div>
<p>We configure Google OAuth with dummy values here, since we only care about the generated redirect URL and not the complete OAuth flow.</p>
<p>Let’s wait a little while for it to start up and become available at <code>http://localhost:8001</code>. Then, let’s initiate the OAuth2 flow and see the redirect URL.</p>
<div class="hl"><pre class=content><code><span>curl<span class="w"> </span>-sSi<span class="w"> </span>http://localhost:8001/oauth2/authorization/google
</span></code></pre></div>
<p>This prints <em>all</em> the response headers. Let’s pick out just the <code>redirect_uri</code> query parameter from the <code>Location</code> header, which points to Google’s authorization endpoint as part of the OAuth2 flow.</p>
<div class="hl"><pre class=content><code><span>curl<span class="w"> </span>-sSi<span class="w"> </span>http://localhost:8001/oauth2/authorization/google<span class="w"> </span><span class="p">|</span><span class="w"> </span>grep<span class="w"> </span>-Eo<span class="w"> </span><span class="s1">'redirect_uri=[^&]+'</span>
</span></code></pre></div>
<p>We get the result as this:</p>
<div class="hl"><pre class=content><code><span>redirect_uri=http://localhost/login/oauth2/code/google
</span></code></pre></div>
<p>This isn’t entirely accurate, since it’s missing the <code>:8001</code> port, but that’s a problem for another day. For now, let’s focus on the <code>localhost</code> part, which is the correct host here. But if we make this request with a different <code>Host</code> header:</p>
<div class="hl"><pre class=content><code><span>curl<span class="w"> </span>-sSi<span class="w"> </span>http://localhost:8001/oauth2/authorization/google<span class="w"> </span><span class="se">\</span>
</span><span><span class="w"> </span>-H<span class="w"> </span><span class="s1">'Host: one.com'</span><span class="w"> </span><span class="p">|</span><span class="w"> </span>grep<span class="w"> </span>-Eo<span class="w"> </span><span class="s1">'redirect_uri=[^&]+'</span>
</span></code></pre></div>
<p>Here, in the <code>redirect_uri</code> query parameter, we see the URL that we expect to see, with <code>one.com</code> as the host.</p>
<div class="hl"><pre class=content><code><span>redirect_uri=http://one.com/login/oauth2/code/google
</span></code></pre></div>
<p>Similarly, if we try with the <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/X-Forwarded-Host" rel="noopener noreferrer" target="_blank"><code>X-Forwarded-Host</code></a> header, or the more standard <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Forwarded" rel="noopener noreferrer" target="_blank"><code>Forwarded</code></a> header, we again see the correct host in the <code>redirect_uri</code> query parameter.</p>
<div class="hl"><pre class=content><code><span><span class="go">> curl -sSi http://localhost:8001/oauth2/authorization/google \</span>
</span><span><span class="go"> -H 'X-Forwarded-Host: two.com' | grep -Eo 'redirect_uri=[^&]+'</span>
</span><span><span class="go">redirect_uri=http://two.com/login/oauth2/code/google</span>
</span><span>
</span><span><span class="go">> curl -sSi http://localhost:8001/oauth2/authorization/google \</span>
</span><span><span class="go"> -H 'Forwarded: host=three.com' | grep -Eo 'redirect_uri=[^&]+'</span>
</span><span><span class="go">redirect_uri=http://three.com/login/oauth2/code/google</span>
</span></code></pre></div>
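<p>All of these checks pull <code>redirect_uri</code> out of the <code>Location</code> header with <code>grep</code>. If you’d rather parse it properly, for instance in case the value is percent-encoded, here’s a minimal Python sketch using only the standard library. The <code>Location</code> value below is a made-up example for illustration; only its <code>redirect_uri</code> mirrors what we saw above.</p>

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def redirect_uri_from_location(location: str) -> Optional[str]:
    """Return the redirect_uri query parameter of an authorization URL, or None."""
    query = urlsplit(location).query
    # parse_qs percent-decodes values and maps each key to a list of values.
    values = parse_qs(query).get("redirect_uri")
    return values[0] if values else None


# Hypothetical Location header value, for illustration only.
sample = (
    "https://accounts.google.com/o/oauth2/v2/auth?response_type=code"
    "&client_id=dummy"
    "&redirect_uri=http%3A%2F%2Flocalhost%2Flogin%2Foauth2%2Fcode%2Fgoogle"
)
print(redirect_uri_from_location(sample))  # http://localhost/login/oauth2/code/google
```

<p>Since <code>curl</code> doesn’t follow redirects unless given <code>-L</code>, feeding it the <code>Location</code> header from <code>curl -sSi</code> works without any extra flags.</p>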
<p>The Appsmith backend server seems to be handling the host detection quite well, but when it’s run on Google Cloud Run, the host is always <code>localhost</code>.</p>
<div class="hl"><pre class=content><code><span><span class="go">> curl -sSi https://appsmith-abcdefghij-uc.a.run.app/oauth2/authorization/google \</span>
</span><span><span class="go"> -H 'Host: four.com' | grep -Eo 'redirect_uri=[^&]+'</span>
</span><span><span class="go">redirect_uri=http://localhost/login/oauth2/code/google</span>
</span></code></pre></div>
<h2 id="cloud-run-the-reverse-proxy">Cloud Run, the Reverse Proxy<a class="headerlink" href="#cloud-run-the-reverse-proxy" title="Permanent link">¶</a></h2>
<p>We’ve established that if the host is shared correctly with Appsmith, it produces the correct <code>redirect_uri</code>. So something about the way Google Cloud Run is forwarding the host is not working as expected. We want to find out just what Cloud Run is sending across.</p>
<p>To get this information, let’s run an instance of <a href="https://httpbun.com" rel="noopener noreferrer" target="_blank"><code>httpbun</code></a> on Cloud Run, which can respond with all the headers it receives.</p>
<p>Here’s a sample configuration of how we can run httpbun on Cloud Run.</p>
<p class="img"><a href="https://sharats.me/static/cloudrun-httpbun.png"><img alt="httpbun on Cloud Run" src="https://sharats.me/static/cloudrun-httpbun.png"></a></p>
<p>Once this is deployed, we get a URL like <code>https://httpbun-abcdefghij-uc.a.run.app</code>. Let’s make a request to this and see what headers it reports as being part of the request.</p>
<div class="hl"><pre class=content><code><span><span class="go">> curl -sS https://httpbun-abcdefghij-uc.a.run.app/headers</span>
</span><span><span class="go">{</span>
</span><span><span class="go"> "Accept": "*/*",</span>
</span><span><span class="go"> "Forwarded": "for=\"1.2.3.4\";proto=https",</span>
</span><span><span class="go"> "Host": "httpbun-abcdefghij-uc.a.run.app",</span>
</span><span><span class="go"> "Traceparent": "00-abcdefghijklmnopqrstuvwxyzabcdef-ghijklmnopqrstuv-01",</span>
</span><span><span class="go"> "User-Agent": "curl/7.88.1",</span>
</span><span><span class="go"> "X-Cloud-Trace-Context": "abcdefghijklmnopqrstuvwxyzabcdef/ghijklmnopqrstuvwxy;o=1",</span>
</span><span><span class="go"> "X-Forwarded-For": "1.2.3.4",</span>
</span><span><span class="go"> "X-Forwarded-Proto": "https"</span>
</span><span><span class="go">}</span>
</span></code></pre></div>
<p>Fantastic! We see that Cloud Run sends the actual host in the <code>Host</code> header instead of <code>X-Forwarded-Host</code>, despite sending <code>X-Forwarded-For</code> and <code>X-Forwarded-Proto</code>. This is slightly odd, but not groundbreaking. As we’ve seen earlier, Appsmith handles this just fine.</p>
<p>But in addition to that, notice that we have a <code>Forwarded</code> header too. This contains the same information as <code>X-Forwarded-For</code> and <code>X-Forwarded-Proto</code>, and doesn’t contain a <code>host</code> field.</p>
<blockquote>
<p>Detour: The <code>Forwarded</code> header is a standardized header (defined in RFC 7239) that holds the same (and some more) information as the <code>X-Forwarded-*</code> suite of headers, which are only de facto standards. What’s peculiar here is that Cloud Run appears to be sending <em>both</em> <code>Forwarded</code> and <code>X-Forwarded-*</code> headers.</p>
</blockquote>
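<p>To see why there’s no host to be found, it helps to look at the header’s structure. RFC 7239 separates forwarded elements with <code>,</code> and the <code>name=value</code> pairs within an element with <code>;</code>, and values may be quoted. Here’s a deliberately naive Python parser, a sketch that doesn’t handle commas or semicolons inside quoted values, applied to the value Cloud Run sent us:</p>

```python
from typing import Dict, List


def parse_forwarded(value: str) -> List[Dict[str, str]]:
    """Naively parse a Forwarded header value into one dict per element.

    Elements are separated by ',' and pairs within an element by ';'.
    Commas or semicolons inside quoted values are NOT handled.
    """
    elements = []
    for element in value.split(","):
        pairs = {}
        for pair in element.split(";"):
            key, sep, val = pair.partition("=")
            if not sep:
                continue  # skip empty or malformed pairs
            # Parameter names are case-insensitive; values may be quoted.
            pairs[key.strip().lower()] = val.strip().strip('"')
        elements.append(pairs)
    return elements


# The Forwarded value Cloud Run sent to httpbun above.
print(parse_forwarded('for="1.2.3.4";proto=https'))
# [{'for': '1.2.3.4', 'proto': 'https'}]
```

<p>There are <code>for</code> and <code>proto</code> fields, but no <code>host</code> field anywhere in the header.</p>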
<p>We didn’t test this case with our local Appsmith. That is, we didn’t send the actual host in the <code>Host</code> header while <em>also</em> including a <code>Forwarded</code> header with the origin protocol (and IP address). Let’s do that now.</p>
<div class="hl"><pre class=content><code><span><span class="go">> curl -sSi http://localhost:8001/oauth2/authorization/google \</span>
</span><span><span class="go"> -H 'Host: abc.com' -H 'Forwarded: for="1.2.3.4";proto=https' | grep -Eo 'redirect_uri=[^&]+'</span>
</span><span><span class="go">redirect_uri=https://localhost/login/oauth2/code/google</span>
</span></code></pre></div>
<p>Boom! There it is. Although we’re sending the host in the <code>Host</code> header, Appsmith responds with <code>localhost</code> in the host part of the <code>redirect_uri</code>. This is the same behavior we see on Cloud Run.</p>
<h2 id="the-reverse-proxy-inside-appsmith-container">The Reverse Proxy Inside Appsmith Container<a class="headerlink" href="#the-reverse-proxy-inside-appsmith-container" title="Permanent link">¶</a></h2>
<p>Inside the Appsmith container, we have an NGINX process that handles all incoming requests. If the request points to a static file, it is served immediately. If it points to a backend API call, NGINX will proxy the request over to the Appsmith backend server. This NGINX configuration file is generated by <a href="https://github.com/appsmithorg/appsmith/blob/v1.9.29/deploy/docker/templates/nginx/nginx-app-http.conf.template.sh" rel="noopener noreferrer" target="_blank">this script</a>, and you can peek into the actual configuration used by running <code>docker exec appsmith cat /etc/nginx/sites-enabled/default</code>. For the URL we’ve been <code>curl</code>-ing so far, the route that matches is this:</p>
<div class="hl"><pre class=content><code><span><span class="w"> </span><span class="k">location</span><span class="w"> </span><span class="s">/oauth2</span><span class="w"> </span><span class="p">{</span>
</span><span><span class="w"> </span><span class="kn">proxy_pass</span><span class="w"> </span><span class="s">http://localhost:8080</span><span class="p">;</span>
</span><span><span class="w"> </span><span class="p">}</span>
</span></code></pre></div>
<p>Since this <code>location</code> block doesn’t have <em>any</em> <code>proxy_set_header</code> directives, the ones in the parent context apply. The relevant ones are:</p>
<div class="hl"><pre class=content><code><span><span class="w"> </span><span class="k">proxy_set_header</span><span class="w"> </span><span class="s">X-Forwarded-Proto</span><span class="w"> </span><span class="nv">$origin_scheme</span><span class="p">;</span>
</span><span><span class="w"> </span><span class="k">proxy_set_header</span><span class="w"> </span><span class="s">X-Forwarded-Host</span><span class="w"> </span><span class="nv">$origin_host</span><span class="p">;</span>
</span></code></pre></div>
<p>The <code>$origin_scheme</code> and <code>$origin_host</code> are defined at the top of the configuration file, like this:</p>
<div class="hl"><pre class=content><code><span><span class="k">map</span><span class="w"> </span><span class="nv">$http_x_forwarded_proto</span><span class="w"> </span><span class="nv">$origin_scheme</span><span class="w"> </span><span class="p">{</span>
</span><span><span class="w"> </span><span class="kn">default</span><span class="w"> </span><span class="nv">$http_x_forwarded_proto</span><span class="p">;</span>
</span><span><span class="w"> </span><span class="kn">''</span><span class="w"> </span><span class="nv">$scheme</span><span class="p">;</span>
</span><span><span class="p">}</span>
</span><span>
</span><span><span class="k">map</span><span class="w"> </span><span class="nv">$http_x_forwarded_host</span><span class="w"> </span><span class="nv">$origin_host</span><span class="w"> </span><span class="p">{</span>
</span><span><span class="w"> </span><span class="kn">default</span><span class="w"> </span><span class="nv">$http_x_forwarded_host</span><span class="p">;</span>
</span><span><span class="w"> </span><span class="kn">''</span><span class="w"> </span><span class="nv">$host</span><span class="p">;</span>
</span><span><span class="p">}</span>
</span></code></pre></div>
<p>Essentially, if the incoming request has an <code>X-Forwarded-Proto</code> header, <code>$origin_scheme</code> is set to that header’s value. If that header is <em>not</em> present in the request, <code>$origin_scheme</code> is set to <code>$scheme</code>, an NGINX variable holding the current request’s protocol. Similarly, <code>$origin_host</code> takes the value of the <code>X-Forwarded-Host</code> header if present, or else the current request’s host (which is <em>usually</em> the <code>Host</code> header of the request).</p>
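<p>As a sanity check on that reading, here’s the same fallback logic written out in Python. It’s only a sketch of what the two <code>map</code> blocks compute, not how NGINX evaluates them. Header names are lowercased, and an absent header is treated as an empty string, mirroring the <code>''</code> entries. The sample values assume Cloud Run talks plain HTTP to the container.</p>

```python
from typing import Dict, Tuple


def origin_scheme_and_host(headers: Dict[str, str], scheme: str, host: str) -> Tuple[str, str]:
    """Mirror the two `map` blocks: prefer X-Forwarded-* values, else the request's own.

    `headers` uses lowercase names; a missing header counts as an empty string.
    """
    forwarded_proto = headers.get("x-forwarded-proto", "")
    forwarded_host = headers.get("x-forwarded-host", "")
    origin_scheme = forwarded_proto if forwarded_proto else scheme  # '' -> $scheme
    origin_host = forwarded_host if forwarded_host else host  # '' -> $host
    return origin_scheme, origin_host


# Assumed view from inside the container: Cloud Run terminated TLS and sent plain HTTP.
cloud_run_headers = {"x-forwarded-proto": "https", "x-forwarded-for": "1.2.3.4"}
print(origin_scheme_and_host(cloud_run_headers, "http", "appsmith-abcdefghij-uc.a.run.app"))
# ('https', 'appsmith-abcdefghij-uc.a.run.app')
```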
<p>This means that once the request goes from this NGINX to the Appsmith backend server, <code>Host</code> becomes <code>localhost:8080</code>, <code>X-Forwarded-Host</code> is set to <code>appsmith-abcdefghij-uc.a.run.app</code>, and the others, <code>X-Forwarded-Proto</code>, <code>X-Forwarded-For</code> and even the <code>Forwarded</code> header, are passed along as is.</p>
<p>This is the problem.</p>
<p>Since the <code>Forwarded</code> header is the more modern standard, its value usually takes precedence. Unfortunately, the mere presence of a <code>Forwarded</code> header in the request means that all the <code>X-Forwarded-*</code> headers are ignored by the Appsmith server.</p>
<p>So the <code>X-Forwarded-Host</code> header is completely ignored. The server instead looks for a <code>host=</code> field in the <code>Forwarded</code> header, and since that’s missing, it falls back to the <code>Host</code> header it received, <code>localhost:8080</code>, treating it as the actual host and using it to construct the <code>redirect_uri</code>.</p>
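<p>To make the precedence concrete, here’s a simplified Python model of the behavior described above. It’s a hypothetical sketch of what we observe from the outside, not Spring’s actual implementation, and it ignores ports and protocols entirely: if a <code>Forwarded</code> header is present, only its <code>host=</code> field is consulted, and otherwise <code>X-Forwarded-Host</code> wins over <code>Host</code>.</p>

```python
import re
from typing import Dict

# A host= parameter anywhere in a Forwarded header value (simplified matching).
FORWARDED_HOST = re.compile(r'\bhost="?([^;,"]+)"?', re.IGNORECASE)


def effective_host(headers: Dict[str, str]) -> str:
    """Pick the host as described above (a simplified, hypothetical model).

    `headers` uses lowercase names and must include "host".
    """
    forwarded = headers.get("forwarded")
    if forwarded:
        # A Forwarded header is present, so every X-Forwarded-* header is ignored.
        match = FORWARDED_HOST.search(forwarded)
        return match.group(1) if match else headers["host"]
    # No Forwarded header: X-Forwarded-Host wins over Host.
    return headers.get("x-forwarded-host") or headers["host"]


# Roughly what reaches the backend from NGINX on Cloud Run.
behind_nginx = {
    "host": "localhost:8080",
    "x-forwarded-host": "appsmith-abcdefghij-uc.a.run.app",
    "x-forwarded-proto": "https",
    "forwarded": 'for="1.2.3.4";proto=https',
}
print(effective_host(behind_nginx))  # localhost:8080
```

<p>In this model, dropping the <code>Forwarded</code> header, or adding a <code>host=</code> field to it, yields the public host instead.</p>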
<p>We can test this theory by sending a request to the Appsmith backend server directly, bypassing the NGINX proxy. We can do this with <code>docker exec</code>, like this:</p>
<div class="hl"><pre class=content><code><span><span class="go">> docker exec appsmith curl -sSi localhost:8080/oauth2/authorization/google \</span>
</span><span><span class="go"> -H 'Forwarded: for="1.2.3.4";proto=https' \</span>
</span><span><span class="go"> -H 'X-Forwarded-Host: abc.com' \</span>
</span><span><span class="go"> | grep -Eo 'redirect_uri=[^&]+'</span>
</span><span><span class="go">redirect_uri=https://localhost/login/oauth2/code/google</span>
</span></code></pre></div>
<p>This produces <code>localhost</code> in the <code>redirect_uri</code>, just like we saw earlier, instead of <code>abc.com</code>. If we remove the <code>Forwarded</code> header, or add a <code>host=</code> field to it, it works just fine.</p>
<div class="hl"><pre class=content><code><span><span class="go">> docker exec appsmith curl -sSi localhost:8080/oauth2/authorization/google \</span>
</span><span><span class="go"> -H 'X-Forwarded-Host: abc.com' \</span>
</span><span><span class="go"> | grep -Eo 'redirect_uri=[^&]+'</span>
</span><span><span class="go">redirect_uri=https://abc.com/login/oauth2/code/google</span>
</span><span>
</span><span><span class="go">> docker exec appsmith curl -sSi localhost:8080/oauth2/authorization/google \</span>
</span><span><span class="go"> -H 'Forwarded: for="1.2.3.4";proto=https, host=abc.com' \</span>
</span><span><span class="go"> -H 'X-Forwarded-Host: abc.com' \</span>
</span><span><span class="go"> | grep -Eo 'redirect_uri=[^&]+'</span>
</span><span><span class="go">redirect_uri=https://abc.com/login/oauth2/code/google</span>
</span></code></pre></div>
<h2 id="the-solution">The Solution<a class="headerlink" href="#the-solution" title="Permanent link">¶</a></h2>
<p>In NGINX, we always set the <code>X-Forwarded-Host</code> header, which is the right thing to do. But if the incoming request has a <code>Forwarded</code> header, that takes precedence and the <code>X-Forwarded-Host</code> header is ignored. This is the problem.</p>
<p>So we get NGINX to <em>also</em> add a <code>host=</code> field if a <code>Forwarded</code> header exists. We do this in <a href="https://github.com/appsmithorg/appsmith/pull/25827/files" rel="noopener noreferrer" target="_blank">this PR</a>.</p>
<p>Essentially, we define a <code>$final_forwarded</code> variable, like this:</p>
<div class="hl"><pre class=content><code><span><span class="k">map</span><span class="w"> </span><span class="nv">$http_forwarded</span><span class="w"> </span><span class="nv">$final_forwarded</span><span class="w"> </span><span class="p">{</span>
</span><span><span class="w"> </span><span class="kn">default</span><span class="w"> </span><span class="s">'</span><span class="nv">$http_forwarded,</span><span class="w"> </span><span class="s">host=</span><span class="nv">$host</span><span class="p">;</span><span class="kn">proto=$scheme'</span><span class="p">;</span>
</span><span><span class="w"> </span><span class="kn">''</span><span class="w"> </span><span class="s">''</span><span class="p">;</span>
</span><span><span class="p">}</span>
</span></code></pre></div>
<p>In the <code>http</code> block, we set the <code>Forwarded</code> header as follows:</p>
<div class="hl"><pre class=content><code><span><span class="w"> </span><span class="k">proxy_set_header</span><span class="w"> </span><span class="s">Forwarded</span><span class="w"> </span><span class="nv">$final_forwarded</span><span class="p">;</span>
</span></code></pre></div>
<p>This way, if there’s no incoming <code>Forwarded</code> header, we don’t send it to the backend server either. But if it exists, we add the <code>host=</code> field (and a <code>proto=</code> field for good measure) to it, and send it to the backend server.</p>
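<p>The <code>$final_forwarded</code> mapping can be modelled in a few lines of Python, which makes both branches easy to check. This is a sketch, not NGINX itself, and the sample values assume a request arriving from Cloud Run over plain HTTP. The empty-string branch matters because NGINX doesn’t pass a header to the proxied server when <code>proxy_set_header</code> gives it an empty value.</p>

```python
def final_forwarded(http_forwarded: str, host: str, scheme: str) -> str:
    """Mirror the $final_forwarded map: empty stays empty, else append an element."""
    if http_forwarded == "":
        # An empty value means proxy_set_header drops the Forwarded header.
        return ""
    # Append a new forwarded element carrying the host (and proto) NGINX saw.
    return f"{http_forwarded}, host={host};proto={scheme}"


print(final_forwarded('for="1.2.3.4";proto=https', "appsmith-abcdefghij-uc.a.run.app", "http"))
# for="1.2.3.4";proto=https, host=appsmith-abcdefghij-uc.a.run.app;proto=http
```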
<h2 id="conclusion">Conclusion<a class="headerlink" href="#conclusion" title="Permanent link">¶</a></h2>
<p>The confusion between the <code>Forwarded</code> and <code>X-Forwarded-*</code> headers, and which takes precedence when <em>both</em> are set, turned out to be the underlying problem. The NGINX we use inside Appsmith was only ever tuned to work with the <code>X-Forwarded-*</code> suite of headers. Additionally, since Google Cloud Run is so opaque (we can’t even get shell access into the running container), tools like Httpbun can be very helpful in figuring out what a request actually contains.</p>
<p><strong>Running Docker containers in network isolation with proxied traffic</strong> (Shrikant Sharat Kandula, 2023-04-16)</p>
<p>Several network configurations, especially in large companies and universities, have a proxy configured for all outgoing traffic. Any network traffic that tries to go out <em>bypassing</em> this proxy will be blocked. For a self-hosted web application, this means the server must also make any and all outgoing connections via this proxy.</p>
<p>Now, several applications, web application servers included, support the <code>HTTP_PROXY</code> and <code>HTTPS_PROXY</code> environment variables to configure such a proxy. But if we don’t have a network that blocks non-proxy traffic, how do we test this? How can we ensure that, when a proxy is configured, all outgoing requests are only ever made through the proxy?</p>
<p>This article is my attempt at answering this.</p>
<div class="toc"><span class="toctitle">Table of Contents</span><ul>
<li><a href="#docker-networks">Docker Networks</a></li>
<li><a href="#sandbox">Sandbox</a></li>
<li><a href="#proxying-https-requests">Proxying HTTPS Requests</a></li>
<li><a href="#dns-resolution">DNS Resolution</a></li>
<li><a href="#connecting-from-host">Connecting from Host</a></li>
<li><a href="#testing-appsmith">Testing Appsmith</a></li>
<li><a href="#further-explorations">Further Explorations</a></li>
<li><a href="#conclusion">Conclusion</a></li>
</ul>
</div>
<h2 id="docker-networks">Docker Networks<a class="headerlink" href="#docker-networks" title="Permanent link">¶</a></h2>
<p>We’ll be using Docker’s networking features, which provide a simple set of primitives to solve what we need here.</p>
<p>By default, Docker sets up a bridge network for us that allows connectivity to external endpoints. With explicit configuration, we can also have an <em>internal</em> network, where connections are only allowed to other containers that are also connected to this internal network.</p>
<p>Docker’s official documentation on <a href="https://docs.docker.com/compose/networking/" rel="noopener noreferrer" target="_blank">Networking in Docker Compose</a> covers this in more detail.</p>
<h2 id="sandbox">Sandbox<a class="headerlink" href="#sandbox" title="Permanent link">¶</a></h2>
<p>We need a sandbox environment where there’s a proxy and a subject application. We want to ensure that outgoing requests made from the subject application always fail unless they go via the proxy.</p>
<p>Let’s start with two containers, in a <code>docker-compose.yml</code> configuration.</p>
<ul>
<li>The <code>subject</code> container, which is expected to make all outgoing requests via the proxy only.</li>
<li>The <code>proxy</code> container, which runs an HTTP proxy.</li>
</ul>
<p>For the <code>subject</code> container, we’ll use an ordinary, friendly, memorable, vanilla Ubuntu container, with the command set to <code>sleep infinity</code>. This makes the container stay running so that we can get in and play around. Without this, the container would start, do nothing, and just exit. Not very useful.</p>
<p>For the <code>proxy</code> container, we’ll use <code>mitmproxy</code>, specifically its web interface version called <code>mitmweb</code>. This is an excellent proxy application, best used for intercepting requests during development. If you haven’t been spoilt by it yet, I encourage you to check it out.</p>
<p>So, this is our initial version of the sandbox:</p>
<div class="hl"><pre class=content><code><span><span class="nt">version</span><span class="p">:</span><span class="w"> </span><span class="s">"3"</span>
</span><span>
</span><span>
</span><span><span class="nt">services</span><span class="p">:</span>
</span><span><span class="w"> </span><span class="nt">subject</span><span class="p">:</span>
</span><span><span class="w"> </span><span class="nt">image</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">ubuntu</span>
</span><span><span class="w"> </span><span class="nt">command</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">sleep infinity</span>
</span><span>
</span><span><span class="w"> </span><span class="nt">proxy</span><span class="p">:</span>
</span><span><span class="w"> </span><span class="nt">image</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">mitmproxy/mitmproxy</span>
</span><span><span class="w"> </span><span class="nt">command</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">mitmweb --web-host 0.0.0.0</span>
</span><span><span class="w"> </span><span class="nt">ports</span><span class="p">:</span>
</span><span><span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="s">"8081:8081"</span>
</span></code></pre></div>
<p>Save this as <code>docker-compose.yml</code> and run <code>docker-compose up -d</code>. Once the two containers are running, open <a href="http://localhost:8081" rel="noopener noreferrer" target="_blank">localhost:8081</a>. This is where we’ll see all the HTTP requests flowing through our proxy.</p>
<p>Let’s get inside the <code>subject</code> container and make some requests. Start a shell with <code>docker-compose exec subject bash</code>. This starts a shell session running <em>inside</em> the <code>subject</code> container. Use the following commands to install <code>curl</code> and make a first request:</p>
<div class="hl"><pre class=content><code><span>apt<span class="w"> </span>update
</span><span>apt<span class="w"> </span>install<span class="w"> </span>--yes<span class="w"> </span>curl
</span><span>curl<span class="w"> </span>httpbun.com/get
</span></code></pre></div>
<p>This will make an external request, and print the response in the Terminal, but this request won’t show up in <code>mitmproxy</code>’s UI. For that, let’s do:</p>
<div class="hl"><pre class=content><code><span><span class="nv">http_proxy</span><span class="o">=</span>http://proxy:8080<span class="w"> </span>curl<span class="w"> </span>httpbun.com/get
</span></code></pre></div>
<p>This will show the response in the Terminal, as well as in <code>mitmproxy</code>’s UI.</p>
<p class="img"><a href="https://sharats.me/static/mitmproxy-sample.png"><img alt="Sample request on mitmproxy's UI" src="https://sharats.me/static/mitmproxy-sample.png"></a></p>
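<p><code>curl</code> isn’t the only client that honours these variables. As an illustration, Python’s standard library reads them too: <code>urllib.request.getproxies()</code> collects the <code>*_proxy</code> environment variables, and a <code>ProxyHandler</code> built from them routes requests through the proxy. The <code>http://proxy:8080</code> address is our compose service, and the snippet only builds the opener without making a request. An application that ignores these variables would silently bypass the proxy, which is exactly what this sandbox should catch.</p>

```python
import os
import urllib.request

# Same setting as the curl example: send plain-HTTP traffic to the proxy service.
os.environ["http_proxy"] = "http://proxy:8080"

# getproxies() scans *_proxy environment variables (lowercase names win).
proxies = urllib.request.getproxies()
print(proxies.get("http"))  # http://proxy:8080

# An opener with this ProxyHandler sends http:// requests through the proxy,
# so opener.open("http://httpbun.com/get") would show up in mitmproxy's UI.
opener = urllib.request.build_opener(urllib.request.ProxyHandler(proxies))
```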
<p>Let’s step this up. We’ll now block direct Internet access to the <code>subject</code> container and only allow connecting via the proxy. Consider the following <code>docker-compose.yml</code> file:</p>
<div class="hl"><pre class=content><code><span><span class="nt">version</span><span class="p">:</span><span class="w"> </span><span class="s">"3"</span>
</span><span>
</span><span>
</span><span><span class="nt">services</span><span class="p">:</span>
</span><span><span class="w"> </span><span class="nt">subject</span><span class="p">:</span>
</span><span><span class="w"> </span><span class="nt">image</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">ubuntu</span>
</span><span><span class="w"> </span><span class="nt">command</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">sleep infinity</span>
</span><span><span class="hll"><span class="w"> </span><span class="nt">networks</span><span class="p">:</span>
</span></span><span><span class="hll"><span class="w"> </span><span class="nt">intnet</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">{}</span>
</span></span><span>
</span><span><span class="w"> </span><span class="nt">proxy</span><span class="p">:</span>
</span><span><span class="w"> </span><span class="nt">image</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">mitmproxy/mitmproxy</span>
</span><span><span class="w"> </span><span class="nt">command</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">mitmweb --web-host 0.0.0.0</span>
</span><span><span class="w"> </span><span class="nt">ports</span><span class="p">:</span>
</span><span><span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="s">"8081:8081"</span>
</span><span><span class="hll"><span class="w"> </span><span class="nt">networks</span><span class="p">:</span>
</span></span><span><span class="hll"><span class="w"> </span><span class="nt">intnet</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">{}</span>
</span></span><span><span class="hll"><span class="w"> </span><span class="nt">extnet</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">{}</span>
</span></span><span>
</span><span>
</span><span><span class="hll"><span class="nt">networks</span><span class="p">:</span>
</span></span><span><span class="hll"><span class="w"> </span><span class="nt">intnet</span><span class="p">:</span>
</span></span><span><span class="hll"><span class="w"> </span><span class="nt">internal</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">true</span>
</span></span><span><span class="hll"><span class="w"> </span><span class="nt">extnet</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">{}</span>
</span></span></code></pre></div>
<p>This is the same as the previous one, except for <code>networks</code> configurations. We define two networks, an internal network named <code>intnet</code> and an external network named <code>extnet</code>. The <code>subject</code> container is only connected to <code>intnet</code>, so it can only connect to other containers that are also connected to <code>intnet</code>. The <code>proxy</code> container is connected to both <code>intnet</code> and <code>extnet</code>, so it can both access other containers in the <code>intnet</code> as well as access the wider Internet.</p>
<p>With this setup, we expect direct network connections from <code>subject</code> to the Internet to fail, unless they go via the <code>proxy</code> container.</p>
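<p>The two rules above can be written down as a toy Python model of this compose file. It’s only a way to reason about the topology, not how Docker implements it: a container can reach the Internet directly only if one of its networks isn’t <code>internal</code>, and two containers can talk only if they share a network.</p>

```python
from typing import Dict, List

# Plain-data mirror of the networks in the docker-compose.yml above.
SERVICE_NETWORKS: Dict[str, List[str]] = {
    "subject": ["intnet"],
    "proxy": ["intnet", "extnet"],
}
NETWORK_IS_INTERNAL: Dict[str, bool] = {"intnet": True, "extnet": False}


def has_direct_internet(service: str) -> bool:
    """True if any of the service's networks is not marked internal."""
    return any(not NETWORK_IS_INTERNAL[net] for net in SERVICE_NETWORKS[service])


def can_reach(a: str, b: str) -> bool:
    """True if the two services share at least one network."""
    return bool(set(SERVICE_NETWORKS[a]) & set(SERVICE_NETWORKS[b]))


for name in SERVICE_NETWORKS:
    print(f"{name}: direct Internet access = {has_direct_internet(name)}")
print(f"subject can reach proxy = {can_reach('subject', 'proxy')}")
```

<p>In this model, the only route from <code>subject</code> to the outside world goes through <code>proxy</code>.</p>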
<p>Let’s run <code>docker-compose up -d</code> with this file, open a shell with <code>docker-compose exec subject bash</code>, and try to install <code>curl</code> again. Notice that <code>apt update</code> doesn’t work now, since it too requires Internet access, which we’ve blocked. We’ll take this as proof that blocking the Internet is working!</p>
<p class="img"><a href="https://sharats.me/static/mitmproxy-apt-requests.png"><img alt="Requests from apt in mitmproxy UI" src="https://sharats.me/static/mitmproxy-apt-requests.png"></a></p>
<p>Instead of <code>apt update</code>, run <code>http_proxy=http://proxy:8080 apt update</code>. This sends all of apt’s requests via the proxy, so they should show up in <code>mitmproxy</code>’s UI. Make sure you refresh the page, since the <code>mitmproxy</code> container has been recreated. Effectively, we do:</p>
<div class="hl"><pre class=content><code><span>docker-compose<span class="w"> </span><span class="nb">exec</span><span class="w"> </span>subject<span class="w"> </span>bash
</span><span><span class="nv">http_proxy</span><span class="o">=</span>http://proxy:8080<span class="w"> </span>apt<span class="w"> </span>update
</span><span><span class="nv">http_proxy</span><span class="o">=</span>http://proxy:8080<span class="w"> </span>apt<span class="w"> </span>install<span class="w"> </span>--yes<span class="w"> </span>curl
</span></code></pre></div>
<p>Notice that these commands will show a bunch of requests in <code>mitmproxy</code>’s UI made to the Ubuntu package archives. Now, we can try out our test with <code>curl</code>:</p>
<div class="hl"><pre class=content><code><span>curl<span class="w"> </span>httpbun.com/get
</span></code></pre></div>
<p>This will eventually time out, since the <code>subject</code> container has no access to the Internet. Now let’s try it through the proxy:</p>
<div class="hl"><pre class=content><code><span><span class="nv">http_proxy</span><span class="o">=</span>http://proxy:8080<span class="w"> </span>curl<span class="w"> </span>httpbun.com/get
</span></code></pre></div>
<p>This should work, and the request should show up in <code>mitmproxy</code>’s UI.</p>
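<p>Prefixing every command with <code>http_proxy=http://proxy:8080</code> gets repetitive. Here is a minimal sketch of a small bash helper for the <code>subject</code> shell, assuming the proxy is reachable at <code>http://proxy:8080</code> as in the compose file above. The function name <code>with_proxy</code> and the <code>PROXY_URL</code> override are hypothetical, not part of any tool:</p>

```shell
#!/usr/bin/env bash
set -o errexit -o nounset -o pipefail

# Hypothetical helper: run one command with both proxy variables set.
# PROXY_URL is an assumed override; it defaults to the proxy container.
with_proxy() {
  local url="${PROXY_URL:-http://proxy:8080}"
  http_proxy="$url" https_proxy="$url" "$@"
}

# Usage inside the subject container:
#   with_proxy apt update
#   with_proxy apt install --yes curl
#   with_proxy curl httpbun.com/get
```

<p>Running <code>export http_proxy=http://proxy:8080 https_proxy=http://proxy:8080</code> once per shell session has the same effect; the helper just keeps the proxy scoped to one command at a time, which makes it easy to compare a proxied request with a direct one.</p>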
<h2 id="proxying-https-requests">Proxying HTTPS Requests<a class="headerlink" href="#proxying-https-requests" title="Permanent link">¶</a></h2>
<p>The setup we have so far works for proxying HTTP requests, but not HTTPS requests. The whole point of HTTPS is to make man-in-the-middle interception of a request impossible. But intercepting requests is exactly what <code>mitmproxy</code> does!</p>
<p>To solve this, we’ll share <code>mitmproxy</code>’s CA (certificate authority) certificate with the <code>subject</code> container and have its clients trust it. Then, even though mitmproxy intercepts HTTPS requests, our <code>subject</code> container will gladly accept the certificates it presents and treat such requests as verified. This is described in <a href="https://docs.mitmproxy.org/stable/concepts-certificates/" rel="noopener noreferrer" target="_blank"><code>mitmproxy</code>’s documentation</a>.</p>
<p>The first time mitmproxy starts, it generates a new random CA certificate. This is the certificate we want our <code>subject</code> container to trust, so we’ll use a Docker volume to share it with the <code>subject</code> container.</p>
<div class="hl"><input type=checkbox id=co-14><label for=co-14><span class='btn show-full-code-btn'>Show remaining 7 lines</span></label><pre class=content><code><span><span class="nt">version</span><span class="p">:</span><span class="w"> </span><span class="s">"3"</span>
</span><span>
</span><span>
</span><span><span class="nt">services</span><span class="p">:</span>
</span><span><span class="w">  </span><span class="nt">subject</span><span class="p">:</span>
</span><span><span class="w">    </span><span class="nt">image</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">ubuntu</span>
</span><span><span class="w">    </span><span class="nt">command</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">sleep infinity</span>
</span><span><span class="w">    </span><span class="nt">networks</span><span class="p">:</span>
</span><span><span class="w">      </span><span class="nt">intnet</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">{}</span>
</span><span><span class="hll"><span class="w">    </span><span class="nt">volumes</span><span class="p">:</span>
</span></span><span><span class="hll"><span class="w">      </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">./certs:/certs:ro</span>
</span></span><span>
</span><span><span class="w">  </span><span class="nt">proxy</span><span class="p">:</span>
</span><span><span class="w">    </span><span class="nt">image</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">mitmproxy/mitmproxy</span>
</span><span><span class="w">    </span><span class="nt">ports</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">[</span><span class="s">"8081:8081"</span><span class="p p-Indicator">]</span>
</span><span><span class="w">    </span><span class="nt">command</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">mitmweb --web-host 0.0.0.0</span>
</span><span><span class="w">    </span><span class="nt">networks</span><span class="p">:</span>
</span><span><span class="w">      </span><span class="nt">intnet</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">{}</span>
</span><span><span class="w">      </span><span class="nt">extnet</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">{}</span>
</span><span><span class="hll"><span class="w">    </span><span class="nt">volumes</span><span class="p">:</span>
</span></span><span class=collapse><span class="hll"><span class="w">      </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">./certs:/home/mitmproxy/.mitmproxy</span>
</span></span><span class=collapse>
</span><span class=collapse>
</span><span class=collapse><span class="nt">networks</span><span class="p">:</span>
</span><span class=collapse><span class="w">  </span><span class="nt">intnet</span><span class="p">:</span>
</span><span class=collapse><span class="w">    </span><span class="nt">internal</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">true</span>
</span><span class=collapse><span class="w">  </span><span class="nt">extnet</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">{}</span>
</span></code></pre></div>
<p>Here, we mount the host path <code>./certs</code> at <code>/home/mitmproxy/.mitmproxy</code> inside the <code>proxy</code> container. This is the folder where mitmproxy saves the generated CA root certificate, so the certificate ends up in <code>./certs</code> on the host.</p>
<p>We also give the <code>subject</code> container access to this folder, at <code>/certs</code> inside the container. Notice the <code>:ro</code> suffix here, which means read-only access. We don’t expect the <code>subject</code> container to write anything to this volume, just read the CA certificate.</p>
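<p>Since mitmproxy writes its CA certificate only when it first starts, a script in the <code>subject</code> container that runs too early might find <code>/certs</code> still empty. If you script these steps, a minimal polling sketch could look like this; <code>wait_for_file</code> and its 30-second default timeout are hypothetical choices:</p>

```shell
#!/usr/bin/env bash
set -o errexit -o nounset -o pipefail

# Hypothetical helper: poll once per second until a file exists and is
# non-empty, or give up after the given number of seconds (default 30).
wait_for_file() {
  local path="$1" timeout="${2:-30}" waited=0
  until [[ -s "$path" ]]; do
    if (( waited >= timeout )); then
      echo "gave up waiting for $path after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$(( waited + 1 ))
  done
}

# Usage inside the subject container:
#   wait_for_file /certs/mitmproxy-ca.pem && echo "CA certificate is available"
```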
<p>Let’s recreate the containers with <code>docker-compose up -d</code>, open a shell in the <code>subject</code> container again, and rerun our tests:</p>
<div class="hl"><pre class=content><code><span><span class="nv">http_proxy</span><span class="o">=</span>http://proxy:8080<span class="w"> </span>apt<span class="w"> </span>update
</span><span><span class="nv">http_proxy</span><span class="o">=</span>http://proxy:8080<span class="w"> </span>apt<span class="w"> </span>install<span class="w"> </span>--yes<span class="w"> </span>curl
</span><span><span class="nv">http_proxy</span><span class="o">=</span>http://proxy:8080<span class="w"> </span>curl<span class="w"> </span>httpbun.com/get
</span><span><span class="nv">https_proxy</span><span class="o">=</span>http://proxy:8080<span class="w"> </span>curl<span class="w"> </span>https://httpbun.com/get
</span></code></pre></div>
<p>But notice that the last command, the one hitting the HTTPS API, fails. This is because the <code>subject</code> container doesn’t trust <code>mitmproxy</code>’s CA certificate. We’ll see something like this in the output:</p>
<div class="hl"><pre class=content><code><span>curl: (60) SSL certificate problem: unable to get local issuer certificate
</span><span>More details here: https://curl.se/docs/sslcerts.html
</span><span>
</span><span>curl failed to verify the legitimacy of the server and therefore could not
</span><span>establish a secure connection to it. To learn more about this situation and
</span><span>how to fix it, please visit the web page mentioned above.
</span></code></pre></div>
<p>The connection through the proxy itself works; only the SSL <em>verification</em> has failed, and since it failed, <code>curl</code> refuses to continue with the request. We can tell <code>curl</code> to ignore the verification failure by using the <code>--insecure</code> flag like this:</p>
<div class="hl"><pre class=content><code><span><span class="nv">https_proxy</span><span class="o">=</span>http://proxy:8080<span class="w"> </span>curl<span class="w"> </span>--insecure<span class="w"> </span>https://httpbun.com/get
</span></code></pre></div>
<p>But that’s not what we want. We want <code>curl</code> to trust <code>mitmproxy</code>’s CA certificate, which is what the <code>--cacert</code> option does:</p>
<div class="hl"><pre class=content><code><span>https_proxy=http://proxy:8080 curl --cacert /certs/mitmproxy-ca.pem https://httpbun.com/get
</span></code></pre></div>
<p>This should show up as an HTTPS request in mitmproxy with the ability to view full details of the request and response. Try out the same/similar <code>curl</code> commands <em>without</em> the proxy, and notice that those requests fail.</p>
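<p>When a request fails in this sandbox, <code>curl</code>’s exit code usually hints at which layer is at fault, like the <code>(60)</code> in the error above. A small diagnostic sketch follows; the numbers are <code>curl</code>’s documented exit codes, while the function name <code>explain_curl_exit</code> and the hint texts (guesses about the likely cause in this particular setup) are hypothetical:</p>

```shell
#!/usr/bin/env bash
set -o errexit -o nounset -o pipefail

# Map a few of curl's documented exit codes to hints for this sandbox.
# The hint texts are assumptions about the likely cause, not curl output.
explain_curl_exit() {
  case "$1" in
    0)  echo "ok" ;;
    5)  echo "could not resolve the proxy host; is the proxy container up?" ;;
    6)  echo "could not resolve the host; DNS is likely blocked" ;;
    7)  echo "could not connect; direct Internet access is likely blocked" ;;
    28) echo "timed out; typical of a blocked direct connection" ;;
    60) echo "certificate not trusted; pass mitmproxy's CA with --cacert" ;;
    *)  echo "curl exit code $1; see curl's exit code list" ;;
  esac
}

# Usage: capture curl's exit code without tripping errexit.
#   status=0
#   https_proxy=http://proxy:8080 curl -sS https://httpbun.com/get || status=$?
#   explain_curl_exit "$status"
```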
<h2 id="dns-resolution">DNS Resolution<a class="headerlink" href="#dns-resolution" title="Permanent link">¶</a></h2>
<p>When an HTTP proxy is configured, DNS resolution is done by the proxy: it is the proxy that connects to the target server, so it is the proxy that needs to know the server’s IP address. As long as the <code>subject</code> container only makes HTTP(S) requests, this is fine. But if it needs to make an explicit DNS query, we see that it fails:</p>
<div class="hl"><pre class=content><code><span><span class="nv">http_proxy</span><span class="o">=</span>http://proxy:8080<span class="w"> </span>apt<span class="w"> </span>install<span class="w"> </span>--yes<span class="w"> </span>dnsutils
</span><span>nslookup<span class="w"> </span>httpbun.com
</span></code></pre></div>
<p>This fails because direct DNS resolution (as opposed to resolution via a proxy, or with DNS-over-HTTPS) requires access to the external network, which the <code>subject</code> container doesn’t have. We can solve this the same way we solved it for HTTP requests: with a proxy.</p>
<p>Let’s add the following DNS proxy service to our <code>docker-compose.yml</code>:</p>
<div class="hl"><pre class=content><code><span><span class="w">  </span><span class="nt">dns</span><span class="p">:</span>
</span><span><span class="w">    </span><span class="nt">image</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">mitmproxy/mitmproxy</span>
</span><span><span class="w">    </span><span class="nt">command</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">mitmdump --mode dns</span>
</span><span><span class="w">    </span><span class="nt">networks</span><span class="p">:</span>
</span><span><span class="w">      </span><span class="nt">intnet</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">{}</span>
</span><span><span class="w">      </span><span class="nt">extnet</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">{}</span>
</span></code></pre></div>
<p>This is another <code>mitmproxy</code> container, this time running in DNS mode. And it highlights how awesome <code>mitmproxy</code> is! This gives us a DNS <em>proxy</em> that we can use to resolve DNS queries.</p>
<p>Now we’ll instruct the <code>subject</code> container to use this <code>dns</code> container for DNS queries. This is controlled by <code>/etc/resolv.conf</code> inside the <code>subject</code> container. Let’s inspect its contents:</p>
<div class="hl"><pre class=content><code><span>docker-compose<span class="w"> </span><span class="nb">exec</span><span class="w"> </span>subject<span class="w"> </span>cat<span class="w"> </span>/etc/resolv.conf
</span></code></pre></div>
<p>We should see something like this:</p>
<div class="hl"><pre class=content><code><span>nameserver 127.0.0.11
</span><span>options ndots:0
</span></code></pre></div>
<p>The IP Address next to <code>nameserver</code> is what will be used for DNS resolution; <code>127.0.0.11</code> is Docker’s embedded DNS server, which resolves container names such as <code>proxy</code>. We need the IP Address of the <code>dns</code> container on the <code>intnet</code> network to be listed here as well. The <code>docker inspect</code> command can help us find it: in the output of <code>docker inspect $(docker-compose ps -q dns)</code>, under <code>NetworkSettings.Networks</code>, you’ll find the IP Address of the <code>dns</code> container on the <code>intnet</code> network. We want this IP Address added to the <code>resolv.conf</code> of the <code>subject</code> container.</p>
<p>We can use the below commands to do this:</p>
<div class="hl"><pre class=content><code><span>docker-compose<span class="w"> </span>up<span class="w"> </span>-d<span class="w"> </span>dns
</span><span>docker-compose<span class="w"> </span><span class="nb">exec</span><span class="w"> </span>subject<span class="w"> </span>sh<span class="w"> </span>-c<span class="w"> </span><span class="s2">"echo nameserver </span><span class="k">$(</span>
</span><span><span class="w"> </span>docker<span class="w"> </span>inspect<span class="w"> </span><span class="s2">"</span><span class="k">$(</span>docker-compose<span class="w"> </span>ps<span class="w"> </span>-q<span class="w"> </span>dns<span class="k">)</span><span class="s2">"</span><span class="w"> </span>-f<span class="w"> </span><span class="s1">$'{{range $k, $v := .NetworkSettings.Networks}}{{$k}}:{{$v.IPAddress}}\n{{end}}'</span><span class="w"> </span><span class="se">\</span>
</span><span><span class="w"> </span><span class="p">|</span><span class="w"> </span>awk<span class="w"> </span>-F:<span class="w"> </span><span class="s1">'/_intnet:/ {print $2}'</span>
</span><span><span class="k">)</span><span class="s2"> >> /etc/resolv.conf"</span>
</span></code></pre></div>
<p>Note that we <em>add</em> another <code>nameserver</code> line with this IP Address instead of replacing the existing one. The reason for this is that the existing one is still useful to resolve internal hostnames, like <code>proxy</code>. Now let’s try the DNS query again:</p>
<div class="hl"><pre class=content><code><span>nslookup<span class="w"> </span>httpbun.com
</span></code></pre></div>
<p>We should see the resolved IP Address show up. You can also try to resolve other hostnames, even internal ones like <code>proxy</code>, and see that it responds with that container’s <em>internal</em> IP Address.</p>
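<p>The one-liner above appends a new <code>nameserver</code> line every time it runs, and writes a bare <code>nameserver</code> line if the <code>dns</code> container can’t be found. A slightly more defensive sketch splits it into two hypothetical bash helpers: <code>intnet_ip</code> filters the <code>network:ip</code> lines printed by the <code>docker inspect</code> template, and <code>add_nameserver</code> appends an entry only if it is missing. Matching on the <code>_intnet</code> suffix assumes Compose’s default <code>&lt;project&gt;_intnet</code> network naming, just as the original <code>awk</code> filter does:</p>

```shell
#!/usr/bin/env bash
set -o errexit -o nounset -o pipefail

# Read "network:ip" lines (the format produced by the docker inspect
# template above) and print the IP of the first network ending in "_intnet".
intnet_ip() {
  awk -F: '$1 ~ /_intnet$/ && $2 != "" { print $2; exit }'
}

# Append "nameserver IP" to a resolv.conf-style file unless that exact line
# is already present. Refuses an empty IP instead of writing a bare line.
add_nameserver() {
  local file="$1" ip="$2"
  if [[ -z "$ip" ]]; then
    echo "add_nameserver: empty IP, not touching $file" >&2
    return 1
  fi
  grep -qxF "nameserver $ip" "$file" || echo "nameserver $ip" >> "$file"
}
```

<p><code>intnet_ip</code> belongs on the host, at the end of the <code>docker inspect</code> pipeline, while <code>add_nameserver</code> has to run wherever the file lives, for example on a host-side copy of <code>resolv.conf</code> that is bind-mounted into the container.</p>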
<h2 id="connecting-from-host">Connecting from Host<a class="headerlink" href="#connecting-from-host" title="Permanent link">¶</a></h2>
<p>So far, our <code>subject</code> container has only been <code>sleep</code>ing (pun shamelessly intended). But usually, we’d want it to host a website or an app that’s available over HTTP from outside the container, and outside the <code>intnet</code> network. Let’s set up a small website in the <code>subject</code> container.</p>
<p>First, let’s create a nice <code>index.html</code> for our website:</p>
<div class="hl"><pre class=content><code><span>cat<span class="w"> </span><span class="s"><<EOF > index.html</span>
</span><span><span class="s"><h1>My awesome website!</h1></span>
</span><span><span class="s">EOF</span>
</span></code></pre></div>
<p>Second, let’s change the <code>subject</code> container to serve the current folder (mounted at <code>/www</code>) with Python’s built-in <code>http.server</code> on port <code>80</code>:</p>
<div class="hl"><pre class=content><code><span><span class="w">  </span><span class="nt">subject</span><span class="p">:</span>
</span><span><span class="hll"><span class="w">    </span><span class="nt">image</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">python:3-alpine</span>
</span></span><span><span class="hll"><span class="w">    </span><span class="nt">command</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">python -m http.server -d /www 80</span>
</span></span><span><span class="hll"><span class="w">    </span><span class="nt">ports</span><span class="p">:</span>
</span></span><span><span class="hll"><span class="w">      </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="s">"8090:80"</span>
</span></span><span><span class="w">    </span><span class="nt">networks</span><span class="p">:</span>
</span><span><span class="w">      </span><span class="nt">intnet</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">{}</span>
</span><span><span class="w">    </span><span class="nt">volumes</span><span class="p">:</span>
</span><span><span class="w">      </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">./certs:/certs:ro</span>
</span><span><span class="hll"><span class="w">      </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">.:/www</span>
</span></span></code></pre></div>
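<p>To verify that it’s working, let’s request <code>http://localhost</code> from inside the <code>subject</code> container, and we should see “My awesome website!” show up. The <code>python:3-alpine</code> image doesn’t ship <code>curl</code>, but BusyBox’s <code>wget</code> does the job: <code>docker-compose exec subject wget -qO- http://localhost</code>.</p>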
<p>We’re also exposing this on port 8090 on the host, so if we open <a href="http://localhost:8090" rel="noopener noreferrer" target="_blank">http://localhost:8090</a> in the browser on the host system, we should see this “My awesome website!” page, right?</p>
<p>But, no, it doesn’t work. The reason is that the <code>subject</code> container is only connected to the <code>intnet</code> network, which is inaccessible from outside the network-sandbox that Docker has created.</p>
<p>Remember how we used the <code>proxy</code> container to let <code>subject</code> access Internet resources? We’ll do the <em>reverse</em> here. We’ll define a <em>reverse proxy</em> that connects to both <code>intnet</code> and <code>extnet</code>, and forwards all incoming requests to <code>subject</code>. We can use <code>mitmproxy</code> here as well, because it can act as a reverse proxy too (yes, mind blown).</p>
<div class="hl"><pre class=content><code><span><span class="w">  </span><span class="nt">rproxy</span><span class="p">:</span>
</span><span><span class="w">    </span><span class="nt">image</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">mitmproxy/mitmproxy</span>
</span><span><span class="w">    </span><span class="nt">command</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">mitmdump --mode reverse:http://subject --listen-port 80</span>
</span><span><span class="w">    </span><span class="nt">ports</span><span class="p">:</span>
</span><span><span class="w">      </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="s">"8091:80"</span>
</span><span><span class="w">    </span><span class="nt">networks</span><span class="p">:</span>
</span><span><span class="w">      </span><span class="nt">intnet</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">{}</span>
</span><span><span class="w">      </span><span class="nt">extnet</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">{}</span>
</span></code></pre></div>
<p>Alternatively, if you prefer a dedicated reverse proxy like NGINX, this is the kind of configuration we’ll want:</p>
<div class="hl"><pre class=content><code><span><span class="k">worker_processes</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
</span><span><span class="k">error_log</span><span class="w"> </span><span class="s">/dev/stderr</span><span class="w"> </span><span class="s">info</span><span class="p">;</span>
</span><span>
</span><span><span class="k">events</span><span class="w"> </span><span class="p">{</span>
</span><span><span class="w">    </span><span class="kn">worker_connections</span><span class="w"> </span><span class="mi">1024</span><span class="p">;</span>
</span><span><span class="p">}</span>
</span><span>
</span><span><span class="k">stream</span><span class="w"> </span><span class="p">{</span>
</span><span><span class="w">    </span><span class="kn">server</span><span class="w"> </span><span class="p">{</span>
</span><span><span class="w">        </span><span class="kn">listen</span><span class="w"> </span><span class="mi">80</span><span class="p">;</span>
</span><span><span class="w">        </span><span class="kn">proxy_pass</span><span class="w"> </span><span class="s">subject:80</span><span class="p">;</span>
</span><span><span class="w">    </span><span class="p">}</span>
</span><span><span class="p">}</span>
</span></code></pre></div>
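<p>The point is just to listen on port 80 and forward everything to the <code>subject</code> container’s webapp. Since this NGINX configuration uses the <code>stream</code> module, it forwards raw TCP connections rather than parsing HTTP, which is why <code>proxy_pass</code> takes a <code>host:port</code> address instead of a URL.</p>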
<p>Let’s bring it up with <code>docker-compose up -d rproxy</code>.</p>
<p>Now, if we open <a href="http://localhost:8091" rel="noopener noreferrer" target="_blank">http://localhost:8091</a> in the browser on the host system, we should see the response from our little piece of awesome.</p>
<h2 id="testing-appsmith">Testing Appsmith<a class="headerlink" href="#testing-appsmith" title="Permanent link">¶</a></h2>
<p>Appsmith is a low-code internal tool builder: a webapp that lets you build internal tools with little to no code. It’s great for building internal tools, and also for testing them.</p>
<p>We wanted to test Appsmith and make sure it works well with a proxy. We also wanted to make sure that when a proxy is configured, Appsmith doesn’t make any requests that bypass it.</p>
<p>To do that, we started with the following <code>docker-compose.yml</code> file:</p>
<div class="hl"><input type=checkbox id=co-15><label for=co-15><span class='btn show-full-code-btn'>Show remaining 27 lines</span></label><pre class=content><code><span><span class="nt">version</span><span class="p">:</span><span class="w"> </span><span class="s">"3"</span>
</span><span>
</span><span>
</span><span><span class="nt">services</span><span class="p">:</span>
</span><span><span class="w">  </span><span class="nt">appsmith</span><span class="p">:</span>
</span><span><span class="w">    </span><span class="nt">image</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">appsmith/appsmith-ce</span>
</span><span><span class="w">    </span><span class="nt">environment</span><span class="p">:</span>
</span><span><span class="w">      </span><span class="nt">HTTP_PROXY</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">http://proxy:8080</span>
</span><span><span class="w">      </span><span class="nt">HTTPS_PROXY</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">http://proxy:8080</span>
</span><span><span class="w">    </span><span class="nt">networks</span><span class="p">:</span>
</span><span><span class="w">      </span><span class="nt">intnet</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">{}</span>
</span><span><span class="w">    </span><span class="nt">volumes</span><span class="p">:</span>
</span><span><span class="w">      </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">./stacks:/appsmith-stacks</span>
</span><span><span class="w">      </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">./resolv.conf:/etc/resolv.conf:ro</span>
</span><span>
</span><span><span class="w">  </span><span class="nt">proxy</span><span class="p">:</span>
</span><span><span class="w">    </span><span class="nt">image</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">mitmproxy/mitmproxy</span>
</span><span><span class="w">    </span><span class="nt">command</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">mitmweb --web-host 0.0.0.0</span>
</span><span><span class="w">    </span><span class="nt">ports</span><span class="p">:</span>
</span><span><span class="w">      </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="s">"8081:8081"</span>
</span><span class=collapse><span class="w">    </span><span class="nt">networks</span><span class="p">:</span>
</span><span class=collapse><span class="w">      </span><span class="nt">intnet</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">{}</span>
</span><span class=collapse><span class="w">      </span><span class="nt">extnet</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">{}</span>
</span><span class=collapse><span class="w">    </span><span class="nt">volumes</span><span class="p">:</span>
</span><span class=collapse><span class="w">      </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">./certs:/home/mitmproxy/.mitmproxy</span>
</span><span class=collapse>
</span><span class=collapse><span class="w">  </span><span class="nt">rproxy</span><span class="p">:</span>
</span><span class=collapse><span class="w">    </span><span class="nt">image</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">mitmproxy/mitmproxy</span>
</span><span class=collapse><span class="w">    </span><span class="nt">command</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">mitmdump --mode reverse:http://appsmith --listen-port 80</span>
</span><span class=collapse><span class="w">    </span><span class="nt">ports</span><span class="p">:</span>
</span><span class=collapse><span class="w">      </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="s">"8091:80"</span>
</span><span class=collapse><span class="w">    </span><span class="nt">networks</span><span class="p">:</span>
</span><span class=collapse><span class="w">      </span><span class="nt">intnet</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">{}</span>
</span><span class=collapse><span class="w">      </span><span class="nt">extnet</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">{}</span>
</span><span class=collapse>
</span><span class=collapse><span class="w">  </span><span class="nt">dns</span><span class="p">:</span>
</span><span class=collapse><span class="w">    </span><span class="nt">image</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">mitmproxy/mitmproxy</span>
</span><span class=collapse><span class="w">    </span><span class="nt">command</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">mitmdump --mode dns</span>
</span><span class=collapse><span class="w">    </span><span class="nt">networks</span><span class="p">:</span>
</span><span class=collapse><span class="w">      </span><span class="nt">intnet</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">{}</span>
</span><span class=collapse><span class="w">      </span><span class="nt">extnet</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">{}</span>
</span><span class=collapse>
</span><span class=collapse>
</span><span class=collapse><span class="nt">networks</span><span class="p">:</span>
</span><span class=collapse><span class="w">  </span><span class="nt">intnet</span><span class="p">:</span>
</span><span class=collapse><span class="w">    </span><span class="nt">internal</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">true</span>
</span><span class=collapse><span class="w">  </span><span class="nt">extnet</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">{}</span>
</span></code></pre></div>
<p>A few things are happening here:</p>
<ol>
<li>We start Appsmith in the internal network, with its proxy settings pointing at the <code>proxy</code> container. We don’t expose any ports for Appsmith, because we’ll be accessing it through the <code>rproxy</code> container.</li>
<li>We start the <code>proxy</code> container which will act as a proxy for all HTTP and HTTPS requests made by Appsmith. The proxy runs on port <code>8080</code>, but the web UI runs on port <code>8081</code>, which we expose to the host.</li>
<li>We start the <code>rproxy</code> container which will act as a reverse proxy for the host (i.e., us) to access Appsmith from the browser.</li>
<li>We start the <code>dns</code> container which will act as a DNS server for the internal network.</li>
<li>The Appsmith container uses two volumes: the <code>stacks</code> to hold all its data and the <code>resolv.conf</code> to add the <code>dns</code> container as another nameserver.</li>
<li>The <code>proxy</code> container has the <code>certs</code> volume, to store the CA certificate for <code>mitmproxy</code>.</li>
</ol>
<p>Now, there are still a few missing pieces:</p>
<ol>
<li>We need <code>mitmproxy</code>’s CA cert to be installed in the Appsmith container. This can be done, as <a href="https://docs.appsmith.com/getting-started/setup/instance-configuration/custom-domain/custom-ca-root-certificate#setup-custom-ca-root-folder" rel="noopener noreferrer" target="_blank">detailed in the documentation</a>, by copying the cert into the <code>stacks/ca-certs</code> folder.</li>
<li>We need the <code>dns</code> container’s internal IP Address added to Appsmith container’s <code>resolv.conf</code> file.</li>
</ol>
<div class="hl"><pre class=content><code><span>docker-compose<span class="w"> </span>up<span class="w"> </span>-d<span class="w"> </span>dns
</span><span>mkdir<span class="w"> </span>-pv<span class="w"> </span>stacks/ca-certs
</span><span>cp<span class="w"> </span>-v<span class="w"> </span>certs/mitmproxy-ca.pem<span class="w"> </span>stacks/ca-certs/mitmproxy-ca.crt
</span><span>cat<span class="w"> </span><span class="s"><<EOF > resolv.conf</span>
</span><span><span class="s">nameserver 127.0.0.11</span>
</span><span><span class="s">options ndots:0</span>
</span><span><span class="s">nameserver $(</span>
</span><span><span class="s"> docker inspect "$(docker-compose ps -q dns)" -f $'{{range $k, $v := .NetworkSettings.Networks}}{{$k}}:{{$v.IPAddress}}\n{{end}}' \</span>
</span><span><span class="s"> | awk -F: '/_intnet:/ {print $2}'</span>
</span><span><span class="s">)</span>
</span><span><span class="s">EOF</span>
</span><span>docker-compose<span class="w"> </span>up<span class="w"> </span>-d
</span></code></pre></div>
<p>When the Appsmith container starts, it will pick up the new CA cert, install it into its trust store, and also start using the new entry in <code>resolv.conf</code>.</p>
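<p>The heredoc above writes a bare <code>nameserver</code> line if <code>docker inspect</code> finds no <code>_intnet</code> address, for instance when the <code>dns</code> container isn’t running. A minimal sketch of a hypothetical <code>render_resolv_conf</code> helper that checks the address looks like an IPv4 address before producing the same three lines:</p>

```shell
#!/usr/bin/env bash
set -o errexit -o nounset -o pipefail

# Print resolv.conf contents: Docker's embedded DNS first (for container
# names like "proxy"), then the dns container. Fails on a malformed IPv4.
render_resolv_conf() {
  local dns_ip="$1"
  local octet='(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])'
  local re="^${octet}\.${octet}\.${octet}\.${octet}\$"
  if [[ ! $dns_ip =~ $re ]]; then
    echo "render_resolv_conf: '$dns_ip' is not an IPv4 address" >&2
    return 1
  fi
  printf 'nameserver 127.0.0.11\noptions ndots:0\nnameserver %s\n' "$dns_ip"
}

# Usage on the host, with ip taken from the docker inspect | awk pipeline:
#   conf="$(render_resolv_conf "$ip")"
#   printf '%s\n' "$conf" > resolv.conf
```

<p>Capturing the output in a variable first matters under <code>errexit</code>: a failed check stops the script before <code>resolv.conf</code> is opened, whereas redirecting straight into the file would truncate it even when the check fails.</p>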
<p>With this setup, if the Appsmith container makes outgoing HTTP requests via the configured proxy, they should work fine and show up in <code>mitmproxy</code>’s web UI. But if it tries to make a request without the proxy, that request should fail. This highlights any features and functionality affected by requests that bypass the proxy.</p>
<h2 id="further-explorations">Further Explorations<a class="headerlink" href="#further-explorations" title="Permanent link">¶</a></h2>
<ol>
<li>Configure static IP Addresses for the containers in the <code>docker-compose.yml</code>, especially the <code>dns</code> container. This should make it easier to configure the <code>resolv.conf</code> file.</li>
<li>Use NGINX <code>stream</code> reverse proxies to have the subject container connect to external databases.</li>
</ol>
<h2 id="conclusion">Conclusion<a class="headerlink" href="#conclusion" title="Permanent link">¶</a></h2>
<p>Since direct requests to the Internet fail, we can use this setup to test whether our application leaks any requests when a proxy is configured. Ideally, when I configure an application to use a proxy, I don’t expect it to make <em>any</em> request that bypasses that proxy. This sounds like an obvious expectation, but even the best of expectations fail when it comes to software. This is why we test, and this guide should help us test proxy support in applications better.</p>Shell Script Best Practices2022-10-27T00:00:00+05:302022-10-27T00:00:00+05:30Shrikant Sharat Kandulatag:sharats.me,2022-10-27:/posts/shell-script-best-practices/<p>This article is about a few quick thumb rules I use when writing shell scripts that I’ve come to appreciate over the years. Very opinionated.</p>
<h2 id="things">Things<a class="headerlink" href="#things" title="Permanent link">¶</a></h2>
<ol>
<li>
<p>Use <code>bash</code>. Using <code>zsh</code>, <code>fish</code>, or any other shell will make it hard for others to understand and collaborate. Among all shells, <code>bash</code> strikes a good balance between portability and developer experience (DX).</p>
</li>
<li>
<p>Just make the first line <code>#!/usr/bin/env bash</code>, even if you don’t give executable permission to the script file.</p>
</li>
<li>
<p>Use the <code>.sh</code> (or <code>.bash</code>) extension for your file. It may be fancy to not have an extension for your script, but unless your case explicitly depends on it, you’re probably just trying to do clever stuff. Clever stuff is hard to understand.</p>
</li>
<li>
<p>Use <code>set -o errexit</code> at the start of your script.</p>
<ul>
<li>So that when a command fails, <code>bash</code> exits instead of continuing with the rest of the script.</li>
</ul>
</li>
<li>
<p>Prefer to use <code>set -o nounset</code>. You <em>may</em> have a good excuse not to do this, but, in my opinion, it’s best to always set it.</p>
<ul>
<li>This makes the script fail when it accesses an unset variable, which saves you from horrible unintended consequences caused by typos in variable names.</li>
<li>When you want to access a variable that may or may not have been set, use <code>"${VARNAME-}"</code> instead of <code>"$VARNAME"</code>, and you’re good.</li>
</ul>
</li>
<li>
<p>Use <code>set -o pipefail</code>. Again, you may have good reasons not to do this, but I’d recommend always setting it.</p>
<ul>
<li>This ensures that a pipeline is treated as failed if any command in it fails, not just the last one.</li>
</ul>
</li>
<li>
<p>Use <code>set -o xtrace</code>, with a check on <code>$TRACE</code> env variable.</p>
<ul>
<li>For copy-paste: <code>if [[ "${TRACE-0}" == "1" ]]; then set -o xtrace; fi</code>.</li>
<li>This helps a lot in debugging your scripts. Like, really a lot.</li>
<li>People can now <em>enable</em> debug mode by running your script as <code>TRACE=1 ./script.sh</code> instead of <code>./script.sh</code>.</li>
</ul>
</li>
<li>
<p>Use <code>[[ ]]</code> for conditions in <code>if</code> / <code>while</code> statements, instead of <code>[ ]</code> or <code>test</code>.</p>
<ul>
<li><code>[[ ]]</code> is a bash <del>builtin</del> keyword, and is more powerful than <code>[ ]</code> or <code>test</code>.</li>
</ul>
</li>
<li>
<p>Always quote variable accesses with double-quotes.</p>
<ul>
<li>One place where it’s <em>okay</em> not to is on the <em>left-hand-side</em> of an <code>[[ ]]</code> condition. But even there I’d recommend quoting.</li>
<li>When you need the unquoted behaviour, using <code>bash</code> arrays will likely serve you much better.</li>
</ul>
</li>
<li>
<p>Use <code>local</code> variables in functions.</p>
</li>
<li>
<p>Accept multiple ways that users can ask for help and respond in kind.</p>
<ul>
<li>Check if the first arg is <code>-h</code> or <code>--help</code> or <code>help</code> or just <code>h</code> or even <code>-help</code>, and in all these cases, print help text and exit.</li>
<li>Please. For the sake of your future-self.</li>
</ul>
</li>
<li>
<p>When printing error messages, please redirect to stderr.</p>
<ul>
<li>Use <code>echo 'Something unexpected happened' >&2</code> for this.</li>
</ul>
</li>
<li>
<p>Use long options, where possible (like <code>--silent</code> instead of <code>-s</code>). These serve to document your commands explicitly.</p>
<ul>
<li>Note though, that commands shipped on some systems like macOS don’t always have long options.</li>
</ul>
</li>
<li>
<p>If appropriate, change to the script’s directory close to the start of the script.</p>
<ul>
<li>And it’s almost always appropriate.</li>
<li>Use <code>cd "$(dirname "$0")"</code>, which works in <em>most</em> cases.</li>
</ul>
</li>
<li>
<p>Use <code>shellcheck</code>. Heed its warnings.</p>
</li>
</ol>
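<p>To make a few of these rules concrete, here’s a small demo script. The function names (<code>pipeline_result</code>, <code>read_optional</code>, <code>greet</code>, <code>warn</code>) are hypothetical and made up just for this illustration. The two functions that toggle shell options do so in a subshell, so the change doesn’t leak into the caller.</p>

```shell
#!/usr/bin/env bash
# Hypothetical demo functions, made up to illustrate a few of the rules above.

# pipefail: does `false | true` count as a failure? Pass "on" or "off".
pipeline_result() {
    (
        # Subshell, so toggling the option doesn't leak into the caller.
        if [[ "${1-}" == "on" ]]; then set -o pipefail; else set +o pipefail; fi
        if false | true; then echo "succeeded"; else echo "failed"; fi
    )
}

# nounset: "${MAYBE_UNSET-}" expands to an empty string instead of aborting.
read_optional() {
    (
        set -o nounset
        echo "value=[${MAYBE_UNSET-}]"
    )
}

# local: `name` here doesn't clobber a variable of the same name in the caller.
greet() {
    local name="${1-world}"
    echo "Hello, ${name}"
}

# stderr: error messages go to stderr, keeping stdout for real output.
warn() {
    echo "error: $*" >&2
}

pipeline_result off
pipeline_result on
```

<p>Running it prints <code>succeeded</code> and then <code>failed</code>. Without <code>pipefail</code>, the pipeline’s status is just that of its last command (<code>true</code>). With it, the failing <code>false</code> is noticed.</p>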
<h2 id="template">Template<a class="headerlink" href="#template" title="Permanent link">¶</a></h2>
<div class="hl"><pre class=content><code><span><span class="ch">#!/usr/bin/env bash</span>
</span><span>
</span><span><span class="nb">set</span><span class="w"> </span>-o<span class="w"> </span>errexit
</span><span><span class="nb">set</span><span class="w"> </span>-o<span class="w"> </span>nounset
</span><span><span class="nb">set</span><span class="w"> </span>-o<span class="w"> </span>pipefail
</span><span><span class="k">if</span><span class="w"> </span><span class="o">[[</span><span class="w"> </span><span class="s2">"</span><span class="si">${</span><span class="nv">TRACE</span><span class="p">-0</span><span class="si">}</span><span class="s2">"</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s2">"1"</span><span class="w"> </span><span class="o">]]</span><span class="p">;</span><span class="w"> </span><span class="k">then</span>
</span><span><span class="w"> </span><span class="nb">set</span><span class="w"> </span>-o<span class="w"> </span>xtrace
</span><span><span class="k">fi</span>
</span><span>
</span><span><span class="k">if</span><span class="w"> </span><span class="o">[[</span><span class="w"> </span><span class="s2">"</span><span class="si">${</span><span class="nv">1</span><span class="p">-</span><span class="si">}</span><span class="s2">"</span><span class="w"> </span><span class="o">=</span>~<span class="w"> </span>^-*h<span class="o">(</span>elp<span class="o">)</span>?$<span class="w"> </span><span class="o">]]</span><span class="p">;</span><span class="w"> </span><span class="k">then</span>
</span><span><span class="w"> </span><span class="nb">echo</span><span class="w"> </span><span class="s1">'Usage: ./script.sh arg-one arg-two</span>
</span><span>
</span><span><span class="s1">This is an awesome bash script to make your life better.</span>
</span><span>
</span><span><span class="s1">'</span>
</span><span><span class="w"> </span><span class="nb">exit</span>
</span><span><span class="k">fi</span>
</span><span>
</span><span><span class="nb">cd</span><span class="w"> </span><span class="s2">"</span><span class="k">$(</span>dirname<span class="w"> </span><span class="s2">"</span><span class="nv">$0</span><span class="s2">"</span><span class="k">)</span><span class="s2">"</span>
</span><span>
</span><span>main<span class="o">()</span><span class="w"> </span><span class="o">{</span>
</span><span><span class="w"> </span><span class="nb">echo</span><span class="w"> </span><span class="k">do</span><span class="w"> </span>awesome<span class="w"> </span>stuff
</span><span><span class="o">}</span>
</span><span>
</span><span>main<span class="w"> </span><span class="s2">"</span><span class="nv">$@</span><span class="s2">"</span>
</span></code></pre></div>
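<p>The help check in the template packs several spellings into one regex. Here’s a quick sketch that pulls it into a hypothetical <code>is_help_arg</code> function and runs a few sample arguments through it, to show which ones count as asking for help.</p>

```shell
#!/usr/bin/env bash
# Hypothetical helper reusing the template's help-detection regex:
# any number of leading dashes, then "h" or "help", and nothing else.
is_help_arg() {
    [[ "${1-}" =~ ^-*h(elp)?$ ]]
}

for arg in h help -h --help -help hello he ''; do
    if is_help_arg "$arg"; then
        echo "'${arg}' -> help"
    else
        echo "'${arg}' -> not help"
    fi
done
```

<p>This prints <code>help</code> for <code>h</code>, <code>help</code>, <code>-h</code>, <code>--help</code> and <code>-help</code>, and <code>not help</code> for <code>hello</code>, <code>he</code> and the empty string. Since <code>-*</code> allows any number of leading dashes, something like <code>---help</code> is also accepted, which is harmless.</p>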
<h2 id="conclusion">Conclusion<a class="headerlink" href="#conclusion" title="Permanent link">¶</a></h2>
<p>I try to follow these rules in my scripts, and they’re known to have made at least my own life better. I’m still not consistent though, unfortunately, in following my own rules. So perhaps writing them down this way will help me improve there as well.</p>
<p>Do you have anything you think I should add to this? Please share in the comments!</p>
<p>Edit 1: Included fixes from HN comments at <a href="https://news.ycombinator.com/item?id=33355407" rel="noopener noreferrer" target="_blank">https://news.ycombinator.com/item?id=33355407</a> and <a href="https://news.ycombinator.com/item?id=33355077" rel="noopener noreferrer" target="_blank">https://news.ycombinator.com/item?id=33355077</a>.</p>
<p>Edit 2: Fix from <a href="https://news.ycombinator.com/item?id=33354759" rel="noopener noreferrer" target="_blank">https://news.ycombinator.com/item?id=33354759</a>.</p>Quick insecure TOTP2022-09-10T00:00:00+05:302022-09-10T00:00:00+05:30Shrikant Sharat Kandulatag:sharats.me,2022-09-10:/posts/quick-insucure-totp/<p>This is about a Hammerspoon script I have that gives me a super-fast way to fill in TOTP fields in MFA logins.</p>
<p class="note"><strong>NOTE</strong>: This method of doing MFA is very likely, very unsafe. If you are any bit unsure about anything here, please stay away from this document.</p>
<h2 id="hammerspoon">Hammerspoon<a class="headerlink" href="#hammerspoon" title="Permanent link">¶</a></h2>
<p><a href="http://www.hammerspoon.org/" rel="noopener noreferrer" target="_blank">Hammerspoon</a> is a very convenient and powerful automation tool for macOS that can be programmed in Lua. It’s been my replacement for AutoHotkey since moving away from Windows.</p>
<p>Install with:</p>
<div class="hl"><pre class=content><code><span>brew<span class="w"> </span>install<span class="w"> </span>hammerspoon
</span></code></pre></div>
<h2 id="totp-script">TOTP Script<a class="headerlink" href="#totp-script" title="Permanent link">¶</a></h2>
<p>There are four pieces to this.</p>
<p><strong>One</strong>, open <code>~/.hammerspoon/init.lua</code>, creating it if it doesn’t exist. Ensure it has the following line, perhaps among many others:</p>
<div class="hl"><pre class=content><code><span><span class="nb">require</span><span class="p">(</span><span class="s2">"totp-generator"</span><span class="p">).</span><span class="n">init</span><span class="p">()</span>
</span></code></pre></div>
<p><strong>Two</strong>, in <code>~/.hammerspoon/totp-generator.lua</code>, put the following content:</p>
<div class="hl"><input type=checkbox id=co-16><label for=co-16><span class='btn show-full-code-btn'>Show remaining 59 lines</span></label><pre class=content><code><span><span class="kd">local</span> <span class="n">os</span> <span class="o">=</span> <span class="nb">require</span><span class="p">(</span><span class="s2">"os"</span><span class="p">)</span>
</span><span><span class="kd">local</span> <span class="n">gauth</span> <span class="o">=</span> <span class="nb">require</span><span class="p">(</span><span class="s2">"gauth"</span><span class="p">)</span>
</span><span>
</span><span><span class="kd">local</span> <span class="n">mfa_note_path</span> <span class="o">=</span> <span class="nb">os.getenv</span><span class="p">(</span><span class="s2">"HOME"</span><span class="p">)</span> <span class="o">..</span> <span class="s2">"/.hammerspoon/otp-codes.csv"</span>
</span><span><span class="kd">local</span> <span class="n">keys</span> <span class="o">=</span> <span class="kc">nil</span>
</span><span>
</span><span><span class="kr">function</span> <span class="nf">init</span><span class="p">()</span>
</span><span> <span class="n">hs</span><span class="p">.</span><span class="n">hotkey</span><span class="p">.</span><span class="n">bind</span><span class="p">({</span><span class="s2">"alt"</span><span class="p">},</span> <span class="s2">"n"</span><span class="p">,</span> <span class="n">launch</span><span class="p">)</span>
</span><span> <span class="n">mfaWatcher</span> <span class="o">=</span> <span class="n">hs</span><span class="p">.</span><span class="n">pathwatcher</span><span class="p">.</span><span class="n">new</span><span class="p">(</span><span class="n">mfa_note_path</span><span class="p">,</span> <span class="kr">function</span><span class="p">()</span>
</span><span> <span class="n">keys</span> <span class="o">=</span> <span class="n">loadItems</span><span class="p">()</span>
</span><span> <span class="kr">end</span><span class="p">):</span><span class="n">start</span><span class="p">()</span> <span class="c1">-- kept in a global so the watcher isn't garbage-collected</span>
</span><span> <span class="n">keys</span> <span class="o">=</span> <span class="n">loadItems</span><span class="p">()</span>
</span><span><span class="kr">end</span>
</span><span>
</span><span><span class="kd">local</span> <span class="n">chooser</span> <span class="o">=</span> <span class="n">hs</span><span class="p">.</span><span class="n">chooser</span><span class="p">.</span><span class="n">new</span><span class="p">(</span><span class="kr">function</span><span class="p">(</span><span class="n">item</span><span class="p">)</span>
</span><span> <span class="kr">if</span> <span class="n">item</span> <span class="o">==</span> <span class="kc">nil</span> <span class="kr">then</span>
</span><span> <span class="kr">return</span>
</span><span> <span class="kr">end</span>
</span><span>
</span><span> <span class="kd">local</span> <span class="n">hash</span> <span class="o">=</span> <span class="n">gauth</span><span class="p">.</span><span class="n">GenCode</span><span class="p">(</span><span class="n">item</span><span class="p">.</span><span class="n">_key</span><span class="p">,</span> <span class="nb">math.floor</span><span class="p">(</span><span class="nb">os.time</span><span class="p">()</span> <span class="o">/</span> <span class="mi">30</span><span class="p">))</span>
</span><span class=collapse> <span class="n">hs</span><span class="p">.</span><span class="n">eventtap</span><span class="p">.</span><span class="n">keyStrokes</span><span class="p">((</span><span class="s2">"%06d"</span><span class="p">):</span><span class="n">format</span><span class="p">(</span><span class="n">hash</span><span class="p">))</span>
</span><span class=collapse><span class="kr">end</span><span class="p">)</span>
</span><span class=collapse>
</span><span class=collapse><span class="n">chooser</span><span class="p">:</span><span class="n">queryChangedCallback</span><span class="p">(</span><span class="kr">function</span><span class="p">(</span><span class="n">query</span><span class="p">)</span>
</span><span class=collapse> <span class="kr">if</span> <span class="n">query</span> <span class="o">==</span> <span class="s2">""</span> <span class="kr">then</span>
</span><span class=collapse> <span class="n">chooser</span><span class="p">:</span><span class="n">choices</span><span class="p">(</span><span class="kc">nil</span><span class="p">)</span>
</span><span class=collapse> <span class="kr">end</span>
</span><span class=collapse>
</span><span class=collapse> <span class="kd">local</span> <span class="n">choices</span> <span class="o">=</span> <span class="p">{}</span>
</span><span class=collapse>
</span><span class=collapse> <span class="kr">for</span> <span class="n">_</span><span class="p">,</span> <span class="n">item</span> <span class="kr">in</span> <span class="nb">pairs</span><span class="p">(</span><span class="n">filter</span><span class="p">(</span><span class="n">query</span><span class="p">,</span> <span class="n">keys</span><span class="p">)</span> <span class="ow">or</span> <span class="p">{})</span> <span class="kr">do</span>
</span><span class=collapse> <span class="nb">table.insert</span><span class="p">(</span><span class="n">choices</span><span class="p">,</span> <span class="n">item</span><span class="p">)</span>
</span><span class=collapse> <span class="kr">end</span>
</span><span class=collapse>
</span><span class=collapse> <span class="n">chooser</span><span class="p">:</span><span class="n">choices</span><span class="p">(</span><span class="n">choices</span><span class="p">)</span>
</span><span class=collapse><span class="kr">end</span><span class="p">)</span>
</span><span class=collapse>
</span><span class=collapse><span class="kr">function</span> <span class="nf">launch</span><span class="p">()</span>
</span><span class=collapse> <span class="n">chooser</span><span class="p">:</span><span class="n">choices</span><span class="p">(</span><span class="kc">nil</span><span class="p">)</span>
</span><span class=collapse> <span class="n">chooser</span><span class="p">:</span><span class="n">query</span><span class="p">(</span><span class="s2">""</span><span class="p">)</span>
</span><span class=collapse> <span class="n">chooser</span><span class="p">:</span><span class="n">show</span><span class="p">()</span>
</span><span class=collapse><span class="kr">end</span>
</span><span class=collapse>
</span><span class=collapse><span class="kr">function</span> <span class="nf">filter</span><span class="p">(</span><span class="n">query</span><span class="p">,</span> <span class="n">items</span><span class="p">)</span>
</span><span class=collapse> <span class="kr">if</span> <span class="n">query</span> <span class="o">==</span> <span class="s2">""</span> <span class="kr">then</span>
</span><span class=collapse> <span class="kr">return</span> <span class="kc">nil</span>
</span><span class=collapse> <span class="kr">end</span>
</span><span class=collapse> <span class="kd">local</span> <span class="n">lowerQuery</span> <span class="o">=</span> <span class="n">query</span><span class="p">:</span><span class="n">lower</span><span class="p">()</span>
</span><span class=collapse> <span class="kd">local</span> <span class="n">result</span> <span class="o">=</span> <span class="p">{}</span>
</span><span class=collapse> <span class="kr">for</span> <span class="n">_</span><span class="p">,</span> <span class="n">item</span> <span class="kr">in</span> <span class="nb">pairs</span><span class="p">(</span><span class="n">items</span><span class="p">)</span> <span class="kr">do</span>
</span><span class=collapse> <span class="kr">if</span> <span class="n">item</span><span class="p">.</span><span class="n">text</span><span class="p">:</span><span class="n">lower</span><span class="p">():</span><span class="n">find</span><span class="p">(</span><span class="n">lowerQuery</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="kc">true</span><span class="p">)</span> <span class="o">~=</span> <span class="kc">nil</span> <span class="kr">then</span> <span class="c1">-- plain find: treat "-", "%" etc. in the query literally</span>
</span><span class=collapse> <span class="nb">table.insert</span><span class="p">(</span><span class="n">result</span><span class="p">,</span> <span class="n">item</span><span class="p">)</span>
</span><span class=collapse> <span class="kr">end</span>
</span><span class=collapse> <span class="kr">end</span>
</span><span class=collapse> <span class="kr">return</span> <span class="n">result</span>
</span><span class=collapse><span class="kr">end</span>
</span><span class=collapse>
</span><span class=collapse><span class="kr">function</span> <span class="nf">loadItems</span><span class="p">()</span>
</span><span class=collapse> <span class="kd">local</span> <span class="n">f</span> <span class="o">=</span> <span class="nb">io.open</span><span class="p">(</span><span class="n">mfa_note_path</span><span class="p">,</span> <span class="s2">"r"</span><span class="p">)</span>
</span><span class=collapse> <span class="kd">local</span> <span class="n">content</span> <span class="o">=</span> <span class="n">f</span><span class="p">:</span><span class="n">read</span><span class="p">(</span><span class="s2">"*all"</span><span class="p">)</span>
</span><span class=collapse> <span class="n">f</span><span class="p">:</span><span class="n">close</span><span class="p">()</span>
</span><span class=collapse>
</span><span class=collapse> <span class="kd">local</span> <span class="n">entries</span> <span class="o">=</span> <span class="p">{}</span>
</span><span class=collapse> <span class="c1">-- Ref: https://www.lua.org/manual/5.3/manual.html#6.4.1</span>
</span><span class=collapse> <span class="kr">for</span> <span class="n">title</span><span class="p">,</span> <span class="n">key</span><span class="p">,</span> <span class="n">desc</span> <span class="kr">in</span> <span class="nb">string.gmatch</span><span class="p">(</span><span class="n">content</span><span class="p">,</span> <span class="s2">"%s*(.-)%s*,%s*(.-)%s*,%s*(.-)%s*</span><span class="se">\n</span><span class="s2">"</span><span class="p">)</span> <span class="kr">do</span>
</span><span class=collapse> <span class="nb">print</span><span class="p">(</span><span class="n">title</span><span class="p">,</span> <span class="n">desc</span><span class="p">)</span>
</span><span class=collapse> <span class="nb">table.insert</span><span class="p">(</span><span class="n">entries</span><span class="p">,</span> <span class="p">{</span>
</span><span class=collapse> <span class="n">text</span><span class="o">=</span><span class="n">title</span><span class="p">,</span>
</span><span class=collapse> <span class="n">subText</span><span class="o">=</span><span class="n">desc</span><span class="p">,</span>
</span><span class=collapse> <span class="n">_key</span><span class="o">=</span><span class="nb">string.lower</span><span class="p">(</span><span class="nb">string.gsub</span><span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="s2">"%s+"</span><span class="p">,</span> <span class="s2">""</span><span class="p">)),</span>
</span><span class=collapse> <span class="p">})</span>
</span><span class=collapse> <span class="kr">end</span>
</span><span class=collapse>
</span><span class=collapse> <span class="kr">return</span> <span class="n">entries</span>
</span><span class=collapse><span class="kr">end</span>
</span><span class=collapse>
</span><span class=collapse><span class="kr">return</span> <span class="p">{</span>
</span><span class=collapse> <span class="n">init</span><span class="o">=</span><span class="n">init</span><span class="p">,</span>
</span><span class=collapse><span class="p">}</span>
</span></code></pre></div>
<p><strong>Three</strong>, download the <a href="https://raw.githubusercontent.com/teunvink/hammerspoon/master/gauth.lua" rel="noopener noreferrer" target="_blank">gauth.lua</a> file and place it in the <code>~/.hammerspoon</code> folder. This is what does the bulk of the work, so thanks to <a href="https://github.com/teunvink" rel="noopener noreferrer" target="_blank">teunvink</a> for this!</p>
<p><strong>Four</strong>, in the file <code>~/.hammerspoon/otp-codes.csv</code>, add your TOTP code data, one per line, like this:</p>
<div class="hl"><pre class=content><code><span>Mail,abcd efghi jklmn opqrst, Personal Mail Account
</span><span>Another,onemoretotpcodehere, Another nice account
</span></code></pre></div>
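<p>Each line contains three entries, separated by commas. First is a title, short and easily identifiable; second is the TOTP key; third is a description, where you can include any longer explanation for yourself. Make sure the file ends with a newline, since the pattern in <code>loadItems</code> only matches lines that are terminated by one.</p>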
<p>The TOTP Key in the second column is given by the MFA provider when configuring MFA. We are usually asked to scan the QR code on our phones when setting this up, but we can also get a TOTP Key, usually hidden behind a button that reads something like <code>Can't scan the code?</code>. Copy that key and put an entry here.</p>
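<p>Since <code>loadItems</code> silently skips or merges entries it can’t parse, it can help to sanity-check the CSV file before reloading Hammerspoon. Here’s a minimal sketch of such a check, using a hypothetical <code>check_otp_csv</code> function. It only looks for the two problems I can see from the Lua pattern: a missing trailing newline, and a line with fewer than two commas, which the pattern would merge with the following line. It doesn’t validate the keys themselves.</p>

```shell
#!/usr/bin/env bash
# Hypothetical sanity check for the CSV file read by loadItems().
# Returns non-zero (and explains on stderr) if the file would be mis-parsed.
check_otp_csv() {
    local file="$1"
    local status=0 line_no=0 line commas

    # The Lua pattern ends with "\n", so a last line without a trailing newline
    # never matches. $(...) strips trailing newlines, so this is empty only if
    # the file's last byte is a newline.
    if [[ -s "$file" && -n "$(tail -c 1 "$file")" ]]; then
        echo "${file}: missing trailing newline" >&2
        status=1
    fi

    # `|| [[ -n "$line" ]]` also processes a final line that lacks a newline.
    while IFS= read -r line || [[ -n "$line" ]]; do
        line_no=$((line_no + 1))
        # Skip blank lines.
        if [[ -z "${line//[[:space:]]/}" ]]; then
            continue
        fi
        # Title and key must each be followed by a comma; any extra commas
        # simply end up in the description.
        commas="${line//[^,]/}"
        if (( ${#commas} < 2 )); then
            echo "${file}:${line_no}: expected 'title,key,description'" >&2
            status=1
        fi
    done < "$file"

    return "$status"
}
```

<p>For example, <code>check_otp_csv ~/.hammerspoon/otp-codes.csv && echo 'Looks fine'</code>.</p>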
<p>Now start/reload Hammerspoon.</p>
<p>Then, while your cursor is in a TOTP field, hit <kbd>Opt+n</kbd>, start searching for any entry from the CSV file, and hit Enter on the entry you want filled in.</p>
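<p>For the curious, the code that gets typed in is (presumably, since I haven’t dug into <code>gauth.lua</code>) a standard RFC 6238 TOTP. That is an HMAC-SHA1 of the time-step counter, which is what <code>math.floor(os.time() / 30)</code> in the script computes, truncated to six digits as described in RFC 4226. Here’s a rough bash sketch of that computation with a hypothetical <code>totp</code> function. It assumes OpenSSL’s <code>dgst -mac HMAC</code> is available. It takes the secret as hex rather than the base32 text from the CSV file, so base32 decoding is left out. It’s handy for cross-checking codes, not for daily use.</p>

```shell
#!/usr/bin/env bash
# Rough sketch of RFC 6238 TOTP: HMAC-SHA1, 30-second steps, 6 digits.
# Assumption: the secret is passed as hex, not as the base32 text from the CSV file.
totp() {
    local key_hex="$1"
    # Optional second argument: Unix time to use instead of "now" (handy for testing).
    local now="${2:-$(date +%s)}"

    # Same counter as the Lua script computes: floor(unix_time / 30).
    local counter=$(( now / 30 ))

    # Encode the counter as 8 big-endian bytes, written as \xHH escapes for printf.
    local counter_hex escaped
    counter_hex="$(printf '%016x' "$counter")"
    escaped="$(sed 's/../\\x&/g' <<<"$counter_hex")"

    # HMAC-SHA1 over those raw bytes; the last field of openssl's output is the hex digest.
    # $escaped only holds \xHH escapes, so it's used as the printf format on purpose.
    local hmac
    # shellcheck disable=SC2059
    hmac="$(printf "$escaped" | openssl dgst -sha1 -mac HMAC -macopt "hexkey:${key_hex}" | awk '{print $NF}')"

    # Dynamic truncation (RFC 4226): the low 4 bits of the last byte give an offset,
    # 4 bytes are read from there, the top bit is dropped, and the last 6 digits kept.
    local offset=$(( 16#${hmac:39:1} ))
    local chunk="${hmac:offset*2:8}"
    printf '%06d\n' $(( (16#${chunk} & 0x7fffffff) % 1000000 ))
}
```

<p>With the RFC 6238 test secret, the ASCII string <code>12345678901234567890</code> (hex <code>3132333435363738393031323334353637383930</code>), running <code>totp 3132333435363738393031323334353637383930 59</code> should print <code>287082</code>, the last six digits of the RFC’s 8-digit test value <code>94287082</code>.</p>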
<h2 id="demo">Demo<a class="headerlink" href="#demo" title="Permanent link">¶</a></h2>
<video controls="" muted="" playsinline="" preload="" src="https://sharats.me/static/totp-generator-demo.mp4">Your browser does not support HTML5 video. Here’s <a href="https://sharats.me/static/totp-generator-demo.mp4">a link to the video</a> instead.</video>
<h2 id="conclusion">Conclusion<a class="headerlink" href="#conclusion" title="Permanent link">¶</a></h2>
<p>Again, this can be very convenient, but it is not very secure. The way I use it on my system is quite a bit different from what I demonstrate here, but that’s only because I don’t want to show off the exact format I am using. So feel free to tweak the CSV format and use something else, like JSON, or an encrypted source altogether, like the <code>pass</code> CLI, perhaps. But I can’t speak for that.</p>
<p>Keep your keys safe. They are nothing less than passwords.</p>
<p>Thank you for reading.</p>Peeking into HTTPS Traffic with a Proxy2022-06-17T00:00:00+05:302022-06-17T00:00:00+05:30Shrikant Sharat Kandulatag:sharats.me,2022-06-17:/posts/peeking-into-https-traffic-with-a-proxy/<p>This article is about configuring a web application, Appsmith in this case, to run correctly behind a firewall that does SSL decryption, as a Docker container. Instead of a firewall, we’ll use a proxy, which, for the purpose of the problem statement, should be the same.</p>
<div class="toc"><span class="toctitle">Table of Contents</span><ul>
<li><a href="#setting-up-mitmproxy">Setting up mitmproxy</a></li>
<li><a href="#setting-up">Setting up</a></li>
<li><a href="#setting-proxy-on-the-whole-container">Setting proxy on the whole container</a></li>
<li><a href="#conclusion">Conclusion</a></li>
<li><a href="#bonus-using-charles">Bonus: Using Charles</a></li>
</ul>
</div>
<p>Since the proxy needs to support HTTPS decryption, we’ll use <code>mitmproxy</code>, but Charles or any other proxy that supports this would also work just fine.</p>
<h2 id="setting-up-mitmproxy">Setting up <code>mitmproxy</code><a class="headerlink" href="#setting-up-mitmproxy" title="Permanent link">¶</a></h2>
<p>Install with:</p>
<div class="hl"><pre class=content><code><span>brew<span class="w"> </span>install<span class="w"> </span>mitmproxy
</span></code></pre></div>
<p>Now launch it using:</p>
<div class="hl"><pre class=content><code><span>mitmweb<span class="w"> </span>--listen<span class="w"> </span><span class="m">9020</span><span class="w"> </span>--web<span class="w"> </span><span class="m">9021</span>
</span></code></pre></div>
<p>Let it run in a separate Terminal window in the background. This will also open the proxy’s web UI at <a href="http://localhost:9021" rel="noopener noreferrer" target="_blank">http://localhost:9021</a>. For a console UI, use <code>mitmproxy</code> in place of <code>mitmweb</code> in the above command.</p>
<p>Let’s run some requests through this proxy to check that it’s working. Start with:</p>
<div class="hl"><pre class=content><code><span>curl<span class="w"> </span>http://httpbun.com/get
</span></code></pre></div>
<p>This should print a valid JSON response, with some details about the request itself. Let’s repeat this with the proxy.</p>
<div class="hl"><pre class=content><code><span>curl<span class="w"> </span>--proxy<span class="w"> </span>localhost:9020<span class="w"> </span>http://httpbun.com/get
</span></code></pre></div>
<p>You should see the same response again, but this time a new entry should also appear in the <code>mitmweb</code> UI. Here, you can inspect the request and see its path, headers, and response.</p>
<p>So we’ve confirmed that our proxy works. Let’s add HTTPS to the mix.</p>
<div class="hl"><pre class=content><code><span>curl<span class="w"> </span>https://httpbun.com/get
</span></code></pre></div>
<p>Again, same thing, but with HTTPS, without a proxy. You should see the same response as before, but without an entry in the proxy. That’s to be expected since we didn’t put a <code>--proxy</code> here. Let’s try that now.</p>
<div class="hl"><pre class=content><code><span>curl<span class="w"> </span>--proxy<span class="w"> </span>localhost:9020<span class="w"> </span>https://httpbun.com/get
</span></code></pre></div>
<p>This will fail with an error saying that the SSL certificate couldn’t be verified. Let’s see why.</p>
<p>The way an SSL proxy works is by establishing two SSL connections: one with the client (a browser, or <code>curl</code>), initiated by the client, and another with the server (the <code>httpbun.com</code> server in this case). On the client side, <code>mitmproxy</code> presents a certificate for <code>httpbun.com</code> that it generates on the fly and signs with its own root certificate. On the server side, the connection is secured with the server’s real certificate. Since the client doesn’t trust <code>mitmproxy</code>’s root certificate, it rejects the certificate presented to it, and that’s the error we see.</p>
<p>The first time <code>mitmproxy</code> is started, it creates a new root certificate, in the <code>~/.mitmproxy</code> folder. We can install this root certificate on our system, and then <code>curl</code>, or any other client, will trust it. The <code>mitmproxy</code> docs talk about <a href="https://docs.mitmproxy.org/stable/concepts-certificates/#installing-the-mitmproxy-ca-certificate-manually" rel="noopener noreferrer" target="_blank">how to install this cert</a>. Optionally, for <code>curl</code>, instead of installing the cert, we can use the <code>--cacert</code> flag to point to the root certificate.</p>
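<p>To see this trust mechanism without a proxy in the way, here’s a self-contained sketch using <code>openssl</code>. It creates a throwaway CA standing in for <code>mitmproxy</code>’s root certificate and signs a certificate for <code>httpbun.com</code> with it, roughly what the proxy does on the fly. It then verifies that certificate twice: once against only the default trust store, like plain <code>curl</code>, and once with the CA passed explicitly, similar to <code>curl --cacert</code>. The function name <code>demo_ca_trust</code> is made up. The sketch assumes a reasonably recent OpenSSL whose <code>verify</code> command exits non-zero on failure.</p>

```shell
#!/usr/bin/env bash
set -o errexit
set -o nounset
set -o pipefail

# Creates a throwaway CA plus a leaf cert for httpbun.com signed by it, then
# checks the leaf with and without trusting the CA. Prints one line per check.
demo_ca_trust() {
    local workdir
    workdir="$(mktemp -d)"
    (
        cd "$workdir"
        # Stand-in for mitmproxy's root CA: a self-signed certificate.
        openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
            -keyout ca.key -out ca.pem -subj "/CN=demo-proxy-ca" 2>/dev/null
        # Stand-in for the certificate the proxy presents for httpbun.com.
        openssl req -newkey rsa:2048 -nodes \
            -keyout leaf.key -out leaf.csr -subj "/CN=httpbun.com" 2>/dev/null
        openssl x509 -req -in leaf.csr -CA ca.pem -CAkey ca.key \
            -CAcreateserial -days 1 -out leaf.pem 2>/dev/null

        # Like plain `curl`: only the default trust store is consulted.
        if openssl verify leaf.pem >/dev/null 2>&1; then
            echo "default trust store: OK"
        else
            echo "default trust store: FAILED"
        fi
        # Like `curl --cacert`: the stand-in proxy CA is trusted explicitly.
        if openssl verify -CAfile ca.pem leaf.pem >/dev/null 2>&1; then
            echo "with -CAfile ca.pem: OK"
        else
            echo "with -CAfile ca.pem: FAILED"
        fi
    )
    rm -rf "$workdir"
}

demo_ca_trust
```

<p>It should print <code>default trust store: FAILED</code> followed by <code>with -CAfile ca.pem: OK</code>, which mirrors the <code>curl</code> behaviour with and without <code>--cacert</code>.</p>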
<p>Another point to note here is that installing this root certificate on your system doesn’t mean it’ll be trusted inside the Docker containers you run. Docker containers are isolated systems in this context, and maintain their own lists of trusted root certificates.</p>
<p>To illustrate this, first, let’s run the same request from inside a container, and we should see the error right away:</p>
<div class="hl"><pre class=content><code><span>docker<span class="w"> </span>run<span class="w"> </span>--rm<span class="w"> </span>alpine/curl<span class="w"> </span>--proxy<span class="w"> </span>host.docker.internal:9020<span class="w"> </span>https://httpbun.com/get
</span></code></pre></div>
<p>At this point, you should see a certificate validation error. This is because the root certificate of <code>mitmproxy</code> isn’t installed inside the container’s environment, so the <code>curl</code> invocation inside won’t be able to verify <code>mitmproxy</code>’s certificate.</p>
<p>To confirm that this is indeed because of <code>mitmproxy</code>, run the same <code>docker run</code> command without the <code>--proxy host.docker.internal:9020</code> and you won’t see this error, despite running with <code>https</code>.</p>
<p>Now we’ve reproduced the situation where a process (a web server in our case), inside a Docker container, is trying to run behind an SSL-decrypting firewall (or, an SSL-decrypting proxy in our case here). Let’s see what we can do to get this to work.</p>
<h2 id="setting-up">Setting up<a class="headerlink" href="#setting-up" title="Permanent link">¶</a></h2>
<p>For our adventure here, we’ll use the Docker image of Appsmith, located at <a href="https://hub.docker.com/repository/docker/appsmith/appsmith-ce" rel="noopener noreferrer" target="_blank">https://hub.docker.com/repository/docker/appsmith/appsmith-ce</a>.</p>
<p>Let’s start a <em>temporary</em> Appsmith container with:</p>
<div class="hl"><pre class=content><code><span>docker<span class="w"> </span>run<span class="w"> </span>--rm<span class="w"> </span>-d<span class="w"> </span>--name<span class="w"> </span>ace<span class="w"> </span>-p<span class="w"> </span><span class="m">9022</span>:80<span class="w"> </span>appsmith/appsmith-ce
</span></code></pre></div>
<p>Once this is ready, you should be able to access your Appsmith instance at <a href="http://localhost:9022" rel="noopener noreferrer" target="_blank">http://localhost:9022</a>.</p>
<p>Let’s try to run some <code>curl</code> requests inside this container, and get them to go through our <code>mitmweb</code> proxy.</p>
<div class="hl"><pre class=content><code><span>docker<span class="w"> </span><span class="nb">exec</span><span class="w"> </span>ace<span class="w"> </span>curl<span class="w"> </span>--proxy<span class="w"> </span>host.docker.internal:9020<span class="w"> </span>http://httpbun.com/get
</span></code></pre></div>
<p>This should work fine, and the request should show up in the proxy UI with full details as well. Now let’s do the same thing with <code>https</code>.</p>
<div class="hl"><pre class=content><code><span>docker<span class="w"> </span><span class="nb">exec</span><span class="w"> </span>ace<span class="w"> </span>curl<span class="w"> </span>--proxy<span class="w"> </span>host.docker.internal:9020<span class="w"> </span>https://httpbun.com/get
</span></code></pre></div>
<p>As expected, this fails with a certificate validation error, since the container doesn’t trust <code>mitmproxy</code>’s certificate. To fix that, let’s copy the root certificate into the container. For <code>mitmproxy</code>, the root cert is generated on first start and is located at <code>~/.mitmproxy/mitmproxy-ca-cert.pem</code>, going by the docs at <a href="https://docs.mitmproxy.org/stable/concepts-certificates/#the-mitmproxy-certificate-authority" rel="noopener noreferrer" target="_blank">https://docs.mitmproxy.org/stable/concepts-certificates/#the-mitmproxy-certificate-authority</a>.</p>
<div class="hl"><pre class=content><code><span>docker<span class="w"> </span>cp<span class="w"> </span>~/.mitmproxy/mitmproxy-ca-cert.pem<span class="w"> </span>ace:/
</span></code></pre></div>
<p>This command copies <code>mitmproxy</code>’s root certificate into the root folder of the container. Let’s run the same <code>curl</code> command now, providing it this root cert:</p>
<div class="hl"><pre class=content><code><span>docker<span class="w"> </span><span class="nb">exec</span><span class="w"> </span>ace<span class="w"> </span>curl<span class="w"> </span>--proxy<span class="w"> </span>host.docker.internal:9020<span class="w"> </span>--cacert<span class="w"> </span>/mitmproxy-ca-cert.pem<span class="w"> </span>https://httpbun.com/get
</span></code></pre></div>
<p>Now we’ll see the correct response, as well as full details of this request in the proxy UI.</p>
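<p>Any program other than <code>curl</code> needs the same two ingredients (a proxy, plus an extra CA file to trust), configured in its own way. As an illustration, here’s a minimal Python sketch using only the standard library’s <code>urllib.request</code> and <code>ssl</code> modules. The <code>build_proxied_opener</code> helper is hypothetical, and the proxy URL and cert path are the ones from this walkthrough, assumed to be reachable from wherever the script runs.</p>

```python
import ssl
import urllib.request


def build_proxied_opener(proxy_url, cafile=None):
    """Build a urllib opener that sends HTTP and HTTPS requests via `proxy_url`.

    Like curl's --cacert, passing `cafile` makes the context trust only the
    certificates in that file; cafile=None falls back to the system defaults.
    """
    context = ssl.create_default_context(cafile=cafile)
    return urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url}),
        urllib.request.HTTPSHandler(context=context),
    )


if __name__ == "__main__":
    # Roughly the Python equivalent of the curl command above.
    opener = build_proxied_opener(
        "http://host.docker.internal:9020", cafile="/mitmproxy-ca-cert.pem"
    )
    with opener.open("https://httpbun.com/get") as response:
        print(response.status, response.read().decode())
```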
<h2 id="setting-proxy-on-the-whole-container">Setting proxy on the whole container<a class="headerlink" href="#setting-proxy-on-the-whole-container" title="Permanent link">¶</a></h2>
<p>We’re now at the point where it’s possible for requests inside the container to be run via the proxy, without any cert validation errors.</p>
<p>But this currently needs to be deliberate. As in the example above, the <code>curl</code> command needs both the proxy and the cert to be specified explicitly. Instead, we’d like even ordinary <code>curl</code> commands to always go through the proxy, since that’s how a firewall would work, and ultimately, that’s what we are trying to reproduce here.</p>
<p>Let’s stop the <code>ace</code> container and start it again with proxy configuration set.</p>
<div class="hl"><pre class=content><code><span>docker<span class="w"> </span>stop<span class="w"> </span>ace
</span><span>docker<span class="w"> </span>run<span class="w"> </span>--rm<span class="w"> </span>-d<span class="w"> </span>--name<span class="w"> </span>ace<span class="w"> </span>-p<span class="w"> </span><span class="m">9022</span>:80<span class="w"> </span><span class="se">\</span>
</span><span><span class="w"> </span>-e<span class="w"> </span><span class="nv">HTTP_PROXY</span><span class="o">=</span>http://host.docker.internal:9020<span class="w"> </span><span class="se">\</span>
</span><span><span class="w"> </span>-e<span class="w"> </span><span class="nv">HTTPS_PROXY</span><span class="o">=</span>http://host.docker.internal:9020<span class="w"> </span><span class="se">\</span>
</span><span><span class="w"> </span>-e<span class="w"> </span><span class="nv">http_proxy</span><span class="o">=</span>http://host.docker.internal:9020<span class="w"> </span><span class="se">\</span>
</span><span><span class="w"> </span>-e<span class="w"> </span><span class="nv">https_proxy</span><span class="o">=</span>http://host.docker.internal:9020<span class="w"> </span><span class="se">\</span>
</span><span><span class="w"> </span>appsmith/appsmith-ce
</span></code></pre></div>
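<p>Yep, that’s right. We need to set both the lowercase <em>and</em> uppercase variants (<code>http_proxy</code> <em>and</em> <code>HTTP_PROXY</code>, and likewise for HTTPS) for all applications inside the container to take it seriously, because different programs read different ones. <code>curl</code>, for example, only reads <code>http_proxy</code> in lowercase. 🤦</p>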
<p>Let’s run a normal <code>curl</code> request on this container to see if the proxy settings are applied:</p>
<div class="hl"><pre class=content><code><span>docker<span class="w"> </span><span class="nb">exec</span><span class="w"> </span>ace<span class="w"> </span>curl<span class="w"> </span>http://httpbun.com/get
</span></code></pre></div>
<p>If the proxy configuration is working, you should see this request appear in the proxy UI. Now let’s try an <code>https</code> URL:</p>
<div class="hl"><pre class=content><code><span>docker<span class="w"> </span><span class="nb">exec</span><span class="w"> </span>ace<span class="w"> </span>curl<span class="w"> </span>https://httpbun.com/get
</span></code></pre></div>
<p>As expected, this fails with a certificate validation error: the request goes through the proxy, but the proxy’s certificate can’t be verified. We could provide the root cert of <code>mitmproxy</code> using the <code>--cacert</code> argument, but we want it to apply to all requests in the container without such explicit configuration, so we won’t do that.</p>
<p>Instead, we want to <em>install</em> the root certificate of <code>mitmproxy</code> to the <em>truststore</em>, so that it’s available to <em>all</em> processes in the container for validating SSL certificates.</p>
<p>How this is done depends on the operating system, but in our case, since the container is based on Ubuntu, all we need to do is:</p>
<ul>
<li>Copy the certificate file to <code>/usr/local/share/ca-certificates</code>.</li>
<li>If the cert has the <code>.pem</code> extension, rename it to use the <code>.crt</code> extension. This is because Ubuntu’s <code>update-ca-certificates</code> command only picks up files with a <code>.crt</code> extension.</li>
<li>Run <code>update-ca-certificates</code>.</li>
</ul>
<p>Let’s copy the root cert into the container (renaming it to <code>.crt</code> on the way), and install it by running <code>update-ca-certificates</code> inside the container:</p>
<div class="hl"><pre class=content><code><span>docker<span class="w"> </span>cp<span class="w"> </span>~/.mitmproxy/mitmproxy-ca-cert.pem<span class="w"> </span>ace:/usr/local/share/ca-certificates/mitmproxy-ca-cert.crt
</span><span>docker<span class="w"> </span><span class="nb">exec</span><span class="w"> </span>ace<span class="w"> </span>update-ca-certificates
</span></code></pre></div>
<p>The output should say that one certificate has been added to the truststore.</p>
<p>Let’s run the same <code>https</code> request again:</p>
<div class="hl"><pre class=content><code><span>docker<span class="w"> </span><span class="nb">exec</span><span class="w"> </span>ace<span class="w"> </span>curl<span class="w"> </span>https://httpbun.com/get
</span></code></pre></div>
<p>This should now print the correct response, as well as show up on the proxy UI with full details for inspection. 🎉</p>
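<p>Whether a given program picks up the newly installed cert depends on where it looks for trusted CAs. Python’s <code>ssl</code> module, for instance, can report the locations its OpenSSL build uses by default. A quick check could look like this, assuming a Python interpreter is available in the container (this post doesn’t establish that). On Debian and Ubuntu, these paths typically resolve into <code>/etc/ssl/certs</code>, which is what <code>update-ca-certificates</code> maintains.</p>

```python
import ssl

paths = ssl.get_default_verify_paths()

# Environment variables that, when set, override OpenSSL's compiled-in defaults.
print("override env vars:", paths.openssl_cafile_env, paths.openssl_capath_env)
# OpenSSL's compiled-in default CA file and CA directory.
print("OpenSSL defaults: ", paths.openssl_cafile, paths.openssl_capath)
# The resolved default file and directory (None when they don't exist on disk).
print("resolved:         ", paths.cafile, paths.capath)
```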
<h2 id="conclusion">Conclusion<a class="headerlink" href="#conclusion" title="Permanent link">¶</a></h2>
<p>This culminated in the PR <a href="https://github.com/appsmithorg/appsmith/pull/14207/files" rel="noopener noreferrer" target="_blank">#14207</a>, which contains a few quality-of-life (QoL) improvements over the solution above.</p>
<ol>
<li>
<p>We install <code>ca-certificates-java</code>, so that when we run <code>update-ca-certificates</code>, the certificates are also installed into the JVM truststore. This is important because, one, Java maintains its own truststore (like Firefox does), and two, Appsmith’s server runs on the JVM, so we need the certificates there as well.</p>
</li>
<li>
<p>We provide support for a <code>ca-certs</code> folder in the volume, where users can drop any root cert files which will be auto-added on container startup.</p>
</li>
<li>
<p>We run <code>update-ca-certificates --fresh</code> instead of just <code>update-ca-certificates</code>, so that any cert file <em>removed</em> from the <code>ca-certs</code> folder also gets removed from the truststores.</p>
</li>
<li>
<p>We copy values between the lowercase and uppercase proxy env variables, so that setting just one of <code>http_proxy</code> and <code>HTTP_PROXY</code> is enough. The same is done for <code>https_proxy</code> and <code>HTTPS_PROXY</code>.</p>
</li>
<li>
<p>We show a friendly warning when there are <code>.pem</code> files in the <code>ca-certs</code> folder since, most likely, they are there because the user forgot to rename them to use the <code>.crt</code> extension.</p>
</li>
<li>
<p>The JVM needs the <code>-Djava.net.useSystemProxies=true</code> flag to use the system-configured proxy. We also pass the individual proxy settings as additional system properties, so we can apply them when executing requests via Apache’s web client libraries. Because, well, that library doesn’t respect the system proxy configuration, even though the rest of the JVM does. Go figure.</p>
</li>
<li>
<p>We set a <code>NO_PROXY</code> env variable listing hosts that should <em>not</em> go through the proxy, like <code>localhost</code> and <code>127.0.0.1</code>.</p>
</li>
</ol>
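<p>The proxy-variable, <code>.pem</code>-warning and <code>NO_PROXY</code> items above are simple enough to sketch. What follows is a hypothetical Python illustration of that logic, not the code from the PR; the function names, the precedence rule when both cases are set, and the default <code>NO_PROXY</code> value are all assumptions.</p>

```python
import os
import sys

PROXY_VARS = ("http_proxy", "https_proxy")
# Hypothetical default list of hosts that should bypass the proxy.
DEFAULT_NO_PROXY = "localhost,127.0.0.1"


def normalize_proxy_env(env):
    """Make both the lowercase and uppercase variant of each proxy variable
    available when at least one of them is set. If both are set, the
    lowercase value wins (an arbitrary choice for this sketch)."""
    for lower in PROXY_VARS:
        upper = lower.upper()
        value = env.get(lower) or env.get(upper)
        if value:
            env[lower] = env[upper] = value
    # Keep local traffic away from the proxy, unless a value is already set.
    no_proxy = env.get("no_proxy") or env.get("NO_PROXY") or DEFAULT_NO_PROXY
    env["no_proxy"] = env["NO_PROXY"] = no_proxy
    return env


def pem_files(filenames):
    """Return the .pem files that update-ca-certificates would skip."""
    return sorted(name for name in filenames if name.endswith(".pem"))


def warn_about_pem_files(ca_certs_dir):
    """Print a friendly warning for each .pem file in the ca-certs folder."""
    for name in pem_files(os.listdir(ca_certs_dir)):
        print(
            f"WARNING: {name} will be ignored. Did you forget to rename it to .crt?",
            file=sys.stderr,
        )
```

<p>Calling <code>normalize_proxy_env(os.environ)</code> early in a startup script would make the normalized values visible to any child processes it starts afterwards.</p>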
<p>Of course, going back to our premise of using Appsmith behind an SSL-decrypting proxy, all a user needs to do is place the firewall’s root certificate in the <code>ca-certs</code> folder and restart the Appsmith container.</p>
<h2 id="bonus-using-charles">Bonus: Using Charles<a class="headerlink" href="#bonus-using-charles" title="Permanent link">¶</a></h2>
<p>Notes on using Charles instead of <code>mitmproxy</code>.</p>
<p>Install with:</p>
<div class="hl"><pre class=content><code><span>brew<span class="w"> </span>install<span class="w"> </span>charles
</span></code></pre></div>
<p>Open Charles.</p>
<p>Go to <code>Proxy -> SSL Proxying Settings</code>, and under “SSL Proxying”, add the domains for which you want SSL decryption to be done. Let’s add an entry under “Include”, with host set to <code>httpbun.com</code> and port set to <code>443</code> (the default port for HTTPS).</p>
<p>Check with an <code>http</code> URL in <code>curl</code>: the response should come back correctly, and the request should show up in Charles with full information.</p>
<p>Check with an <code>https</code> URL: you should get an error response back, and the request should show up in Charles with incomplete information and a red error icon.</p>
<p>To get Charles’ root certificate, go to <code>Help -> SSL Proxying -> Save Charles Root Certificate...</code>, and save it somewhere convenient, like your home folder.</p>
<p>The other steps should be the same as explained above for <code>mitmproxy</code>.</p>Time is different every time2021-12-24T00:00:00+05:302021-12-24T00:00:00+05:30Shrikant Sharat Kandulatag:sharats.me,2021-12-24:/posts/time-is-different-every-time/<p>I love automating things, with shell aliases, global hotkeys, IDE snippets etc.</p>
<p>A question I often get is whether I’ve spent more time automating something than it has saved me.</p>
<p>I’ve heard this question many times over the years, whenever someone sees me using such a shortcut:</p>
<blockquote>
<p>How long did it take for you to build and learn that automation? Was the time you saved from it worth it?</p>
</blockquote>
<p>My answer to that is, of course, yes. But the question is a little more nuanced.</p>
<p><em>Was the time saved <strong>worth</strong> it?</em> Yes.</p>
<p><em>Was the time saved <strong>more</strong> than the time you spent in building and learning?</em> No.</p>
<p>So, I spent <em>more</em> time building and learning the shortcut than it saved me. This is illustrated well in <a href="https://xkcd.com/1205/" rel="noopener noreferrer" target="_blank">this XKCD comic</a>:</p>
<p class="img"><a href="https://imgs.xkcd.com/comics/is_it_worth_the_time.png" rel="noopener noreferrer" target="_blank"><img alt="Is it worth the time" src="https://imgs.xkcd.com/comics/is_it_worth_the_time.png"></a></p>
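<p>The arithmetic behind the comic is simple: a shortcut’s total savings are the time it shaves off each run, times how often you run it, over the comic’s five-year horizon. A small Python sketch of that calculation (the function name is made up, and it assumes 365-day years):</p>

```python
def total_time_saved(seconds_saved, times_per_day, years=5):
    """Seconds saved by shaving `seconds_saved` off a task done `times_per_day`
    times a day, every day, for `years` years of 365 days each."""
    return seconds_saved * times_per_day * 365 * years


# Shaving 5 seconds off something done 5 times a day, for five years:
hours = total_time_saved(5, 5) / 3600
print(f"{hours:.1f} hours")  # 12.7 hours
```

<p>Spend longer than that on building and learning the shortcut and, purely by the clock, you’ve lost time.</p>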
<p>For most people, this makes learning such shortcuts a waste of time, because, of course, the net time difference is negative. Therein lies the folly.</p>
<blockquote>
<p>Not all five minutes hold the same value.</p>
</blockquote>
<p>There are times when I’m working on a critical fix that needs to go out in negative time. I hope to never end up in such situations, but we all do. In such situations, saving a few precious seconds can mean a lot.</p>
<p>Consider a hypothetical example: an internal application server is down for whatever reason. I need to SSH into the server to see what’s up. Sure, I could go into my notes, search for the long SSH command for this server, SSH into it, then run commands to check the logs, and then restart it if needed, etc.</p>
<p>But what if this were a single shell script? It would SSH into that server, print the logs, and ask me whether I want it restarted. Just a Y/N answer. I’m quite sure developing such a script would take more time than it would save me. However, I’d be spending that time developing this script when I’m not in a hurry.</p>
<p>I can afford to spend those <em>ten minutes</em> when I’m not in a hurry, to save <em>ten seconds</em> in a critical situation. This is what makes it worth it.</p>
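<p>A script like that doesn’t need to be clever. Here’s a minimal sketch, written in Python for illustration; the host name, the service name, and the assumption that the server runs the app as a systemd service (hence <code>journalctl</code> and <code>systemctl</code>) are all hypothetical.</p>

```python
import subprocess

# Hypothetical values: replace with your own server and service.
HOST = "app-server.internal"
SERVICE = "myapp"


def logs_command(host, service, lines=50):
    """ssh command printing the last `lines` log lines of a systemd service."""
    return ["ssh", host, "journalctl", "-u", service, "-n", str(lines), "--no-pager"]


def restart_command(host, service):
    """ssh command restarting the service; -t gives sudo a terminal to prompt on."""
    return ["ssh", "-t", host, "sudo", "systemctl", "restart", service]


def main():
    # Show the recent logs first, whatever their exit status.
    subprocess.run(logs_command(HOST, SERVICE), check=False)
    # Then a single Y/N question; anything other than "y" means no.
    if input(f"Restart {SERVICE} on {HOST}? [y/N] ").strip().lower() == "y":
        subprocess.run(restart_command(HOST, SERVICE), check=True)


if __name__ == "__main__":
    main()
```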
<p>But there’s an ugly side to this: we should know when a shortcut is <em>enough</em>. It’s easy to fall into the trap of trying to optimize it and make it better and better. This is well represented in <a href="https://xkcd.com/1319/" rel="noopener noreferrer" target="_blank">this comic by XKCD</a>:</p>
<p class="img"><a href="https://imgs.xkcd.com/comics/automation.png" rel="noopener noreferrer" target="_blank"><img alt="The trap of automation, by XKCD" src="https://imgs.xkcd.com/comics/automation.png"></a></p>
<p>Part of the problem is that developers, just like artists, are often never done. There’s always a small finishing touch that can be added.</p>
<p>The trick is to recognize, and even assume, that you’ll be the only user of this shortcut, ever. If it works for you in a critical situation without too many brain cycles, you’re done. Move on.</p>
<p>So, what do I automate? I’ve written about my automations and workflows quite a bit in the past:</p>
<ul>
<li><a href="../automating-the-vim-workplace/">Automating the Vim workplace</a>, <a href="../automating-the-vim-workplace-2/">part 2</a>, and <a href="../automating-the-vim-workplace-3/">part 3</a>.</li>
<li><a href="../the-magic-of-autohotkey/">The Magic of AutoHotkey</a>, and <a href="../the-magic-of-autohotkey-2/">part 2</a>.</li>
</ul>
<p>Today, I primarily work on macOS, and have come to love Hammerspoon as the macOS counterpart to AutoHotkey on Windows. I intend to write about my Hammerspoon automations soon as well.</p>
<p>As I always say, <em>identify, automate, repeat</em>.</p>The Python `print` function2020-04-05T00:00:00+05:302020-04-05T00:00:00+05:30Shrikant Sharat Kandulatag:sharats.me,2020-04-05:/posts/python-print-function/<p>The <a href="https://docs.python.org/3/library/functions.html#print" rel="noopener noreferrer" target="_blank"><code>print</code></a> function is most likely the first function we encounter when learning Python.
That encounter usually looks like <code>print("Hello world!")</code>. After that, we go on to learn more
about it, like being able to pass any number of arguments, of any type. I’m writing this
article to give an idea of how deep this rabbit hole goes. Turns out, the <code>print</code> function is <em>very</em>
powerful. So let’s get a coffee, put on a dusty pair of sunglasses and bask in its power!</p>
<div class="toc"><span class="toctitle">Table of Contents</span><ul>
<li><a href="#the-basics">The Basics</a></li>
<li><a href="#handling-of-multiple-arguments">Handling of Multiple Arguments</a></li>
<li><a href="#handling-of-non-string-types">Handling of non-string types</a></li>
<li><a href="#write-to-files">Write to files</a><ul>
<li><a href="#using-sysstderr">Using sys.stderr</a></li>
<li><a href="#modifying-sysstdout">Modifying sys.stdout</a></li>
<li><a href="#collecting-with-iostringio">Collecting with io.StringIO</a></li>
</ul>
</li>
<li><a href="#the-end-keyword-argument">The end= keyword argument</a></li>
<li><a href="#a-note-about-python-2">A Note about Python 2</a></li>
<li><a href="#a-sad-imitation">A Sad Imitation</a></li>
<li><a href="#the-pprint-function">The pprint Function</a></li>
<li><a href="#conclusion">Conclusion</a></li>
</ul>
</div>
<h2 id="the-basics">The Basics<a class="headerlink" href="#the-basics" title="Permanent link">¶</a></h2>
<p>The basic premise of the <code>print</code> function is quite, well, basic. It prints out the given arguments
to the standard output.</p>
<div class="hl"><pre class=content><code><span><span class="nb">print</span><span class="p">(</span><span class="s2">"Hello world!"</span><span class="p">)</span>
</span></code></pre></div>
<p>This prints:</p>
<div class="hl"><pre class=content><code><span>Hello world!
</span></code></pre></div>
<p>Calling it with multiple arguments:</p>
<div class="hl"><pre class=content><code><span><span class="nb">print</span><span class="p">(</span><span class="s2">"hello"</span><span class="p">,</span> <span class="s2">"world"</span><span class="p">)</span>
</span></code></pre></div>
<p>This prints:</p>
<div class="hl"><pre class=content><code><span>hello world
</span></code></pre></div>
<p>Notice that the two strings, <code>"hello"</code> and <code>"world"</code> have a space character printed between them.
The <code>print</code> function is helpful like that. By default, it places a space between every pair of
consecutive arguments to be printed.</p>
<p>It doesn’t have to be strings either:</p>
<div class="hl"><pre class=content><code><span><span class="nb">print</span><span class="p">(</span><span class="mi">42</span><span class="p">,</span> <span class="s2">"is the answer"</span><span class="p">)</span>
</span></code></pre></div>
<p>This prints:</p>
<div class="hl"><pre class=content><code><span>42 is the answer
</span></code></pre></div>
<p>Let’s look at each of these features in detail and see how they work.</p>
<h2 id="handling-of-multiple-arguments">Handling of Multiple Arguments<a class="headerlink" href="#handling-of-multiple-arguments" title="Permanent link">¶</a></h2>
<p>The <code>print</code> function accepts an arbitrary number of arguments to be printed. These have to be
positional arguments, since printing keyword arguments doesn’t make much sense. That’s not to say the <code>print</code> function
doesn’t accept any keyword arguments; it does. In fact, the space character that shows up between
the arguments to be printed can be changed by providing the <code>sep=</code> keyword argument.</p>
<p>Let’s look at the following examples:</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="nb">print</span><span class="p">(</span><span class="s2">"the"</span><span class="p">,</span> <span class="s2">"world"</span><span class="p">,</span> <span class="s2">"is"</span><span class="p">,</span> <span class="s2">"a"</span><span class="p">,</span> <span class="s2">"cruel"</span><span class="p">,</span> <span class="s2">"place"</span><span class="p">)</span>
</span><span><span class="go">the world is a cruel place</span>
</span><span>
</span><span><span class="gp">>>> </span><span class="nb">print</span><span class="p">(</span><span class="s2">"the"</span><span class="p">,</span> <span class="s2">"world"</span><span class="p">,</span> <span class="s2">"is"</span><span class="p">,</span> <span class="s2">"a"</span><span class="p">,</span> <span class="s2">"cruel"</span><span class="p">,</span> <span class="s2">"place"</span><span class="p">,</span> <span class="n">sep</span><span class="o">=</span><span class="s2">"-"</span><span class="p">)</span>
</span><span><span class="go">the-world-is-a-cruel-place</span>
</span><span>
</span><span><span class="gp">>>> </span><span class="nb">print</span><span class="p">(</span><span class="s2">"the"</span><span class="p">,</span> <span class="s2">"world"</span><span class="p">,</span> <span class="s2">"is"</span><span class="p">,</span> <span class="s2">"a"</span><span class="p">,</span> <span class="s2">"cruel"</span><span class="p">,</span> <span class="s2">"place"</span><span class="p">,</span> <span class="n">sep</span><span class="o">=</span><span class="s2">""</span><span class="p">)</span>
</span><span><span class="go">theworldisacruelplace</span>
</span></code></pre></div>
<p>In the first example, we don’t explicitly give any value to the <code>sep=</code> keyword argument, so it takes
its default value, the space character <code>" "</code>. In the second example, we set it to the dash
character <code>"-"</code> and we can see in the output that the strings are printed joined by dashes.</p>
<p>In the third example, we set the <code>sep=</code> to an empty string so the output is all the words printed
consecutively making it a cruel experience to read the text.</p>
<p>The <code>sep=</code> argument can be any string, it doesn’t have to be a single character and it can contain
newlines and any other shenanigans too.</p>
<div class="hl"><pre class=content><code><span><span class="nb">print</span><span class="p">(</span><span class="s2">"the"</span><span class="p">,</span> <span class="s2">"birds"</span><span class="p">,</span> <span class="s2">"in"</span><span class="p">,</span> <span class="s2">"the"</span><span class="p">,</span> <span class="s2">"sky"</span><span class="p">,</span> <span class="n">sep</span><span class="o">=</span><span class="s2">"</span><span class="se">\n</span><span class="s2"> hammertime</span><span class="se">\n</span><span class="s2">"</span><span class="p">)</span>
</span></code></pre></div>
<p>This prints the following mind-bogglingly useful output:</p>
<div class="hl"><pre class=content><code><span>the
</span><span> hammertime
</span><span>birds
</span><span> hammertime
</span><span>in
</span><span> hammertime
</span><span>the
</span><span> hammertime
</span><span>sky
</span></code></pre></div>
<p>Yeah, that’s a useful trick, but please, consider people’s sanity when you do such !@#$.</p>
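<p>A handy way to think about <code>sep=</code>: the text <code>print</code> writes is what you’d get by calling <code>str</code> on each argument, joining the results with <code>sep</code>, and appending the line ending. Here’s a small sketch that captures the output with <code>contextlib.redirect_stdout</code> to check that; it’s a mental model, not a claim about how CPython implements <code>print</code>.</p>

```python
import io
from contextlib import redirect_stdout

args = ("the", "world", "is", 42)

buffer = io.StringIO()
with redirect_stdout(buffer):
    print(*args, sep="-")

output = buffer.getvalue()
# Same text as str() on each argument, joined with sep, plus the default end="\n".
print(output == "-".join(map(str, args)) + "\n")  # True
print(repr(output))  # 'the-world-is-42\n'
```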
<h2 id="handling-of-non-string-types">Handling of non-string types<a class="headerlink" href="#handling-of-non-string-types" title="Permanent link">¶</a></h2>
<p>We know that the <code>print</code> function can handle printing objects of any type, not just strings. But how
does that work? The simple answer to this is that <code>print</code> will call <code>str</code> on non-string objects, and
print the result of that call.</p>
<p>Let’s experiment with this. Consider the following class definition, which has just one method, the
<code>__str__</code>. If you are unaware of this method, this is what’s called when <code>str</code> is applied on an
instance of this class. I won’t go into details of that as that’s not the topic of this article.</p>
<div class="hl"><pre class=content><code><span><span class="k">class</span> <span class="nc">Tantrum</span><span class="p">:</span>
</span><span> <span class="k">def</span> <span class="fm">__str__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
</span><span> <span class="k">return</span> <span class="s2">"awesome __str__ of object </span><span class="si">%r</span><span class="s2">"</span> <span class="o">%</span> <span class="nb">id</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span>
</span><span>
</span><span>
</span><span><span class="nb">print</span><span class="p">(</span><span class="n">Tantrum</span><span class="p">())</span>
</span></code></pre></div>
<p>The output of running this would be something like (the number in the end would obviously be
different if you run this script):</p>
<div class="hl"><pre class=content><code><span>awesome __str__ of object 4508612624
</span></code></pre></div>
<p>So, what happens if our class doesn’t define an implementation for the <code>__str__</code> method? Let’s try
that out:</p>
<div class="hl"><pre class=content><code><span><span class="k">class</span> <span class="nc">LazySloth</span><span class="p">:</span>
</span><span> <span class="k">pass</span>
</span><span>
</span><span>
</span><span><span class="nb">print</span><span class="p">(</span><span class="n">LazySloth</span><span class="p">())</span>
</span></code></pre></div>
<p>This prints the following output (again, the number in the end would obviously be different for
you):</p>
<div class="hl"><pre class=content><code><span><__main__.LazySloth object at 0x105f327d0>
</span></code></pre></div>
<p>Turns out that when there’s no implementation for <code>__str__</code>, calling <code>str</code> on the instance will
still produce some information regarding the instance, which is what we got above.</p>
<p>A neat thing here is that this output is actually what calling <code>repr</code> on the instance would produce.
So, it looks like <code>str</code> is falling back to returning the output of <code>repr</code>, when there’s no
implementation for <code>__str__</code> provided. Let’s confirm this by defining a <code>__repr__</code> method:</p>
<div class="hl"><pre class=content><code><span><span class="k">class</span> <span class="nc">RatInFormals</span><span class="p">:</span>
</span><span> <span class="k">def</span> <span class="fm">__repr__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
</span><span> <span class="k">return</span> <span class="s2">"a sad overridden __repr__ for instance </span><span class="si">%r</span><span class="s2">"</span> <span class="o">%</span> <span class="nb">id</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span>
</span><span>
</span><span>
</span><span><span class="nb">print</span><span class="p">(</span><span class="n">RatInFormals</span><span class="p">())</span>
</span></code></pre></div>
<p>This prints the following output (again, the number will be different for you):</p>
<div class="hl"><pre class=content><code><span>a sad overridden __repr__ for instance 4313389264
</span></code></pre></div>
<p>Now we get the output of the overridden <code>__repr__</code>.</p>
<p>So here’s how it works. The <code>print</code> function calls <code>str</code> on any non-string objects, which returns
the result of the <code>__str__</code> method, if available, or the result of calling <code>repr</code> on the instance,
which in turn returns the result of the <code>__repr__</code> method, which results in a generic output unless
overridden (like in the last example above).</p>
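<p>The lookup order described above can be checked in a few lines; the class names here are made up for the example. The last line also shows a related detail worth knowing: printing a container, like a list, uses the <code>repr</code> of each element, even when the elements define <code>__str__</code>.</p>

```python
class Both:
    def __str__(self):
        return "from __str__"

    def __repr__(self):
        return "from __repr__"


class OnlyRepr:
    def __repr__(self):
        return "from __repr__"


print(Both())      # from __str__
print(OnlyRepr())  # from __repr__ (str falls back to repr)
print([Both()])    # [from __repr__] (lists show the repr of their elements)
```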
<p>This should make a case for spending a few seconds thinking about and writing useful
<code>__str__</code> methods for your custom types. Someone working with your code later on
might just print an instance of your class to see what’s in it, and the generic output with the
instance’s <code>id</code> is unlikely to be very helpful.</p>
<h2 id="write-to-files">Write to files<a class="headerlink" href="#write-to-files" title="Permanent link">¶</a></h2>
<p>Another keyword argument accepted by <code>print</code> is <code>file=</code>. This can be set to a <code>file</code> object (or any
object with a <code>write(string)</code> method), in which case the <em>printing</em> will be done to that object instead of standard output.</p>
<p>Let’s try writing text to a file using the <code>print</code> function like this:</p>
<div class="hl"><pre class=content><code><span><span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s2">"outputs.txt"</span><span class="p">,</span> <span class="s2">"w"</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
</span><span> <span class="nb">print</span><span class="p">(</span><span class="s2">"Stuff that doesn't show up in standard output"</span><span class="p">,</span> <span class="n">file</span><span class="o">=</span><span class="n">f</span><span class="p">)</span>
</span></code></pre></div>
<p>Running this script obviously doesn’t print anything to the standard output. Instead, a file
“outputs.txt” is created which contains the following text:</p>
<div class="hl"><pre class=content><code><span>Stuff that doesn't show up in standard output
</span></code></pre></div>
<p>Note that since we are opening the file with mode <code>"w"</code>, if a file named “outputs.txt” already
exists in the current folder, it <strong>will be overwritten</strong>.</p>
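<p>If you’d rather add to an existing file than overwrite it, open it in append mode (<code>"a"</code>) instead. A quick sketch of the difference, using a temporary directory so that no real <code>outputs.txt</code> is touched:</p>

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "outputs.txt")

    with open(path, "w") as f:  # "w" truncates the file first
        print("first run", file=f)
    with open(path, "a") as f:  # "a" appends to the existing contents
        print("second run", file=f)

    with open(path) as f:
        contents = f.read()

print(repr(contents))  # 'first run\nsecond run\n'
```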
<h3 id="using-sysstderr">Using <code>sys.stderr</code><a class="headerlink" href="#using-sysstderr" title="Permanent link">¶</a></h3>
<p>The <a href="https://docs.python.org/3/library/sys.html#sys.stderr" rel="noopener noreferrer" target="_blank"><code>sys.stderr</code></a> object in the <code>sys</code> module is a file-like object that represents the
standard error stream. Anything written to it goes to standard error, just as anything written to the
<a href="https://docs.python.org/3/library/sys.html#sys.stdout" rel="noopener noreferrer" target="_blank"><code>sys.stdout</code></a> object goes to the standard output stream.</p>
<p>The <code>file=</code> keyword argument can be set to <code>sys.stderr</code> which will print to the standard error
stream.</p>
<div class="hl"><pre class=content><code><span><span class="kn">import</span> <span class="nn">sys</span>
</span><span>
</span><span><span class="nb">print</span><span class="p">(</span><span class="s2">"stuff going to standard error"</span><span class="p">,</span> <span class="n">file</span><span class="o">=</span><span class="n">sys</span><span class="o">.</span><span class="n">stderr</span><span class="p">)</span>
</span></code></pre></div>
<p>You might not notice any difference from setting the <code>file=</code> argument in the above script, but if
you are running a terminal emulator / shell that shows standard error in red color, then you’ll be
able to see a difference.</p>
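<p>To see the difference without relying on colors, keep the two streams apart. In a shell, redirecting standard output away (for example, appending <code>&gt; /dev/null</code> to the command) still shows the standard error text. From Python, a small sketch can run a child interpreter and capture each stream separately:</p>

```python
import subprocess
import sys

# A tiny child script that prints once to each stream.
child = (
    "import sys\n"
    "print('to stdout')\n"
    "print('to stderr', file=sys.stderr)\n"
)
result = subprocess.run([sys.executable, "-c", child], capture_output=True, text=True)

print("stdout:", repr(result.stdout))  # stdout: 'to stdout\n'
print("stderr:", repr(result.stderr))  # stderr: 'to stderr\n'
```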
<h3 id="modifying-sysstdout">Modifying <code>sys.stdout</code><a class="headerlink" href="#modifying-sysstdout" title="Permanent link">¶</a></h3>
<p>If we don’t set a value explicitly for the <code>file=</code> argument, the output will be sent to the standard
output. There’s a subtlety here, though: the output is actually sent to whatever the <code>sys.stdout</code> file
object is at the time of the call. Usually, that is the standard output. But, of course, we can set
<code>sys.stdout</code> to something else.</p>
<p>Consider the following script which changes the value of <code>sys.stdout</code>, prints something, and then
restores the value of <code>sys.stdout</code> to its original value.</p>
<div class="hl"><pre class=linenos><span>1
</span><span>2
</span><span>3
</span><span>4
</span><span>5
</span><span>6
</span><span>7
</span><span>8
</span><span>9
</span></pre><pre class=content><code><span><span class="kn">import</span> <span class="nn">sys</span>
</span><span>
</span><span><span class="n">original_stdout</span> <span class="o">=</span> <span class="n">sys</span><span class="o">.</span><span class="n">stdout</span>
</span><span><span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s2">"out.txt"</span><span class="p">,</span> <span class="s2">"w"</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
</span><span> <span class="n">sys</span><span class="o">.</span><span class="n">stdout</span> <span class="o">=</span> <span class="n">f</span>
</span><span> <span class="nb">print</span><span class="p">(</span><span class="s2">"trololololol"</span><span class="p">)</span>
</span><span>
</span><span><span class="n">sys</span><span class="o">.</span><span class="n">stdout</span> <span class="o">=</span> <span class="n">original_stdout</span>
</span><span><span class="nb">print</span><span class="p">(</span><span class="s2">"restored"</span><span class="p">)</span>
</span></code></pre></div>
<p>If we run this script, we’ll only see <code>restored</code> in the output, but the file <code>out.txt</code> will be
created with the output from line 6.</p>
<p>A minor point to note here is that it’s not quite accurate to say <em>“the default value of the <code>file</code>
argument is <code>sys.stdout</code>”</em>. If that were the case, changing the value of <code>sys.stdout</code> would
not affect the <code>print</code> function. Instead, its default value is <code>None</code>, and in that case
<code>print</code> uses the current value of <code>sys.stdout</code>. The official documentation says as much: if <code>file</code> is not present or is <code>None</code>, <code>sys.stdout</code> is used.</p>
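<p>The distinction matters because Python evaluates a default argument value once, when the <code>def</code> statement runs. In this sketch, <code>print_bound</code> and <code>print_late</code> are hypothetical helpers written only to contrast the two kinds of default:</p>

```python
import io
import sys


def print_bound(*args, file=sys.stdout):
    # The default is evaluated once, when this def runs, so it keeps
    # pointing at whatever sys.stdout was at that moment.
    file.write(" ".join(map(str, args)) + "\n")


def print_late(*args, file=None):
    # Look up sys.stdout at call time, which is how print behaves.
    if file is None:
        file = sys.stdout
    file.write(" ".join(map(str, args)) + "\n")


buffer = io.StringIO()
original_stdout = sys.stdout
sys.stdout = buffer
try:
    print_bound("from print_bound")  # still goes to the stream bound at def time
    print_late("from print_late")    # goes to buffer
finally:
    sys.stdout = original_stdout

captured = buffer.getvalue()  # 'from print_late\n'
```

<p>Only <code>print_late</code> follows the swapped <code>sys.stdout</code>, matching what we observed with the builtin.</p>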
<p>We can verify this by explicitly passing in <code>None</code> to the <code>file=</code> argument:</p>
<div class="hl"><pre class=content><code><span><span class="kn">import</span> <span class="nn">sys</span>
</span><span>
</span><span><span class="n">original_stdout</span> <span class="o">=</span> <span class="n">sys</span><span class="o">.</span><span class="n">stdout</span>
</span><span><span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s2">"out.txt"</span><span class="p">,</span> <span class="s2">"w"</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
</span><span> <span class="n">sys</span><span class="o">.</span><span class="n">stdout</span> <span class="o">=</span> <span class="n">f</span>
</span><span> <span class="nb">print</span><span class="p">(</span><span class="s2">"trololololol"</span><span class="p">,</span> <span class="n">file</span><span class="o">=</span><span class="kc">None</span><span class="p">)</span>
</span><span>
</span><span>
</span><span><span class="n">sys</span><span class="o">.</span><span class="n">stdout</span> <span class="o">=</span> <span class="n">original_stdout</span>
</span><span><span class="nb">print</span><span class="p">(</span><span class="s2">"restored"</span><span class="p">,</span> <span class="n">file</span><span class="o">=</span><span class="kc">None</span><span class="p">)</span>
</span></code></pre></div>
<p>The above script produces the exact same output as when we didn’t provide the <code>file=</code> argument
explicitly.</p>
<h3 id="collecting-with-iostringio">Collecting with <code>io.StringIO</code><a class="headerlink" href="#collecting-with-iostringio" title="Permanent link">¶</a></h3>
<p>The <a href="https://docs.python.org/3/library/io.html#io.StringIO" rel="noopener noreferrer" target="_blank"><code>io.StringIO</code></a> class can be used to create a file object that collects everything
written to it, which we can then get back as a single string. This is useful when calling a function that
prints information using the <code>print</code> function, when we want that output as a string for
further processing instead. We can replace <code>sys.stdout</code> with an <code>io.StringIO</code> instance before calling that
function, and then restore it after. Here’s what this might look like:</p>
<div class="hl"><pre class=content><code><span><span class="kn">import</span> <span class="nn">io</span><span class="o">,</span> <span class="nn">sys</span>
</span><span>
</span><span><span class="k">def</span> <span class="nf">print_product</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">):</span>
</span><span> <span class="nb">print</span><span class="p">(</span><span class="n">a</span> <span class="o">*</span> <span class="n">b</span><span class="p">)</span>
</span><span>
</span><span>
</span><span><span class="n">original_stdout</span> <span class="o">=</span> <span class="n">sys</span><span class="o">.</span><span class="n">stdout</span>
</span><span><span class="n">string_io</span> <span class="o">=</span> <span class="n">io</span><span class="o">.</span><span class="n">StringIO</span><span class="p">()</span>
</span><span>
</span><span><span class="n">sys</span><span class="o">.</span><span class="n">stdout</span> <span class="o">=</span> <span class="n">string_io</span>
</span><span><span class="n">print_product</span><span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">)</span>
</span><span><span class="n">sys</span><span class="o">.</span><span class="n">stdout</span> <span class="o">=</span> <span class="n">original_stdout</span>
</span><span>
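</span><span><span class="n">result</span> <span class="o">=</span> <span class="n">string_io</span><span class="o">.</span><span class="n">getvalue</span><span class="p">()</span><span class="o">.</span><span class="n">rstrip</span><span class="p">(</span><span class="s2">"</span><span class="se">\n</span><span class="s2">"</span><span class="p">)</span>  <span class="c1"># drop the newline print added</span>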
</span><span><span class="nb">print</span><span class="p">(</span><span class="s2">"Result is"</span><span class="p">,</span> <span class="n">result</span><span class="p">)</span>
</span></code></pre></div>
<p>In this script, the <code>print_product</code> function prints the result of the multiplication instead of
returning it. So to get the result out of it, we replace <code>sys.stdout</code> with an <code>io.StringIO</code> instance,
and after calling the <code>print_product</code> function, we get the printed result using the <code>.getvalue()</code>
method.</p>
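<p>The standard library’s <code>contextlib.redirect_stdout</code> performs the same swap as a context manager, and it restores the original <code>sys.stdout</code> even if the wrapped code raises. A sketch of the same example rewritten with it:</p>

```python
import io
from contextlib import redirect_stdout


def print_product(a, b):
    print(a * b)


string_io = io.StringIO()
# sys.stdout points at string_io only inside the with block,
# and is restored on exit, even if print_product raises.
with redirect_stdout(string_io):
    print_product(4, 5)

result = string_io.getvalue().rstrip("\n")  # '20' (drop print's newline)
print("Result is", result)
```

<p>This avoids having to remember the manual restore step, which the hand-written version skips entirely if <code>print_product</code> throws.</p>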
<p>However, note that a similar operation with binary data using <code>io.BytesIO</code> is not possible, since
the <code>print</code> function converts all its arguments to text (<code>str</code>) before writing them, and an <code>io.BytesIO</code> only accepts bytes.</p>
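<p>To see this concretely, printing to an <code>io.BytesIO</code> raises a <code>TypeError</code>. One possible workaround (not something <code>print</code> does on its own) is to wrap the buffer in <code>io.TextIOWrapper</code>, which encodes the text into bytes before passing it down:</p>

```python
import io

raw = io.BytesIO()
try:
    # print hands str objects to raw.write, which only accepts bytes.
    print("hello", file=raw)
    raised = False
except TypeError:
    raised = True

# Wrap the binary buffer in a text layer that encodes what print writes.
# newline="\n" turns off translation of "\n" to the platform line separator.
wrapper = io.TextIOWrapper(raw, encoding="utf-8", newline="\n")
print("hello", file=wrapper)
wrapper.flush()  # push the buffered text down into raw
data = raw.getvalue()  # b'hello\n'
```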
<h2 id="the-end-keyword-argument">The <code>end=</code> keyword argument<a class="headerlink" href="#the-end-keyword-argument" title="Permanent link">¶</a></h2>
<p>This is one of those things that we notice only when it’s taken away. The <code>print</code> function
writes a newline after the last argument to be printed. Check out the following example:</p>
<div class="hl"><pre class=content><code><span><span class="nb">print</span><span class="p">(</span><span class="s2">"hello on day 1"</span><span class="p">)</span>
</span><span><span class="nb">print</span><span class="p">(</span><span class="s2">"yeah right on day 2"</span><span class="p">)</span>
</span><span><span class="nb">print</span><span class="p">(</span><span class="s2">"oh to hell with you on day 3"</span><span class="p">)</span>
</span></code></pre></div>
<p>The output of this script is the following:</p>
<div class="hl"><pre class=content><code><span>hello on day 1
</span><span>yeah right on day 2
</span><span>oh to hell with you on day 3
</span></code></pre></div>
<p>The output from the three <code>print</code> calls shows up on three separate lines, nice and neat. But we
never gave a <code>"\n"</code> in our calls to <code>print</code>. It comes from the default value of the <code>end=</code> argument
of the <code>print</code> function. If we set the <code>end=</code> argument to something else, it replaces the
newline at the end of the output from that <code>print</code> call.</p>
<p>Check out the following script for example:</p>
<div class="hl"><pre class=content><code><span><span class="nb">print</span><span class="p">(</span><span class="s2">"Doing awesome stuff... "</span><span class="p">,</span> <span class="n">end</span><span class="o">=</span><span class="s2">""</span><span class="p">)</span>
</span><span><span class="c1"># do awesome stuff here</span>
</span><span><span class="nb">print</span><span class="p">(</span><span class="s2">"done"</span><span class="p">)</span>
</span></code></pre></div>
<p>This script prints the following output:</p>
<div class="hl"><pre class=content><code><span>Doing awesome stuff... done
</span></code></pre></div>
<p>The output of the two <code>print</code> calls shows up on the same line, since we suppressed the newline that
would’ve been printed by the first call to <code>print</code>, by setting the <code>end=</code> argument to an empty
string. The second call to <code>print</code> continues that line and finishes it by adding a
newline at the end.</p>
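<p>A common use of <code>end=""</code> is a progress indicator that grows on a single line. The sketch below, with a hypothetical <code>report_progress</code> helper, also passes <code>flush=True</code> (a <code>print</code> argument available since Python 3.3). On a terminal, standard output is usually flushed only at a newline, so without it the dots might all appear at once:</p>

```python
import io
import time


def report_progress(steps, file=None):
    # end="" keeps everything on one line; flush=True pushes each piece
    # out immediately instead of waiting for the final newline.
    print("Working", end="", file=file, flush=True)
    for _ in range(steps):
        time.sleep(0.05)  # stand-in for real work
        print(".", end="", file=file, flush=True)
    print(" done", file=file)


report_progress(3)  # prints: Working... done

buffer = io.StringIO()
report_progress(3, file=buffer)
captured = buffer.getvalue()  # 'Working... done\n'
```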
<h2 id="a-note-about-python-2">A Note about Python 2<a class="headerlink" href="#a-note-about-python-2" title="Permanent link">¶</a></h2>
<p>Python 2 had a <code>print</code> <strong>statement</strong>, which worked similarly to the <code>print</code> <strong>function</strong> in Python 3,
but was not as feature-rich. Additionally, being a statement, it couldn’t be used everywhere an expression can,
for example, within a lambda expression.</p>
<p>However, Python 2.6 introduced a <a href="https://docs.python.org/2/library/__future__.html" rel="noopener noreferrer" target="_blank">future import</a> that brought the <code>print</code> function to Python 2.
Adding a <code>from __future__ import print_function</code> line at the start of a Python 2 file disables
the <code>print</code> statement in that file and turns <code>print</code> into a function. This can be very useful
when migrating to Python 3.</p>
<h2 id="a-sad-imitation">A Sad Imitation<a class="headerlink" href="#a-sad-imitation" title="Permanent link">¶</a></h2>
<p>Here’s a sad little imitation of the <code>print</code> function that should behave similarly to the builtin for
most of the features that have been discussed in this article:</p>
<div class="hl"><pre class=content><code><span><span class="kn">import</span> <span class="nn">sys</span>
</span><span>
</span><span><span class="k">def</span> <span class="nf">sad_print</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="n">sep</span><span class="o">=</span><span class="s2">" "</span><span class="p">,</span> <span class="n">end</span><span class="o">=</span><span class="s2">"</span><span class="se">\n</span><span class="s2">"</span><span class="p">,</span> <span class="n">file</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
</span><span> <span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">stdout</span> <span class="k">if</span> <span class="n">file</span> <span class="ow">is</span> <span class="kc">None</span> <span class="k">else</span> <span class="n">file</span><span class="p">)</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">sep</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="nb">map</span><span class="p">(</span><span class="nb">str</span><span class="p">,</span> <span class="n">args</span><span class="p">))</span> <span class="o">+</span> <span class="n">end</span><span class="p">)</span>
</span><span>
</span><span>
</span><span><span class="n">sad_print</span><span class="p">(</span><span class="s2">"the answer is"</span><span class="p">,</span> <span class="mi">42</span><span class="p">)</span>
</span></code></pre></div>
<p>In this <code>sad_print</code> function, what we are essentially doing is:</p>
<ol>
<li>Pick <code>sys.stdout</code> if <code>file</code> is <code>None</code>.</li>
<li>Call <code>str</code> on all of the provided arguments.</li>
<li>Join the results of the calls to <code>str</code> using the value of <code>sep</code>.</li>
<li>Concatenate the value of <code>end</code> to the result of the previous step.</li>
<li>Call <code>write</code> on the file object picked in step 1, passing the result of the previous step.</li>
</ol>
<p>I’m sure the <code>print</code> builtin does quite a bit more than this one-liner, but writing it out can give
us some perspective on how all the different pieces fit together.</p>
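<p>One way to check how close the imitation gets is to run both functions on the same arguments and compare what they write into two <code>io.StringIO</code> buffers. The argument combinations below are arbitrary picks:</p>

```python
import io
import sys


def sad_print(*args, sep=" ", end="\n", file=None):
    (sys.stdout if file is None else file).write(sep.join(map(str, args)) + end)


# (positional args, keyword args) pairs to feed to both functions.
cases = [
    (("the answer is", 42), {}),
    (("a", "b", "c"), {"sep": "-", "end": "!\n"}),
    ((), {}),
    ((None, 3.5, [1, 2]), {"sep": ", "}),
]

mismatches = []
for args, kwargs in cases:
    expected, actual = io.StringIO(), io.StringIO()
    print(*args, file=expected, **kwargs)
    sad_print(*args, file=actual, **kwargs)
    if actual.getvalue() != expected.getvalue():
        mismatches.append((args, kwargs))

print(mismatches)  # []
```

<p>The builtin still does more in places this doesn’t test: it accepts <code>sep=None</code> and <code>end=None</code> as meaning “use the default” (where <code>sad_print</code> would raise), and it has a <code>flush=</code> argument.</p>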
<h2 id="the-pprint-function">The <code>pprint</code> Function<a class="headerlink" href="#the-pprint-function" title="Permanent link">¶</a></h2>
<p>Python’s standard library has a <a href="https://docs.python.org/3/library/pprint.html" rel="noopener noreferrer" target="_blank"><code>pprint</code></a> module, with a <code>pprint</code> function that takes an object
(plus a few optional formatting arguments) and prints it <em>prettily</em>.</p>
<p>For example, consider the following script:</p>
<div class="hl"><pre class=content><code><span><span class="kn">from</span> <span class="nn">pprint</span> <span class="kn">import</span> <span class="n">pprint</span>
</span><span>
</span><span><span class="n">numbers</span> <span class="o">=</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">6</span><span class="p">]</span>
</span><span>
</span><span><span class="nb">print</span><span class="p">(</span><span class="n">numbers</span><span class="p">)</span>
</span><span><span class="n">pprint</span><span class="p">(</span><span class="n">numbers</span><span class="p">)</span>
</span><span>
</span><span><span class="n">planets</span> <span class="o">=</span> <span class="p">[</span><span class="s2">"Mercury"</span><span class="p">,</span> <span class="s2">"Venus"</span><span class="p">,</span> <span class="s2">"Earth"</span><span class="p">,</span> <span class="s2">"Mars"</span><span class="p">,</span> <span class="s2">"Jupiter"</span><span class="p">,</span> <span class="s2">"Saturn"</span><span class="p">,</span> <span class="s2">"Uranus"</span><span class="p">,</span> <span class="s2">"Neptune"</span><span class="p">,</span> <span class="s2">"Pluto"</span><span class="p">]</span>
</span><span>
</span><span><span class="nb">print</span><span class="p">(</span><span class="n">planets</span><span class="p">)</span>
</span><span><span class="n">pprint</span><span class="p">(</span><span class="n">planets</span><span class="p">)</span>
</span></code></pre></div>
<p>We are calling <code>print</code> and <code>pprint</code> on each of the two lists. Let’s look at the output:</p>
<div class="hl"><pre class=content><code><span>[1, 2, 3, 4, 5, 6]
</span><span>[1, 2, 3, 4, 5, 6]
</span><span>['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn', 'Uranus', 'Neptune', 'Pluto']
</span><span>['Mercury',
</span><span> 'Venus',
</span><span> 'Earth',
</span><span> 'Mars',
</span><span> 'Jupiter',
</span><span> 'Saturn',
</span><span> 'Uranus',
</span><span> 'Neptune',
</span><span> 'Pluto']
</span></code></pre></div>
<p>As we can see, the output from <code>pprint</code> is prettified, but only if necessary. In the first case,
where we were printing just six numbers, the output fits comfortably on a single line, so <code>pprint</code> did not
cut it up into several lines. But in the second case, the line is 88 characters long, more than <code>pprint</code>’s
default width of 80 characters, which may not be comfortable on small terminal screens. So, it cuts it up.</p>
<p>The <code>pprint</code> module can be useful for pretty-printing lists and dictionaries (or, with its <code>pformat</code>
function, formatting them into a string). Check out its official documentation for more information.</p>
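<p><code>pprint</code> also takes optional arguments such as <code>width=</code> (80 by default), and the module’s <code>pformat</code> function returns the same text as a string instead of printing it. A short sketch using the planets list from above:</p>

```python
from pprint import pformat, pprint

planets = ["Mercury", "Venus", "Earth", "Mars", "Jupiter",
           "Saturn", "Uranus", "Neptune", "Pluto"]

# The repr is 88 characters long; with a wider limit it stays on one line.
pprint(planets, width=100)

# pformat returns the pretty-printed text instead of printing it.
narrow = pformat(planets, width=40)
print(narrow.splitlines()[0])  # ['Mercury',
```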
<h2 id="conclusion">Conclusion<a class="headerlink" href="#conclusion" title="Permanent link">¶</a></h2>
<p>We may not use all these features of the <code>print</code> function all the time, but I think it’s useful to
know that <code>print</code> is not just a function that prints the given string. It’s quite a bit more than
that; and when we need it, it’s there without having to import anything. Thank you for reading!</p>Dependency Injection In Python2020-03-29T00:00:00+05:302020-03-29T00:00:00+05:30Shrikant Sharat Kandulatag:sharats.me,2020-03-29:/posts/dependency-injection-in-python/<p>I chose a form of dependency injection (DI) as the solution to a recent problem I needed to solve.
This is a quick write-up of how I did it in Python, using the standard library modules.</p>
<div class="toc"><span class="toctitle">Table of Contents</span><ul>
<li><a href="#the-problem">The Problem</a></li>
<li><a href="#the-legacy-solution">The Legacy Solution</a></li>
<li><a href="#the-new-di-solution">The New DI Solution</a></li>
<li><a href="#conclusion">Conclusion</a></li>
</ul>
</div>
<h2 id="the-problem">The Problem<a class="headerlink" href="#the-problem" title="Permanent link">¶</a></h2>
<p>I’ll be illustrating the problem in a slightly different context, so I don’t get derailed into
specifics that would just be a distraction. So, if the problem feels unrealistic or stupid,
that’s just a result of my unimaginative thinking.</p>
<p>We have an application, let’s call it the task runner, that lets users choose a task to run, and
runs that task. Each such task is implemented as a separate Python script file that takes no user
input, but does connect to a database and a few REST endpoints.</p>
<p>So this is how it works. We have a bunch of wrapper classes that provide high-level abstractions for
the database and the REST endpoints. Instances of these classes are given to the task scripts, which
use them to perform their task.</p>
<p>The task scripts are also expected to <em>return</em> some information back to our application, with
details such as whether the task was successful or, if there was an error, the reasons for it. How
this is done is detailed in the following sections.</p>
<h2 id="the-legacy-solution">The Legacy Solution<a class="headerlink" href="#the-legacy-solution" title="Permanent link">¶</a></h2>
<p>The current way this works (which I took the liberty to call <em>The Legacy Solution</em>) is that
when a user requests a specific task to be run, the application reads the relevant Python
script file and calls <code>eval</code> on its contents. A pre-made dictionary holding all the instances of
high-level abstractions is provided as the global scope of this call to <code>eval</code>.</p>
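<p>A minimal sketch of that legacy flow, under some assumptions: <code>UsersService</code>, the script text, and the convention of leaving a <code>task_result</code> global behind are hypothetical placeholders. Note that <code>eval</code> can only run a script containing statements once it has been compiled in <code>"exec"</code> mode:</p>

```python
class UsersService:
    """Hypothetical stand-in for one of the high-level API abstractions."""

    def count(self):
        return 3


# Hypothetical task script; the real application reads this from the task's .py file.
TASK_SOURCE = """
total = users_service.count()
task_result = {"ok": True, "total": total}
"""


def run_legacy_task(source, filename="<task>"):
    # Pre-made dictionary of abstractions, handed to the script as its globals.
    scope = {"users_service": UsersService()}
    # eval() runs a multi-statement script only after compile() in "exec" mode.
    eval(compile(source, filename, "exec"), scope)
    # Hypothetical convention: the script leaves its outcome in a global.
    return scope.get("task_result")


result = run_legacy_task(TASK_SOURCE)  # {'ok': True, 'total': 3}
```

<p>Read on its own, the script uses <code>users_service</code> without ever defining it, which is exactly the implicit-globals problem a static analyzer trips over.</p>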
<p>This has been working well for several years now and, although it feels dirty in hindsight, there
were probably good reasons it was done this way:</p>
<ol>
<li>It was very simple and easy to implement. There’s little to no magic.</li>
<li>The task scripts can be updated on production without restarting the application and the changes
would take effect immediately.</li>
<li>The scripts’ logic can be written as module-level code, with full freedom in how the code is
structured and written.</li>
</ol>
<p>Arguing about how horrible this approach is would be a great topic for a heated debate, and
fortunately that’s not what I set out to write about here. This simple method, while it worked, didn’t
scale with the team. We soon decided to move to a more sophisticated approach and started
looking for one.</p>
<p>A major reason (among several) for this decision was to have the scripts not depend on implicit
globals. The use of implicit globals meant that the scripts were using variables that appear as
undefined to static code analyzers. Additionally, since the script file was being read into a string
and <code>eval</code>-ed, the stack traces from errors were not very helpful: they typically point at a
<code>&lt;string&gt;</code> pseudo-file rather than the script’s actual path.</p>
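<p>A small sketch of that traceback problem, using a hypothetical buggy script and a hypothetical path <code>tasks/averages.py</code>. It uses <code>exec</code> since a plain string with statements can’t go through <code>eval</code> directly; the file name recorded in the traceback comes from how the code was compiled either way:</p>

```python
import traceback

# Hypothetical task script with a bug on its second line.
SOURCE = "total = 10\naverage = total / 0\n"


def failing_frame(code):
    """Run code and return (filename, line number) of the frame that raised."""
    try:
        exec(code, {})
    except ZeroDivisionError as error:
        frame = traceback.extract_tb(error.__traceback__)[-1]
        return frame.filename, frame.lineno
    return None


# A plain string gets the placeholder file name "<string>".
from_string = failing_frame(SOURCE)  # ('<string>', 2)

# Compiling with the script's path first lets the traceback name the file.
from_compiled = failing_frame(compile(SOURCE, "tasks/averages.py", "exec"))  # ('tasks/averages.py', 2)
```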
<h2 id="the-new-di-solution">The New DI Solution<a class="headerlink" href="#the-new-di-solution" title="Permanent link">¶</a></h2>
<p>In the new proposed way for this to work, we have made three critical changes:</p>
<ol>
<li>The Python script files will be <code>import</code>-ed as Python modules, and the <code>task_main</code> function at
the module level will be called to run the task.</li>
<li>Nothing is implicitly injected into the script’s global scope.</li>
<li>Access to the API abstractions is done through a form of dependency injection.</li>
</ol>
<p>In the task scripts, we have a function defined like the following:</p>
<div class="hl"><pre class=content><code><span><span class="k">def</span> <span class="nf">task_main</span><span class="p">(</span><span class="n">users_service</span><span class="p">,</span> <span class="n">sales_service</span><span class="p">):</span>
</span><span>    <span class="c1"># do something with `users_service` and `sales_service`.</span>
</span><span>    <span class="o">...</span>
</span></code></pre></div>
<p>Here, the <code>task_main</code> function is defined to accept two arguments, <code>users_service</code> and
<code>sales_service</code>. In our task runner application, we use the <code>inspect</code> module to identify the
abstractions requested by <code>task_main</code> and pass them accordingly. Here’s how it works:</p>
<div class="hl"><pre class=content><code><span><span class="kn">import</span> <span class="nn">importlib</span>
</span><span><span class="kn">import</span> <span class="nn">inspect</span>
</span><span>
</span><span><span class="k">def</span> <span class="nf">run_task_script</span><span class="p">(</span><span class="n">script</span><span class="p">):</span>
</span><span>    <span class="n">module_name</span> <span class="o">=</span> <span class="n">script</span><span class="o">.</span><span class="n">replace</span><span class="p">(</span><span class="s1">'.py'</span><span class="p">,</span> <span class="s1">''</span><span class="p">)</span>
</span><span>    <span class="n">module</span> <span class="o">=</span> <span class="n">importlib</span><span class="o">.</span><span class="n">import_module</span><span class="p">(</span><span class="n">module_name</span><span class="p">)</span>
</span><span>    <span class="n">args</span> <span class="o">=</span> <span class="n">inspect</span><span class="o">.</span><span class="n">signature</span><span class="p">(</span><span class="n">module</span><span class="o">.</span><span class="n">task_main</span><span class="p">)</span><span class="o">.</span><span class="n">parameters</span>
</span><span>
</span><span>    <span class="n">kwargs</span> <span class="o">=</span> <span class="p">{}</span>
</span><span>
</span><span>    <span class="k">for</span> <span class="n">name</span> <span class="ow">in</span> <span class="n">args</span><span class="p">:</span>
</span><span>        <span class="c1"># get_service_instance: the task runner's own lookup, defined elsewhere.</span>
</span><span>        <span class="n">kwargs</span><span class="p">[</span><span class="n">name</span><span class="p">]</span> <span class="o">=</span> <span class="n">get_service_instance</span><span class="p">(</span><span class="n">name</span><span class="p">)</span>
</span><span>
</span><span>    <span class="n">response</span> <span class="o">=</span> <span class="n">module</span><span class="o">.</span><span class="n">task_main</span><span class="p">(</span><span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
</span><span>
</span><span>    <span class="c1"># record_task_response: also part of the task runner, defined elsewhere.</span>
</span><span>    <span class="n">record_task_response</span><span class="p">(</span><span class="n">response</span><span class="p">)</span>
</span></code></pre></div>
<p>In this function, we first convert the script file name into its module name (hoping it doesn’t
contain any spaces or dash characters). Then, we use the <code>importlib</code> module to import the module of
that name. Next, we call the <code>inspect.signature</code> function on the module’s <code>task_main</code> function to get
its parameters, keyed by name.</p>
<p>Based on these parameter names (in <code>args</code>), we then construct a dictionary with the names as keys
and the corresponding API abstraction instances as values. We then pass this as keyword
arguments to the call to <code>module.task_main</code>.</p>
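<p>The same injection step can be sketched in a self-contained way, without importing a module from disk. <code>UsersService</code>, <code>SalesService</code>, the <code>SERVICES</code> registry, and <code>call_with_services</code> are all hypothetical; the registry plays the role of <code>get_service_instance</code>:</p>

```python
import inspect


class UsersService:
    """Hypothetical abstraction over the users REST endpoint."""

    def active_count(self):
        return 12


class SalesService:
    """Hypothetical abstraction over the sales database."""

    def total(self):
        return 340


# Hypothetical registry playing the role of get_service_instance().
SERVICES = {
    "users_service": UsersService(),
    "sales_service": SalesService(),
}


def call_with_services(func):
    kwargs = {}
    # Each parameter name doubles as the key of the service to inject.
    for name in inspect.signature(func).parameters:
        if name not in SERVICES:
            raise LookupError(f"{func.__name__} asks for unknown service {name!r}")
        kwargs[name] = SERVICES[name]
    return func(**kwargs)


# This task only asks for the sales service, so that's all it receives.
def task_main(sales_service):
    return {"ok": True, "total": sales_service.total()}


response = call_with_services(task_main)  # {'ok': True, 'total': 340}
```

<p>Raising on an unknown name, rather than passing <code>None</code>, makes a typo in a task’s signature fail loudly before the task runs.</p>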
<p>In this way, the scripts don’t assume any implicit globals and the <code>task_main</code> accepts arguments
that it needs and no more. This makes the code much cleaner and easier to do static analysis on.
Besides, since we import the module and call a function in it, we get nicer stack traces when
there’s an exception.</p>
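<p>One variation, which is not what the code above does: loading each script by its file path with <code>importlib.util</code>, so the scripts’ directory needn’t be on <code>sys.path</code>. Unlike <code>importlib.import_module</code>, which returns the cached module from <code>sys.modules</code> on later calls, this sketch never caches the module, so each run executes the file again and should pick up edits without a restart (one of the legacy solution’s benefits). <code>load_task_module</code> and the script file are hypothetical:</p>

```python
import importlib.util
import pathlib
import tempfile


def load_task_module(path):
    """Load a task script directly from its file path."""
    path = pathlib.Path(path)
    spec = importlib.util.spec_from_file_location(path.stem, str(path))
    module = importlib.util.module_from_spec(spec)
    # Run the file's top-level code. The module is not stored in sys.modules,
    # so every call executes the file again.
    spec.loader.exec_module(module)
    return module


# Usage with a throwaway script whose name contains a dash and isn't on sys.path.
with tempfile.TemporaryDirectory() as folder:
    script = pathlib.Path(folder) / "monthly-report.py"
    script.write_text("def task_main(sales_service):\n    return {'ok': True}\n")
    module = load_task_module(script)
    response = module.task_main(sales_service=None)  # {'ok': True}
```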
<h2 id="conclusion">Conclusion<a class="headerlink" href="#conclusion" title="Permanent link">¶</a></h2>
<p>I’m sure there are better, more involved ways of doing DI in Python, but what we’ve
done above is enough for the target problem. Additionally, it’s just using the standard library, so,
extra brownie points for that!</p>The Magic of AutoHotkey — Part 22020-03-22T00:00:00+05:302020-03-22T00:00:00+05:30Shrikant Sharat Kandulatag:sharats.me,2020-03-22:/posts/the-magic-of-autohotkey-2/<p>In the previous part of <a href="../the-magic-of-autohotkey/">The Magic of AutoHotkey</a>, we looked at automating small
pieces of routine tasks with various applications, as well as identifying things that could be done
better with a quick hotkey. This is the next chapter of the story. In this article, I’ll show you
how I tamed the stock file explorer, and how I connected to Office applications with OLE to provide
additional rich functionality.</p>
<p>This article is part of a series:</p>
<ol>
<li><a href="../the-magic-of-autohotkey/">Part 1</a></li>
<li>Part 2 (this article).</li>
</ol>
<div class="toc"><span class="toctitle">Table of Contents</span><ul>
<li><a href="#file-explorer-magic">File Explorer Magic</a><ul>
<li><a href="#focus-location-editor">Focus Location Editor</a></li>
<li><a href="#open-command-window">Open Command Window</a></li>
<li><a href="#folder-shortcuts">Folder Shortcuts</a></li>
<li><a href="#better-hotkeys-for-directional-navigation">Better Hotkeys for Directional Navigation</a></li>
<li><a href="#select-files-by-pattern">Select Files by Pattern</a></li>
<li><a href="#batch-rename">Batch Rename</a></li>
<li><a href="#copy-paths-of-selected-files">Copy Paths of Selected Files</a></li>
<li><a href="#copy-contents-of-selected-files">Copy Contents of Selected Files</a></li>
<li><a href="#create-file-with-clipboard-contents">Create File with Clipboard Contents</a></li>
<li><a href="#create-folder-hierarchy-and-enter-it">Create Folder Hierarchy and Enter it</a></li>
</ul>
</li>
<li><a href="#email-selected-files-with-outlook">Email Selected File(s) with Outlook</a><ul>
<li><a href="#global-hotkey-for-new-mail">Global Hotkey for New Mail</a></li>
</ul>
</li>
<li><a href="#conclusion">Conclusion</a></li>
</ul>
</div>
<h2 id="file-explorer-magic">File Explorer Magic<a class="headerlink" href="#file-explorer-magic" title="Permanent link">¶</a></h2>
<p>The file explorer is probably my most used application during work. Yet, it doesn’t feel like it’s
tuned for a power user. Maybe that’s also why <a href="https://www.xyplorer.com/" rel="noopener noreferrer" target="_blank">there are</a> <a href="https://www.zabkat.com/" rel="noopener noreferrer" target="_blank">so</a> <a href="https://www.gpsoft.com.au/" rel="noopener noreferrer" target="_blank">many</a>
<a href="https://www.ghisler.com/" rel="noopener noreferrer" target="_blank">alternatives</a> to the file explorer. I’ve tried a few of them in the past, but the best approach has
been to add exactly the few things I needed to the native file explorer, using AutoHotkey. I’ll run
through those here.</p>
<p>As in the previous part, I have a module called <code>file-explorer-tweaks.ahk</code> which is
<code>#Include</code>-ed in my master script.</p>
<p>To start, we define a window group, which includes all file explorer windows. We later use this
group to define hotkeys that we want to work <em>only</em> on the file explorer windows.</p>
<div class="hl"><pre class=content><code><span><span class="nb">GroupAdd</span><span class="p">,</span> <span class="n">FileListers</span><span class="p">,</span> <span class="n">ahk_class</span> <span class="n">CabinetWClass</span>
</span><span><span class="nb">GroupAdd</span><span class="p">,</span> <span class="n">FileListers</span><span class="p">,</span> <span class="n">ahk_class</span> <span class="n">WorkerW</span>
</span><span><span class="nb">GroupAdd</span><span class="p">,</span> <span class="n">FileListers</span><span class="p">,</span> <span class="n">ahk_class</span> <span class="n">#32770</span><span class="p">,</span> <span class="n">ShellView</span>
</span></code></pre></div>
<p>This group now matches file explorer windows, the desktop, and file open dialog windows.</p>
<h3 id="focus-location-editor">Focus Location Editor<a class="headerlink" href="#focus-location-editor" title="Permanent link">¶</a></h3>
<p>Almost all web browsers today have the default hotkey <kbd>^l</kbd>, which focuses the location
bar and selects everything in it. But in the file explorer, this is <kbd>!d</kbd>. Habits rule, and
I constantly hit <kbd>^l</kbd> in the file explorer window whenever I wanted to change something in the
location bar. Of course, it didn’t work, and it drove me crazy, until I added the following to
save me from insanity:</p>
<div class="hl"><pre class=content><code><span><span class="nb">#IfWinActive</span> <span class="n">ahk_group</span> <span class="n">FileListers</span>
</span><span><span class="o">^</span><span class="n">l</span><span class="o">::</span><span class="n">SendInput</span> <span class="o">!</span><span class="n">d</span>
</span></code></pre></div>
<p>While this works fine on the face of it, if I hit <kbd>Escape</kbd> after focusing the location bar
like this, the focus is not returned to the file list. I haven’t figured out a solution to that yet,
so that one’s still open.</p>
<h3 id="open-command-window">Open Command Window<a class="headerlink" href="#open-command-window" title="Permanent link">¶</a></h3>
<p>The file explorer has a nice lesser-known trick. If I right-click without any files selected and with
the <kbd>Shift</kbd> key held down, I get an extra option in the context menu, called “Open command
window here”. Clicking on that menu item will open a new command prompt window in the current
directory. This is extremely convenient if you need the command window often (which you might,
especially if you’re a software developer).</p>
<p>But this needed the mouse. I wanted to do this with the keyboard. Turns out it’s easier than one
might think:</p>
<div class="hl"><pre class=content><code><span><span class="nb">#IfWinActive</span> <span class="n">ahk_group</span> <span class="n">FileListers</span>
</span><span><span class="o">^!</span><span class="n">t</span><span class="o">::</span><span class="n">SendInput</span> <span class="o">!</span><span class="n">dcmd</span><span class="p">{</span><span class="n">Enter</span><span class="p">}</span>
</span></code></pre></div>
<p>Here, we define the <kbd>^!t</kbd> hotkey, which focuses the location bar, types in <code>cmd</code>, and
hits the <kbd>Enter</kbd> key. Running <code>cmd</code> from the location bar opens a command window <em>in the current
directory</em>.</p>
<h3 id="folder-shortcuts">Folder Shortcuts<a class="headerlink" href="#folder-shortcuts" title="Permanent link">¶</a></h3>
<p>A folder shortcut is a hotkey that always navigates to a specific directory. For
example, while in a file explorer window, hitting <kbd>^h</kbd> should navigate to the home folder, and hitting
<kbd>^j</kbd> should navigate to the Downloads folder (this key opens the downloads view in web
browsers, see what I did there?).</p>
<div class="hl"><pre class=content><code><span><span class="nb">#IfWinActive</span> <span class="n">ahk_group</span> <span class="n">FileListers</span>
</span><span><span class="o">^</span><span class="n">h</span><span class="o">::</span><span class="n">Send</span> <span class="o">!</span><span class="n">d</span><span class="nv">%homedir%</span><span class="p">{</span><span class="n">Enter</span><span class="p">}</span>
</span><span><span class="o">^</span><span class="n">j</span><span class="o">::</span><span class="n">Send</span> <span class="o">!</span><span class="n">d</span><span class="nv">%homedir%</span>\<span class="n">Downloads</span><span class="p">{</span><span class="n">Enter</span><span class="p">}</span>
</span><span><span class="o">^</span><span class="n">y</span><span class="o">::</span><span class="n">Send</span> <span class="o">!</span><span class="n">dLibraries</span>\<span class="n">Documents</span><span class="p">{</span><span class="n">Enter</span><span class="p">}</span>
</span><span><span class="o">^</span><span class="n">k</span><span class="o">::</span><span class="n">Send</span> <span class="o">!</span><span class="n">dC</span><span class="o">:</span>\<span class="n">work</span><span class="p">{</span><span class="n">Enter</span><span class="p">}</span>
</span><span><span class="o">^</span><span class="n">t</span><span class="o">::</span><span class="n">Send</span> <span class="o">!</span><span class="n">dC</span><span class="o">:</span>\<span class="n">tools</span><span class="p">{</span><span class="n">Enter</span><span class="p">}</span>
</span><span><span class="o">^</span><span class="n">b</span><span class="o">::</span><span class="n">Send</span> <span class="o">!</span><span class="n">dC</span><span class="o">:</span>\<span class="n">labs</span><span class="p">{</span><span class="n">Enter</span><span class="p">}</span>
</span></code></pre></div>
<p>This snippet uses the <code>homedir</code> variable defined in the <a href="../the-magic-of-autohotkey/#the-setup">previous article</a>.</p>
<p>On the face of it, these are very simple hotkeys. Each one sends <kbd>!d</kbd>
(<kbd>Alt</kbd>+<kbd>D</kbd>) to focus the location bar, types in the destination and presses
<kbd>Enter</kbd>. Simple and effective. They work like quick-access bookmarks and are, by a margin,
probably my most-used AutoHotkey hotkeys overall.</p>
<h3 id="better-hotkeys-for-directional-navigation">Better Hotkeys for Directional Navigation<a class="headerlink" href="#better-hotkeys-for-directional-navigation" title="Permanent link">¶</a></h3>
<p>In the previous section, we dealt with navigating to absolute locations. But what about directional
navigation, where we want to go back, forward, or up the directory chain?</p>
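<p>The default hotkeys for this (<kbd>Alt</kbd>+<kbd>Left</kbd>, <kbd>Alt</kbd>+<kbd>Right</kbd> and
<kbd>Alt</kbd>+<kbd>Up</kbd>) use the arrow keys, which means taking my hands off the keyboard’s home
row. So I use the following keys for these three operations instead, inspired by similar behavior in
Vim (again!), where <kbd>^o</kbd> and <kbd>^i</kbd> jump backward and forward through the jump list.</p>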
<div class="hl"><pre class=content><code><span><span class="c1">; Navigate with the keyboard better!</span>
</span><span><span class="nb">#IfWinActive</span> <span class="n">ahk_group</span> <span class="n">FileListers</span>
</span><span><span class="o">^</span><span class="n">o</span><span class="o">::</span><span class="n">SendInput</span><span class="p">,</span> <span class="o">!</span><span class="p">{</span><span class="n">Left</span><span class="p">}</span>
</span><span><span class="o">^</span><span class="n">i</span><span class="o">::</span><span class="n">SendInput</span><span class="p">,</span> <span class="o">!</span><span class="p">{</span><span class="n">Right</span><span class="p">}</span>
</span><span><span class="o">^</span><span class="n">u</span><span class="o">::</span><span class="n">SendInput</span><span class="p">,</span> <span class="o">!</span><span class="p">{</span><span class="n">Up</span><span class="p">}</span>
</span></code></pre></div>
<p>On top of that, I have also defined mouse “hotkeys” for these three actions. I rarely use them
nowadays, but they’re still there for when I already have a hand on the mouse.</p>
<div class="hl"><pre class=content><code><span><span class="c1">; Navigate with the mouse!</span>
</span><span><span class="nb">#IfWinActive</span> <span class="n">ahk_group</span> <span class="n">FileListers</span>
</span><span><span class="o">!</span><span class="n">WheelUp</span><span class="o">::</span><span class="n">SendInput</span><span class="p">,</span> <span class="o">!</span><span class="p">{</span><span class="n">Up</span><span class="p">}</span>
</span><span><span class="o">^</span><span class="n">WheelUp</span><span class="o">::</span><span class="n">SendInput</span><span class="p">,</span> <span class="o">!</span><span class="p">{</span><span class="n">Left</span><span class="p">}</span>
</span><span><span class="o">^</span><span class="n">WheelDown</span><span class="o">::</span><span class="n">SendInput</span><span class="p">,</span> <span class="o">!</span><span class="p">{</span><span class="n">Right</span><span class="p">}</span>
</span></code></pre></div>
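<p>Pretty self-explanatory, really. Note that the <kbd>Ctrl</kbd> + wheel bindings override Explorer’s
built-in <kbd>Ctrl</kbd> + wheel behavior, which changes the icon size.</p>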
<h3 id="select-files-by-pattern">Select Files by Pattern<a class="headerlink" href="#select-files-by-pattern" title="Permanent link">¶</a></h3>
<p>I particularly love this one. When I trigger this hotkey (<kbd>^s</kbd> in the code below), a little
prompt shows up where I enter a regular expression, and every file in the current folder whose name
matches the pattern gets selected. The first time I used this on a folder with ~300 files, I
practically had tears in my eyes at how easy it was to select files by a pattern.</p>
<p>So, here’s the code for this:</p>
<div class="hl"><input type=checkbox id=co-0><label for=co-0><span class='btn show-full-code-btn'>Show remaining 16 lines</span></label><pre class=content><code><span><span class="c1">; Get selected files in explorer and more:</span>
</span><span><span class="c1">; http://www.autohotkey.com/board/topic/60985-get-paths-of-selected-items-in-an-explorer-window/</span>
</span><span><span class="nb">#IfWinActive</span> <span class="n">ahk_group</span> <span class="n">FileListers</span>
</span><span><span class="o">^</span><span class="n">s</span><span class="o">::</span>
</span><span><span class="n">SelectByRegEx</span><span class="p">()</span> <span class="p">{</span>
</span><span>    <span class="nb">static</span> <span class="n">selectionPattern</span> <span class="o">:=</span> <span class="s">""</span>
</span><span>    <span class="nb">WinGetPos</span><span class="p">,</span> <span class="n">wx</span><span class="p">,</span> <span class="n">wy</span>
</span><span>    <span class="nb">ControlGetPos</span><span class="p">,</span> <span class="n">cx</span><span class="p">,</span> <span class="n">cy</span><span class="p">,</span> <span class="n">cw</span><span class="p">,</span> <span class="p">,</span> <span class="n">DirectUIHWND3</span>
</span><span>    <span class="n">x</span> <span class="o">:=</span> <span class="n">wx</span> <span class="o">+</span> <span class="n">cx</span> <span class="o">+</span> <span class="n">cw</span><span class="o">/</span><span class="mi">2</span> <span class="o">-</span> <span class="mi">200</span>
</span><span>    <span class="n">y</span> <span class="o">:=</span> <span class="n">wy</span> <span class="o">+</span> <span class="n">cy</span>
</span><span>    <span class="nb">InputBox</span><span class="p">,</span> <span class="n">selectionPattern</span><span class="p">,</span> <span class="n">Select</span> <span class="n">by</span> <span class="n">regex</span>
</span><span>        <span class="p">,</span> <span class="n">Enter</span> <span class="n">regex</span> <span class="n">pattern</span> <span class="n">to</span> <span class="n">select</span> <span class="n">files</span> <span class="n">that</span> <span class="n">CONTAIN</span> <span class="n">it</span> <span class="p">(</span><span class="n">Empty</span> <span class="n">to</span> <span class="n">select</span> <span class="n">all</span><span class="p">)</span>
</span><span>        <span class="p">,</span> <span class="p">,</span> <span class="mi">400</span><span class="p">,</span> <span class="mi">150</span><span class="p">,</span> <span class="nv">%x%</span><span class="p">,</span> <span class="nv">%y%</span><span class="p">,</span> <span class="p">,</span> <span class="p">,</span> <span class="nv">%selectionPattern%</span>
</span><span>    <span class="nb">if </span><span class="nv">ErrorLevel</span>
</span><span>        <span class="nb">Return</span>
</span><span>    <span class="n">for</span> <span class="n">window</span> <span class="ow">in</span> <span class="nf">ComObjCreate</span><span class="p">(</span><span class="s">"Shell.Application"</span><span class="p">)</span><span class="o">.</span><span class="n">Windows</span>
</span><span>        <span class="nb">if </span><span class="nf">WinActive</span><span class="p">(</span><span class="s">"ahk_id "</span> <span class="o">.</span> <span class="n">window</span><span class="o">.</span><span class="n">hwnd</span><span class="p">)</span> <span class="p">{</span>
</span><span>            <span class="n">pattern</span> <span class="o">:=</span> <span class="s">"S)"</span> <span class="o">.</span> <span class="n">selectionPattern</span>
</span><span>            <span class="n">items</span> <span class="o">:=</span> <span class="n">window</span><span class="o">.</span><span class="n">document</span><span class="o">.</span><span class="n">Folder</span><span class="o">.</span><span class="n">Items</span>
</span><span>            <span class="n">total</span> <span class="o">:=</span> <span class="n">items</span><span class="o">.</span><span class="n">Count</span><span class="p">()</span>
</span><span class=collapse>            <span class="n">i</span> <span class="o">:=</span> <span class="mi">0</span>
</span><span class=collapse>            <span class="n">showProgress</span> <span class="o">:=</span> <span class="n">total</span> <span class="o">></span> <span class="mi">160</span>
</span><span class=collapse>            <span class="n">if</span> <span class="p">(</span><span class="n">showProgress</span><span class="p">)</span>
</span><span class=collapse>                <span class="nb">Progress</span><span class="p">,</span> <span class="n">b</span> <span class="n">w200</span><span class="p">,</span> <span class="p">,</span> <span class="n">Matching</span><span class="o">...</span>
</span><span class=collapse>            <span class="n">for</span> <span class="n">item</span> <span class="ow">in</span> <span class="n">items</span> <span class="p">{</span>
</span><span class=collapse>                <span class="n">match</span> <span class="o">:=</span> <span class="nf">RegExMatch</span><span class="p">(</span><span class="n">item</span><span class="o">.</span><span class="n">Name</span><span class="p">,</span> <span class="n">pattern</span><span class="p">)</span> <span class="o">?</span> <span class="mi">17</span> <span class="o">:</span> <span class="mi">0</span>
</span><span class=collapse>                <span class="n">window</span><span class="o">.</span><span class="n">document</span><span class="o">.</span><span class="n">SelectItem</span><span class="p">(</span><span class="n">item</span><span class="p">,</span> <span class="n">match</span><span class="p">)</span>
</span><span class=collapse>                <span class="n">if</span> <span class="p">(</span><span class="n">showProgress</span><span class="p">)</span> <span class="p">{</span>
</span><span class=collapse>                    <span class="n">i</span> <span class="o">:=</span> <span class="n">i</span> <span class="o">+</span> <span class="mi">100</span>
</span><span class=collapse>                    <span class="nb">Progress</span><span class="p">,</span> <span class="o">%</span> <span class="n">i</span> <span class="o">/</span> <span class="n">total</span>
</span><span class=collapse>                <span class="p">}</span>
</span><span class=collapse>            <span class="p">}</span>
</span><span class=collapse>            <span class="nb">Break</span>
</span><span class=collapse>        <span class="p">}</span>
</span><span class=collapse>    <span class="nb">Progress</span><span class="p">,</span> <span class="n">Off</span>
</span><span class=collapse><span class="p">}</span>
</span></code></pre></div>
<p>The code is not very pretty, but oh well. It works well, and I’d rather not touch it.</p>
<p>Here’s a little <strong>mute</strong> video recording of this at work:</p>
<video controls="" muted="" playsinline="" preload="" src="https://sharats.me/static/autohotkey-select-by-pattern.mp4">Your browser does not support HTML5 video. Here’s <a href="https://sharats.me/static/autohotkey-select-by-pattern.mp4">a link to the video</a> instead.</video>
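<p>To make the matching rule concrete, here is a small model of it in Python rather than AutoHotkey, so it can be run and tested outside Windows. Treat it as a sketch, not part of the script: <code>select_by_pattern</code> and the sample file names are hypothetical, and Python’s <code>re</code> module only approximates the PCRE flavor AutoHotkey uses (AutoHotkey’s <code>i)</code> option prefix, for example, becomes the inline flag <code>(?i)</code>). What it shows is that <code>RegExMatch</code> succeeds when the pattern matches <em>anywhere</em> in a name, which is why an empty pattern selects every file.</p>
<div class="hl"><pre class=content><code>import re


def select_by_pattern(names, pattern):
    """Return the names that CONTAIN a match for pattern.

    Hypothetical Python model of the SelectByRegEx hotkey: like
    AutoHotkey's RegExMatch, re.search succeeds if the pattern matches
    anywhere in the name, so an empty pattern matches every name.
    """
    # Compile once and reuse it for every file, in the same spirit as the
    # "S)" (study) option the AutoHotkey version prepends to the pattern.
    regex = re.compile(pattern)
    return [name for name in names if regex.search(name)]


files = ["report-2023.pdf", "report-2024.pdf", "notes.txt", "IMG_0042.jpg"]
print(select_by_pattern(files, r"report-\d+"))  # ['report-2023.pdf', 'report-2024.pdf']
print(select_by_pattern(files, r"\.jpg$"))      # ['IMG_0042.jpg']
print(select_by_pattern(files, "(?i)img"))      # ['IMG_0042.jpg']
print(len(select_by_pattern(files, "")))        # 4
</code></pre></div>
<p>Anchors narrow the “contains” search: <code>^</code> restricts it to prefixes and <code>$</code> to suffixes, as in the <code>\.jpg$</code> example.</p>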
<h3 id="batch-rename">Batch Rename<a class="headerlink" href="#batch-rename" title="Permanent link">¶</a></h3>
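<p>This script opens a small window where I type a regex search pattern and a replacement, shows a live
preview of how each file in the current Explorer window would be renamed, and applies the renames when
I click <em>Apply</em>. It is built to be invoked as a separate AutoHotkey process, not to be
<code>#Include</code>-ed into a master script. That’s because the GUI is slightly more complex than what
we’ve seen in previous sections and I didn’t bother to make it work well as a module.</p>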
<div class="hl"><input type=checkbox id=co-1><label for=co-1><span class='btn show-full-code-btn'>Show remaining 81 lines</span></label><div class=filename><span>batch-rename.ahk</span></div><pre class=content><code><span><span class="nb">#NoEnv</span>
</span><span><span class="nb">#NoTrayIcon</span>
</span><span>
</span><span><span class="n">active_hwnd</span> <span class="o">:=</span> <span class="nf">WinActive</span><span class="p">(</span><span class="s">"ahk_class CabinetWClass"</span><span class="p">)</span>
</span><span><span class="n">If</span> <span class="p">(</span><span class="n">active_hwnd</span><span class="p">)</span> <span class="p">{</span>
</span><span>    <span class="n">for</span> <span class="n">window</span> <span class="ow">in</span> <span class="nf">ComObjCreate</span><span class="p">(</span><span class="s">"Shell.Application"</span><span class="p">)</span><span class="o">.</span><span class="n">Windows</span>
</span><span>        <span class="n">If</span> <span class="p">(</span><span class="n">active_hwnd</span> <span class="o">==</span> <span class="n">window</span><span class="o">.</span><span class="n">hwnd</span><span class="p">)</span> <span class="p">{</span>
</span><span>            <span class="n">parent</span> <span class="o">:=</span> <span class="n">uriDecode</span><span class="p">(</span><span class="n">StrReplace</span><span class="p">(</span><span class="n">window</span><span class="o">.</span><span class="n">LocationURL</span><span class="p">,</span> <span class="s">"file:///"</span><span class="p">,</span> <span class="s">""</span><span class="p">,</span> <span class="p">,</span> <span class="mi">1</span><span class="p">))</span>
</span><span>            <span class="n">ShowGui</span><span class="p">()</span>
</span><span>        <span class="p">}</span>
</span><span><span class="p">}</span>
</span><span>
</span><span><span class="n">ShowGui</span><span class="p">()</span> <span class="p">{</span>
</span><span>    <span class="nb">global</span> <span class="n">active_hwnd</span><span class="p">,</span> <span class="n">parent</span><span class="p">,</span> <span class="n">SourcePattern</span><span class="p">,</span> <span class="n">TargetPattern</span><span class="p">,</span> <span class="n">WindowListView</span>
</span><span>    <span class="nb">Gui</span><span class="p">,</span> <span class="n">Font</span><span class="p">,</span> <span class="n">s10</span> <span class="n">q5</span><span class="p">,</span> <span class="n">Segoe</span> <span class="n">UI</span>
</span><span>    <span class="nb">Gui</span><span class="p">,</span> <span class="n">Margin</span><span class="p">,</span> <span class="mi">6</span><span class="p">,</span> <span class="mi">6</span>
</span><span>    <span class="nb">Gui</span><span class="p">,</span> <span class="o">+</span><span class="n">Owner</span><span class="nv">%active_hwnd%</span>
</span><span>    <span class="nb">Gui</span><span class="p">,</span> <span class="n">Add</span><span class="p">,</span> <span class="n">Text</span><span class="p">,</span> <span class="p">,</span> <span class="n">Search</span> <span class="n">pattern</span><span class="o">:</span>
</span><span>    <span class="nb">Gui</span><span class="p">,</span> <span class="n">Add</span><span class="p">,</span> <span class="n">Edit</span><span class="p">,</span> <span class="n">r1</span> <span class="n">w300</span> <span class="n">vSourcePattern</span> <span class="n">gInputChanged</span> <span class="o">-</span><span class="n">WantReturn</span> <span class="n">X</span><span class="o">+</span><span class="mi">6</span> <span class="n">Section</span>
</span><span>    <span class="nb">Gui</span><span class="p">,</span> <span class="n">Add</span><span class="p">,</span> <span class="n">Text</span><span class="p">,</span> <span class="n">X</span><span class="o">+</span><span class="mi">6</span><span class="p">,</span> <span class="n">Full</span> <span class="n">regex</span> <span class="ow">is</span> <span class="n">supported</span>
</span><span class=collapse>    <span class="nb">Gui</span><span class="p">,</span> <span class="n">Add</span><span class="p">,</span> <span class="n">Text</span><span class="p">,</span> <span class="n">XM</span><span class="p">,</span> <span class="n">Replacement</span><span class="o">:</span>
</span><span class=collapse>    <span class="nb">Gui</span><span class="p">,</span> <span class="n">Add</span><span class="p">,</span> <span class="n">Edit</span><span class="p">,</span> <span class="n">r1</span> <span class="n">w300</span> <span class="n">vTargetPattern</span> <span class="n">gInputChanged</span> <span class="o">-</span><span class="n">WantReturn</span> <span class="n">XS</span> <span class="n">YP</span>
</span><span class=collapse>    <span class="nb">Gui</span><span class="p">,</span> <span class="n">Add</span><span class="p">,</span> <span class="n">Text</span><span class="p">,</span> <span class="n">X</span><span class="o">+</span><span class="mi">6</span><span class="p">,</span> <span class="n">Use</span> <span class="n">$1</span><span class="p">,</span> <span class="n">$2</span><span class="p">,</span> <span class="n">$</span><span class="p">{</span><span class="mi">10</span><span class="p">},</span> <span class="n">$</span><span class="p">{</span><span class="n">named</span><span class="p">},</span> <span class="n">$U1</span><span class="p">,</span> <span class="n">$U</span><span class="p">{</span><span class="mi">10</span><span class="p">},</span> <span class="n">$L2</span><span class="p">,</span> <span class="n">$T0</span> <span class="n">etc</span><span class="o">.</span>
</span><span class=collapse>    <span class="nb">Gui</span><span class="p">,</span> <span class="n">Add</span><span class="p">,</span> <span class="n">Button</span><span class="p">,</span> <span class="n">Default</span> <span class="n">gDoRename</span> <span class="n">XM</span> <span class="n">w80</span><span class="p">,</span> <span class="n">Apply</span>
</span><span class=collapse>    <span class="nb">Gui</span><span class="p">,</span> <span class="n">Add</span><span class="p">,</span> <span class="n">Button</span><span class="p">,</span> <span class="n">gShowHelp</span> <span class="n">X</span><span class="o">+</span><span class="mi">6</span> <span class="n">w80</span><span class="p">,</span> <span class="n">Help</span>
</span><span class=collapse>    <span class="nb">Gui</span><span class="p">,</span> <span class="n">Add</span><span class="p">,</span> <span class="n">ListView</span><span class="p">,</span> <span class="n">Grid</span> <span class="n">r12</span> <span class="n">w800</span> <span class="n">vWindowListView</span> <span class="n">XM</span><span class="p">,</span> <span class="n">Replacements</span><span class="o">|</span><span class="n">Current</span> <span class="n">name</span><span class="o">|</span><span class="n">Renamed</span> <span class="n">to</span>
</span><span class=collapse>
</span><span class=collapse>    <span class="n">imList</span> <span class="o">:=</span> <span class="nf">IL_Create</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span>
</span><span class=collapse>    <span class="nf">LV_SetImageList</span><span class="p">(</span><span class="n">imList</span><span class="p">)</span>
</span><span class=collapse>    <span class="nf">IL_Add</span><span class="p">(</span><span class="n">imList</span><span class="p">,</span> <span class="s">"check.png"</span><span class="p">,</span> <span class="mh">0xFFFFFF</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
</span><span class=collapse>    <span class="nf">IL_Add</span><span class="p">(</span><span class="n">imList</span><span class="p">,</span> <span class="s">"error.png"</span><span class="p">,</span> <span class="mh">0xFFFFFF</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
</span><span class=collapse><span class="c1">    ; IL_Add(imList, "shell32.dll", 145)</span>
</span><span class=collapse><span class="c1">    ; IL_Add(imList, "shell32.dll", 234)</span>
</span><span class=collapse>
</span><span class=collapse>    <span class="nb">Gui</span><span class="p">,</span> <span class="n">Show</span><span class="p">,</span> <span class="p">,</span> <span class="n">Rename</span> <span class="n">with</span> <span class="n">Regex</span><span class="o">:</span> <span class="nv">%parent%</span>
</span><span class=collapse><span class="p">}</span>
</span><span class=collapse>
</span><span class=collapse><span class="n">InputChanged</span><span class="p">()</span> <span class="p">{</span>
</span><span class=collapse>    <span class="nb">global</span> <span class="n">parent</span><span class="p">,</span> <span class="n">SourcePattern</span><span class="p">,</span> <span class="n">TargetPattern</span>
</span><span class=collapse>    <span class="nb">GuiControlGet</span><span class="p">,</span> <span class="n">SourcePattern</span>
</span><span class=collapse>    <span class="nb">GuiControlGet</span><span class="p">,</span> <span class="n">TargetPattern</span>
</span><span class=collapse>    <span class="nf">LV_Delete</span><span class="p">()</span>
</span><span class=collapse>    <span class="nb">Loop</span><span class="p">,</span> <span class="n">Files</span><span class="p">,</span> <span class="nv">%parent%</span>\<span class="o">*</span><span class="p">,</span> <span class="n">F</span>
</span><span class=collapse>    <span class="p">{</span>
</span><span class=collapse>        <span class="n">toName</span> <span class="o">:=</span> <span class="nf">RegExReplace</span><span class="p">(</span><span class="nv">A_LoopFileName</span><span class="p">,</span> <span class="n">SourcePattern</span><span class="p">,</span> <span class="n">TargetPattern</span><span class="p">,</span> <span class="n">count</span><span class="p">)</span>
</span><span class=collapse>        <span class="n">icon</span> <span class="o">:=</span> <span class="mi">1</span>
</span><span class=collapse>        <span class="n">If</span> <span class="p">(</span><span class="nv">A_LoopFileName</span> <span class="o">==</span> <span class="n">toName</span><span class="p">)</span>
</span><span class=collapse>            <span class="n">icon</span> <span class="o">:=</span> <span class="mi">3</span>
</span><span class=collapse>        <span class="nb">Else</span> <span class="n">if</span> <span class="p">(</span><span class="nf">FileExist</span><span class="p">(</span><span class="n">parent</span> <span class="o">.</span> <span class="s">"/"</span> <span class="o">.</span> <span class="n">toName</span><span class="p">))</span>
</span><span class=collapse>            <span class="n">icon</span> <span class="o">:=</span> <span class="mi">2</span>
</span><span class=collapse>        <span class="nf">LV_Add</span><span class="p">(</span><span class="s">"Icon"</span> <span class="o">.</span> <span class="n">icon</span><span class="p">,</span> <span class="n">count</span><span class="p">,</span> <span class="nv">A_LoopFileName</span><span class="p">,</span> <span class="n">toName</span><span class="p">)</span>
</span><span class=collapse>    <span class="p">}</span>
</span><span class=collapse>    <span class="nf">LV_ModifyCol</span><span class="p">()</span>
</span><span class=collapse><span class="p">}</span>
</span><span class=collapse>
</span><span class=collapse><span class="n">DoRename</span><span class="p">()</span> <span class="p">{</span>
</span><span class=collapse>    <span class="nb">global</span> <span class="n">parent</span><span class="p">,</span> <span class="n">SourcePattern</span><span class="p">,</span> <span class="n">TargetPattern</span>
</span><span class=collapse>    <span class="nb">Gui</span><span class="p">,</span> <span class="n">Submit</span>
</span><span class=collapse>
</span><span class=collapse>    <span class="n">If</span> <span class="p">(</span><span class="n">SourcePattern</span> <span class="o">!=</span> <span class="s">""</span><span class="p">)</span>
</span><span class=collapse>        <span class="nb">Loop</span> <span class="nv">%parent%</span>\<span class="o">*</span> <span class="p">{</span>
</span><span class=collapse>            <span class="n">toName</span> <span class="o">:=</span> <span class="nf">RegExReplace</span><span class="p">(</span><span class="nv">A_LoopFileName</span><span class="p">,</span> <span class="n">SourcePattern</span><span class="p">,</span> <span class="n">TargetPattern</span><span class="p">)</span>
</span><span class=collapse>            <span class="nb">FileMove</span><span class="p">,</span> <span class="nv">%parent%</span>\<span class="nv">%A_LoopFileName%</span><span class="p">,</span> <span class="nv">%parent%</span>\<span class="nv">%toName%</span>
</span><span class=collapse>        <span class="p">}</span>
</span><span class=collapse>
</span><span class=collapse>    <span class="n">GuiClose</span><span class="p">()</span>
</span><span class=collapse><span class="p">}</span>
</span><span class=collapse>
</span><span class=collapse><span class="n">GuiEscape</span><span class="p">()</span> <span class="p">{</span>
</span><span class=collapse>    <span class="n">GuiClose</span><span class="p">()</span>
</span><span class=collapse><span class="p">}</span>
</span><span class=collapse>
</span><span class=collapse><span class="n">GuiClose</span><span class="p">()</span> <span class="p">{</span>
</span><span class=collapse>    <span class="nb">ExitApp</span>
</span><span class=collapse><span class="p">}</span>
</span><span class=collapse>
</span><span class=collapse><span class="n">uriDecode</span><span class="p">(</span><span class="n">str</span><span class="p">)</span> <span class="p">{</span>
</span><span class=collapse>    <span class="nb">Loop</span>
</span><span class=collapse>        <span class="nb">If </span><span class="nf">RegExMatch</span><span class="p">(</span><span class="n">str</span><span class="p">,</span> <span class="s">"i)(?<=%)[\da-f]{1,2}"</span><span class="p">,</span> <span class="n">hex</span><span class="p">)</span>
</span><span class=collapse>            <span class="nb">StringReplace</span><span class="p">,</span> <span class="n">str</span><span class="p">,</span> <span class="n">str</span><span class="p">,</span> <span class="se">`%</span><span class="nv">%hex%</span><span class="p">,</span> <span class="o">%</span> <span class="nf">Chr</span><span class="p">(</span><span class="s">"0x"</span> <span class="o">.</span> <span class="n">hex</span><span class="p">),</span> <span class="n">All</span>
</span><span class=collapse>        <span class="nb">Else</span> <span class="n">Break</span>
</span><span class=collapse>    <span class="nb">Return</span><span class="p">,</span> <span class="n">str</span>
</span><span class=collapse><span class="p">}</span>
</span><span class=collapse>
</span><span class=collapse><span class="n">ShowHelp</span><span class="p">()</span> <span class="p">{</span>
</span><span class=collapse>    <span class="n">help</span><span class="o">=</span>
</span><span class=collapse>    <span class="g">(</span>
</span><span class=collapse><span class="g">## Pattern:</span>
</span><span class=collapse>
</span><span class=collapse><span class="g">The pattern to search for, which is a Perl-compatible regular expression (PCRE). The pattern's options (if any) must be included at the beginning of the string followed by a close-parenthesis. For example, the pattern "i)abc.*123" would turn on the case-insensitive option and search for "abc", followed by zero or more occurrences of any character, followed by "123". If there are no options, the ")" is optional; for example, ")abc" is equivalent to "abc".</span>
</span><span class=collapse>
</span><span class=collapse><span class="g">## Replacement:</span>
</span><span class=collapse>
</span><span class=collapse><span class="g">The string to be substituted for each match, which is plain text (not a regular expression). It may include backreferences like $1, which brings in the substring from Haystack that matched the first subpattern. The simplest backreferences are $0 through $9, where $0 is the substring that matched the entire pattern, $1 is the substring that matched the first subpattern, $2 is the second, and so on. For backreferences above 9 (and optionally those below 9), enclose the number in braces; e.g. ${10}, ${11}, and so on. For named subpatterns, enclose the name in braces; e.g. ${SubpatternName}. To specify a literal $, use $$ (this is the only character that needs such special treatment; backslashes are never needed to escape anything).</span>
</span><span class=collapse>
</span><span class=collapse><span class="g">To convert the case of a subpattern, follow the $ with one of the following characters: U or u (uppercase), L or l (lowercase), T or t (title case, in which the first letter of each word is capitalized but all others are made lowercase). For example, both $U1 and $U{1} transcribe an uppercase version of the first subpattern.</span>
</span><span class=collapse>
</span><span class=collapse><span class="g">Nonexistent backreferences and those that did not match anything in Haystack -- such as one of the subpatterns in "(abc)|(xyz)" -- are transcribed as empty strings.</span>
</span><span class=collapse><span class="g">)</span>
</span><span class=collapse> <span class="nb">MsgBox</span><span class="p">,</span> <span class="nv">%help%</span>
</span><span class=collapse><span class="p">}</span>
</span></code></pre></div>
<p>Put this script at a convenient location, probably right next to your master script, and add the
following hotkey to your master script:</p>
<div class="hl"><pre class=content><code><span><span class="nb">#IfWinActive</span> <span class="n">ahk_group</span> <span class="n">FileListers</span>
</span><span><span class="o">^+</span><span class="n">b</span><span class="o">::</span><span class="n">Run</span> <span class="n">batch</span><span class="o">-</span><span class="n">rename</span><span class="o">.</span><span class="n">ahk</span>
</span></code></pre></div>
<p>Here’s a little <strong>mute</strong> video recording of some usage examples of this tool:</p>
<video controls="" muted="" playsinline="" preload="" src="https://sharats.me/static/autohotkey-rename-by-regex.mp4">Your browser does not support HTML5 video. Here’s <a href="https://sharats.me/static/autohotkey-rename-by-regex.mp4">a link to the video</a> instead.</video>
<p class="note">If you use this, please be careful and inspect the table before clicking the
“Apply” button. If it ends up messing up your files, don’t hold me responsible: I’m sharing this
without warranty. Like every source code block on this website, this is shared under the <a href="https://sharats.me/licenses/mit/">MIT
License</a>.</p>
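<p>To make the replacement rules described in <code>ShowHelp</code> more concrete, here is a minimal Python sketch of how such a replacement string could be expanded for each match. It only illustrates the documented rules and is not how the script itself is implemented. It assumes Python’s <code>re</code> syntax for the pattern (so the <code>i)</code>-style option prefix and PCRE-only features are not handled), the names <code>convert_case</code>, <code>expand</code> and <code>regex_rename</code> are made up for illustration, and title case is only approximated with <code>str.title()</code>.</p>

```python
import re

# One token in a replacement string: "$$" (a literal dollar sign), or "$"
# followed by an optional case flag (U/u, L/l, T/t) and a group reference
# written as a single digit ($1) or in braces (${10}, ${name}).
TOKEN = re.compile(r"\$(?:(\$)|([UuLlTt])?(?:(\d)|\{(\w+)\}))")


def convert_case(text, flag):
    """Apply an optional case flag to the text of a backreference."""
    if flag is None:
        return text
    flag = flag.upper()
    if flag == "U":
        return text.upper()
    if flag == "L":
        return text.lower()
    return text.title()  # "T": an approximation of AutoHotkey's title case


def expand(template, match):
    """Build the replacement text for a single regex match."""

    def substitute(token):
        if token.group(1):  # "$$" stands for a single "$"
            return "$"
        flag, digit, braced = token.group(2), token.group(3), token.group(4)
        ref = digit if digit is not None else braced
        key = int(ref) if ref.isdigit() else ref
        try:
            text = match.group(key) or ""  # group that matched nothing
        except IndexError:
            text = ""  # group that does not exist at all
        return convert_case(text, flag)

    # Everything that is not a token (backslashes included) is kept literally.
    return TOKEN.sub(substitute, template)


def regex_rename(pattern, template, name):
    """Replace every match of `pattern` in `name` using `template`."""
    return re.sub(pattern, lambda match: expand(template, match), name)
```

<p>For example, with the pattern <code>(\w+)\.txt</code> and the replacement <code>$U1.md</code>, the name <code>notes.txt</code> becomes <code>NOTES.md</code>. Passing a function to <code>re.sub</code> keeps the literal parts of the replacement untouched, which matches the “backslashes are never needed” rule.</p>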
<h3 id="copy-paths-of-selected-files">Copy Paths of Selected Files<a class="headerlink" href="#copy-paths-of-selected-files" title="Permanent link">¶</a></h3>
<p>Again, this is <em>partly</em> covered by built-in Windows functionality. When we
<kbd>Shift+Right Click</kbd> on a file, we get the option to “Copy as path”, which works fine for
simple cases. But I wanted the following additional things from this feature:</p>
<ol>
<li>A keyboard hotkey, like <kbd>^+c</kbd>.</li>
<li>No surrounding double quotes.</li>
<li>Work with multiple files being selected. Copy each file’s path as one line.</li>
</ol>
<p>For this, I defined the following <kbd>^+c</kbd> hotkey for file explorer windows:</p>
<div class="hl"><pre class=content><code><span><span class="nb">#IfWinActive</span> <span class="n">ahk_group</span> <span class="n">FileListers</span>
</span><span><span class="o">^+</span><span class="n">c</span><span class="o">::</span>
</span><span> <span class="nv">Clipboard</span> <span class="o">:=</span> <span class="n">JoinArrayContents</span><span class="p">(</span><span class="n">Explorer_GetSelected</span><span class="p">())</span>
</span><span> <span class="nb">Return</span>
</span></code></pre></div>
<p>This gets a list of all the selected files in the current explorer window and joins their paths into a
single string, one path per line. The <code>Explorer_GetSelected</code> function comes from <a href="https://autohotkey.com/board/topic/60985-get-paths-of-selected-items-in-an-explorer-window/" rel="noopener noreferrer" target="_blank">this AutoHotkey forum
post</a> and <code>JoinArrayContents</code> is given below:</p>
<div class="hl"><pre class=content><code><span><span class="n">JoinArrayContents</span><span class="p">(</span><span class="n">arr</span><span class="p">,</span> <span class="n">delimiter</span><span class="o">=</span><span class="s">"</span><span class="se">`n</span><span class="s">"</span><span class="p">)</span> <span class="p">{</span>
</span><span> <span class="n">content</span> <span class="o">:=</span> <span class="s">""</span>
</span><span> <span class="n">for</span> <span class="n">index</span><span class="p">,</span> <span class="n">item</span> <span class="ow">in</span> <span class="n">arr</span> <span class="p">{</span>
</span><span> <span class="nb">if </span><span class="n">index</span> <span class="o">></span> <span class="mi">1</span>
</span><span> <span class="n">content</span> <span class="o">:=</span> <span class="n">content</span> <span class="o">.</span> <span class="n">delimiter</span>
</span><span> <span class="n">content</span> <span class="o">:=</span> <span class="n">content</span> <span class="o">.</span> <span class="n">item</span>
</span><span> <span class="p">}</span>
</span><span> <span class="nb">return</span> <span class="n">content</span>
</span><span><span class="p">}</span>
</span></code></pre></div>
<p>Now I can select one or more files, hit <kbd>^+c</kbd> and the full paths of <em>all</em> the selected
files will end up in my clipboard.</p>
<h3 id="copy-contents-of-selected-files">Copy Contents of Selected Files<a class="headerlink" href="#copy-contents-of-selected-files" title="Permanent link">¶</a></h3>
<p>Although this one sounds similar to the previous section, it is useful in quite a different way.
Where the previous section’s hotkey copies the selected files’ <em>paths</em>, this hotkey
copies the selected files’ <em>contents</em> as a whole.</p>
<p>I have a few (several?) small text files with snippets, template messages, etc. With this, I just
select one or multiple files and hit <kbd>Ctrl+Shift+x</kbd> and I’m ready to paste their contents.</p>
<div class="hl"><pre class=content><code><span><span class="nb">#IfWinActive</span> <span class="n">ahk_group</span> <span class="n">FileListers</span>
</span><span><span class="o">^+</span><span class="n">x</span><span class="o">::</span>
</span><span> <span class="n">CopySelectedFileContents</span><span class="p">()</span> <span class="p">{</span>
</span><span> <span class="n">files</span> <span class="o">:=</span> <span class="n">Explorer_GetSelected</span><span class="p">()</span>
</span><span> <span class="n">content</span> <span class="o">:=</span> <span class="s">""</span>
</span><span> <span class="n">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">file</span> <span class="ow">in</span> <span class="n">files</span> <span class="p">{</span>
</span><span> <span class="nb">FileRead</span><span class="p">,</span> <span class="n">text</span><span class="p">,</span> <span class="nv">%file%</span>
</span><span> <span class="nb">if </span><span class="n">i</span> <span class="o">></span> <span class="mi">1</span>
</span><span> <span class="n">content</span> <span class="o">:=</span> <span class="n">content</span> <span class="o">.</span> <span class="s">"</span><span class="se">`n`n</span><span class="s">"</span>
</span><span> <span class="n">content</span> <span class="o">:=</span> <span class="n">content</span> <span class="o">.</span> <span class="n">text</span>
</span><span> <span class="p">}</span>
</span><span> <span class="nv">Clipboard</span> <span class="o">:=</span> <span class="n">content</span>
</span><span> <span class="p">}</span>
</span></code></pre></div>
<p>This uses the same <code>Explorer_GetSelected</code> function I referred to in the previous section. However, instead of putting the
selected files’ paths in <code>Clipboard</code>, this hotkey reads each file and puts their combined contents there.</p>
<p>Just like the previous hotkey, I can select multiple <em>text</em> files and hit <kbd>^+x</kbd>, and the
contents of all the selected files will end up in my clipboard, separated by two newlines.</p>
<p>This doesn’t work with images yet though. Still have to figure that one out.</p>
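<p>The logic of this hotkey boils down to reading each file in order and joining the texts with a separator placed only between them. Here is a minimal Python sketch of that idea, assuming a plain list of paths stands in for what <code>Explorer_GetSelected</code> returns and that the files are UTF-8 encoded (AutoHotkey’s <code>FileRead</code> may use a different default encoding). The name <code>concatenate_files</code> is made up for illustration.</p>

```python
from pathlib import Path


def concatenate_files(paths, separator="\n\n"):
    """Return the contents of all files in `paths`, in order.

    Like the ^+x hotkey, the separator goes only *between* files: nothing
    is added before the first file or after the last one.
    """
    return separator.join(Path(p).read_text(encoding="utf-8") for p in paths)
```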
<h3 id="create-file-with-clipboard-contents">Create File with Clipboard Contents<a class="headerlink" href="#create-file-with-clipboard-contents" title="Permanent link">¶</a></h3>
<p>This is the opposite of the previous hotkey. Here, I want whatever is in the Clipboard to be saved
to a text file in the current folder.</p>
<div class="hl"><pre class=content><code><span><span class="nb">#IfWinActive</span> <span class="n">ahk_group</span> <span class="n">FileListers</span>
</span><span><span class="o">^+</span><span class="n">v</span><span class="o">::</span>
</span><span> <span class="n">CreateFileWithClipboardContents</span><span class="p">()</span> <span class="p">{</span>
</span><span> <span class="n">loc</span> <span class="o">:=</span> <span class="n">Explorer_GetPath</span><span class="p">()</span>
</span><span> <span class="nb">WinGetPos</span><span class="p">,</span> <span class="n">wx</span><span class="p">,</span> <span class="n">wy</span>
</span><span> <span class="nb">ControlGetPos</span><span class="p">,</span> <span class="n">cx</span><span class="p">,</span> <span class="n">cy</span><span class="p">,</span> <span class="n">cw</span><span class="p">,</span> <span class="p">,</span> <span class="n">DirectUIHWND3</span>
</span><span> <span class="n">x</span> <span class="o">:=</span> <span class="n">wx</span> <span class="o">+</span> <span class="n">cx</span> <span class="o">+</span> <span class="n">cw</span><span class="o">/</span><span class="mi">2</span> <span class="o">-</span> <span class="mi">200</span>
</span><span> <span class="n">y</span> <span class="o">:=</span> <span class="n">wy</span> <span class="o">+</span> <span class="n">cy</span>
</span><span> <span class="nb">InputBox</span><span class="p">,</span> <span class="n">filename</span><span class="p">,</span> <span class="nv">Clipboard</span> <span class="n">File</span>
</span><span> <span class="p">,</span> <span class="n">Enter</span> <span class="n">file</span> <span class="n">name</span> <span class="n">to</span> <span class="n">paste</span> <span class="nv">clipboard</span> <span class="n">contents</span> <span class="ow">in</span><span class="o">:</span><span class="p">,</span> <span class="p">,</span> <span class="mi">400</span><span class="p">,</span> <span class="mi">120</span><span class="p">,</span> <span class="nv">%x%</span><span class="p">,</span> <span class="nv">%y%</span><span class="p">,</span> <span class="p">,</span>
</span><span> <span class="p">,</span> <span class="n">clip</span><span class="o">.</span><span class="n">txt</span>
</span><span> <span class="nb">if </span><span class="nv">ErrorLevel</span>
</span><span> <span class="nb">Return</span>
</span><span> <span class="n">filepath</span> <span class="o">:=</span> <span class="n">loc</span> <span class="o">.</span> <span class="s">"\"</span> <span class="o">.</span> <span class="n">filename</span>
</span><span> <span class="n">if</span> <span class="p">(</span><span class="nf">FileExist</span><span class="p">(</span><span class="n">filepath</span><span class="p">))</span> <span class="p">{</span>
</span><span> <span class="nb">MsgBox</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="n">Overwrite</span><span class="p">,</span> <span class="n">Overwriting</span> <span class="n">existing</span> '<span class="nv">%filename%</span>'<span class="o">!</span>
</span><span> <span class="nb">IfMsgBox</span> <span class="n">Cancel</span>
</span><span> <span class="nb">Return</span>
</span><span> <span class="nb">FileDelete</span><span class="p">,</span> <span class="nv">%filepath%</span>
</span><span> <span class="p">}</span>
</span><span> <span class="nb">Fileappend</span><span class="p">,</span> <span class="nv">%Clipboard%</span><span class="p">,</span> <span class="nv">%filepath%</span>
</span><span> <span class="p">}</span>
</span></code></pre></div>
<p>The <code>Explorer_GetPath</code> function used in the above snippet is also from the same source I mentioned
in the previous sections. When the hotkey is triggered, an input box (pre-filled with <code>clip.txt</code>) asks
for the name of the file the clipboard’s contents should be saved to. Once we enter a file name and
submit, the file is created. If a file with that name already exists, we are first asked to confirm
overwriting it.</p>
<p>With this, I can copy some text from a webpage or an email in Outlook, and saving it to a text file
is just a quick <kbd>^+v</kbd> away. Once I created this hotkey, it became my primary way of creating new text
files. I no longer open Notepad, write (or paste) and then save the file to the desired directory.
Instead, I open the folder, use this hotkey to create the file, and then open the file in Notepad.
Somehow, it feels more natural.</p>
<p>This doesn’t work with images either. Have to figure this one out too.</p>
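<p>Stripped of the GUI parts, the hotkey above joins the current folder with the entered name, asks before replacing an existing file, and then writes the clipboard text. Here is a minimal Python sketch of just that flow, assuming the text has already been read from the clipboard and that UTF-8 is an acceptable encoding (AutoHotkey’s <code>FileAppend</code> may default to a different one). The <code>confirm_overwrite</code> callback is a hypothetical stand-in for the OK/Cancel <code>MsgBox</code>, and <code>save_text_to_folder</code> is a made-up name.</p>

```python
from pathlib import Path


def save_text_to_folder(folder, filename, text, confirm_overwrite):
    """Save `text` as folder/filename, asking before replacing a file.

    `confirm_overwrite(filename)` plays the role of the OK/Cancel MsgBox:
    it returns False to cancel. Returns the written path, or None if the
    user cancelled.
    """
    path = Path(folder) / filename
    if path.exists() and not confirm_overwrite(filename):
        return None
    # write_text replaces any existing content, which covers the
    # FileDelete + FileAppend pair in the AutoHotkey version.
    path.write_text(text, encoding="utf-8")
    return path
```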
<h3 id="create-folder-hierarchy-and-enter-it">Create Folder Hierarchy and Enter it<a class="headerlink" href="#create-folder-hierarchy-and-enter-it" title="Permanent link">¶</a></h3>
<p>The file explorer has a default hotkey for creating new folders (<kbd>Ctrl+Shift+n</kbd>), but it
doesn’t let us create a tree of folders in one go. To do that, we have to create a folder, enter
it, create the next one, and so on. This quickly gets tedious if it has to be done often.</p>
<p>As always, I tried to address it with AutoHotkey.</p>
<div class="hl"><input type=checkbox id=co-2><label for=co-2><span class='btn show-full-code-btn'>Show remaining 10 lines</span></label><pre class=content><code><span><span class="nb">#IfWinActive</span> <span class="n">ahk_group</span> <span class="n">FileListers</span>
</span><span><span class="o">^</span><span class="n">n</span><span class="o">::</span>
</span><span><span class="n">CreateFolderHierarchy</span><span class="p">()</span> <span class="p">{</span>
</span><span> <span class="n">loc</span> <span class="o">:=</span> <span class="n">Explorer_GetPath</span><span class="p">()</span>
</span><span> <span class="nb">WinGetPos</span><span class="p">,</span> <span class="n">wx</span><span class="p">,</span> <span class="n">wy</span>
</span><span> <span class="nb">ControlGetPos</span><span class="p">,</span> <span class="n">cx</span><span class="p">,</span> <span class="n">cy</span><span class="p">,</span> <span class="n">cw</span><span class="p">,</span> <span class="p">,</span> <span class="n">DirectUIHWND3</span>
</span><span> <span class="n">x</span> <span class="o">:=</span> <span class="n">wx</span> <span class="o">+</span> <span class="n">cx</span> <span class="o">+</span> <span class="n">cw</span><span class="o">/</span><span class="mi">2</span> <span class="o">-</span> <span class="mi">200</span>
</span><span> <span class="n">y</span> <span class="o">:=</span> <span class="n">wy</span> <span class="o">+</span> <span class="n">cy</span>
</span><span> <span class="nb">InputBox</span><span class="p">,</span> <span class="n">folder</span><span class="p">,</span> <span class="n">Create</span> <span class="n">Folder</span><span class="p">,</span> <span class="n">Enter</span> <span class="n">folder</span> <span class="n">name</span><span class="o">/</span><span class="n">path</span> <span class="n">to</span> <span class="n">create</span><span class="o">:</span><span class="p">,</span> <span class="p">,</span> <span class="mi">400</span><span class="p">,</span> <span class="mi">120</span>
</span><span> <span class="p">,</span> <span class="nv">%x%</span><span class="p">,</span> <span class="nv">%y%</span>
</span><span> <span class="nb">if </span><span class="nv">ErrorLevel</span>
</span><span> <span class="nb">Return</span>
</span><span> <span class="n">folder</span> <span class="o">:=</span> <span class="n">StrReplace</span><span class="p">(</span><span class="n">folder</span><span class="p">,</span> <span class="s">"/"</span><span class="p">,</span> <span class="s">"\"</span><span class="p">)</span>
</span><span> <span class="n">pos</span> <span class="o">:=</span> <span class="nf">RegExMatch</span><span class="p">(</span><span class="n">folder</span><span class="p">,</span> <span class="s">"O)\{([^\{]+)\}"</span><span class="p">,</span> <span class="n">match</span><span class="p">)</span>
</span><span> <span class="n">folders</span> <span class="o">:=</span> <span class="p">[]</span>
</span><span> <span class="n">if</span> <span class="p">(</span><span class="n">pos</span> <span class="o">></span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
</span><span> <span class="n">parts</span> <span class="o">:=</span> <span class="n">StrSplit</span><span class="p">(</span><span class="n">match</span><span class="o">.</span><span class="n">value</span><span class="p">(</span><span class="mi">1</span><span class="p">),</span> <span class="s">","</span><span class="p">)</span>
</span><span> <span class="n">prefix</span> <span class="o">:=</span> <span class="nf">SubStr</span><span class="p">(</span><span class="n">folder</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="n">match</span><span class="o">.</span><span class="n">Pos</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span>
</span><span> <span class="n">suffix</span> <span class="o">:=</span> <span class="nf">SubStr</span><span class="p">(</span><span class="n">folder</span><span class="p">,</span> <span class="n">match</span><span class="o">.</span><span class="n">Pos</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span> <span class="o">+</span> <span class="n">match</span><span class="o">.</span><span class="n">Len</span><span class="p">(</span><span class="mi">0</span><span class="p">))</span>
</span><span> <span class="n">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">part</span> <span class="ow">in</span> <span class="n">parts</span> <span class="p">{</span>
</span><span class=collapse> <span class="n">folders</span><span class="o">.</span><span class="n">Push</span><span class="p">(</span><span class="n">prefix</span> <span class="o">.</span> <span class="n">part</span> <span class="o">.</span> <span class="n">suffix</span><span class="p">)</span>
</span><span class=collapse> <span class="p">}</span>
</span><span class=collapse> <span class="p">}</span> <span class="n">else</span> <span class="p">{</span>
</span><span class=collapse> <span class="n">folders</span><span class="o">.</span><span class="n">Push</span><span class="p">(</span><span class="n">folder</span><span class="p">)</span>
</span><span class=collapse> <span class="p">}</span>
</span><span class=collapse> <span class="n">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">folder</span> <span class="ow">in</span> <span class="n">folders</span> <span class="p">{</span>
</span><span class=collapse> <span class="nb">FileCreateDir</span><span class="p">,</span> <span class="nv">%loc%</span>\<span class="nv">%folder%</span>
</span><span class=collapse> <span class="p">}</span>
</span><span class=collapse> <span class="n">Explorer_GetWindow</span><span class="p">()</span><span class="o">.</span><span class="n">Navigate2</span><span class="p">(</span><span class="n">loc</span> <span class="o">.</span> <span class="s">"\"</span> <span class="o">.</span> <span class="n">folders</span><span class="p">[</span><span class="n">folders</span><span class="o">.</span><span class="n">Length</span><span class="p">()])</span>
</span><span class=collapse><span class="p">}</span>
</span></code></pre></div>
<p>This uses the same explorer library I mentioned in the previous sections. When this hotkey is
triggered, we get a prompt where we can enter a folder tree (<em>i.e.,</em> folders separated by <code>/</code> or
<code>\</code>) and they will all be created. The input may also contain one brace group of comma-separated
alternatives, like <code>src/{main,test}/java</code>, in which case one tree is created per alternative. As a
bonus, we are also switched to the newly created folder (the last one, if there are several), so we can
start working with it right away.</p>
<p>Now I can hit <kbd>^n</kbd> and type in <code>src/main/java</code> or <code>2020-01/pics</code>, and the whole nested structure
is created and opened, which is usually followed by pasting some files into it.</p>
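<p>The path handling in <code>CreateFolderHierarchy</code> can be sketched in Python as follows, which may make the brace expansion easier to follow. As in the AutoHotkey version, only the first <code>{a,b}</code>-style group is expanded. The function names are made up, the sketch normalises separators to <code>/</code> (which Python also accepts on Windows) instead of <code>\</code>, and navigating the explorer window is replaced by simply returning the last created path.</p>

```python
import os
import re

# The first "{...}" group, e.g. "{main,test}", as matched by the hotkey.
BRACES = re.compile(r"\{([^{]+)\}")


def expand_folder_spec(spec):
    """Expand 'src/{main,test}/java' into one relative path per alternative."""
    spec = spec.replace("\\", "/")  # accept either separator
    match = BRACES.search(spec)
    if match is None:
        return [spec]
    prefix, suffix = spec[:match.start()], spec[match.end():]
    return [prefix + part + suffix for part in match.group(1).split(",")]


def create_folder_hierarchy(base, spec):
    """Create every expanded path under `base` and return the last one.

    The last path is the one the hotkey navigates to afterwards.
    """
    folders = expand_folder_spec(spec)
    for folder in folders:
        # Like FileCreateDir, makedirs also creates missing parent folders.
        os.makedirs(os.path.join(base, folder), exist_ok=True)
    return os.path.join(base, folders[-1])
```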
<h2 id="email-selected-files-with-outlook">Email Selected File(s) with Outlook<a class="headerlink" href="#email-selected-files-with-outlook" title="Permanent link">¶</a></h2>
<p>Outlook is a necessary tool for email at most corporate workplaces. So it’s worth looking at how
we use it, and which parts of it we can automate or improve.</p>
<p>It’s also quite common to have to send files over email as attachments. Yet, considering how often
we do that, it’s still a tedious process: go to Outlook, start a new mail, drag and drop the file
into that window, fill in the mail, and send. It gets a bit better if you copy the file to the clipboard:
instead of starting a new mail with <kbd>Ctrl+n</kbd>, you can just hit <kbd>Ctrl+v</kbd> in
the Outlook Mail view, and a new mail opens up with the copied file as an attachment. But I’d say
it’s still not good enough.</p>
<p>The solution I currently use is the <kbd>Ctrl+m</kbd> hotkey for file explorers. The workflow is
that I select some files in my file explorer and hit <kbd>Ctrl+m</kbd>, and a new mail window opens up
with the selected files as attachments, and with both the subject and the message body listing the
file names, ready for me to edit.</p>
<div class="hl"><pre class=content><code><span><span class="nb">#IfWinActive</span> <span class="n">ahk_group</span> <span class="n">FileListers</span>
</span><span><span class="o">^</span><span class="n">m</span><span class="o">::</span><span class="n">OutlookNewMail</span><span class="p">(</span><span class="n">Explorer_GetSelected</span><span class="p">())</span>
</span></code></pre></div>
<p>The <code>Explorer_GetSelected</code> function is from the same library I mentioned in an earlier section. The
following is the definition of the <code>OutlookNewMail</code> function:</p>
<div class="hl"><input type=checkbox id=co-3><label for=co-3><span class='btn show-full-code-btn'>Show remaining 9 lines</span></label><pre class=content><code><span><span class="n">OutlookNewMail</span><span class="p">(</span><span class="n">attachments</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
</span><span> <span class="n">outlook</span> <span class="o">:=</span> <span class="nf">ComObjActive</span><span class="p">(</span><span class="s">"Outlook.Application"</span><span class="p">)</span>
</span><span> <span class="n">mail</span> <span class="o">:=</span> <span class="n">outlook</span><span class="o">.</span><span class="n">CreateItem</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
</span><span>
</span><span> <span class="n">if</span> <span class="p">(</span><span class="n">attachments</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
</span><span> <span class="n">msg</span> <span class="o">:=</span> <span class="s">""</span>
</span><span> <span class="n">sub</span> <span class="o">:=</span> <span class="s">"Files: "</span>
</span><span> <span class="n">for</span> <span class="n">index</span><span class="p">,</span> <span class="n">file</span> <span class="ow">in</span> <span class="n">attachments</span> <span class="p">{</span>
</span><span> <span class="n">mail</span><span class="o">.</span><span class="n">Attachments</span><span class="o">.</span><span class="n">Add</span><span class="p">(</span><span class="n">file</span><span class="p">)</span>
</span><span> <span class="nb">SplitPath</span><span class="p">,</span> <span class="n">file</span><span class="p">,</span> <span class="n">basename</span>
</span><span> <span class="n">msg</span> <span class="o">:=</span> <span class="n">msg</span> <span class="o">.</span> <span class="s">"<p class=MsoNormal>&nbsp;&nbsp;&nbsp; "</span>
</span><span> <span class="o">.</span> <span class="n">basename</span> <span class="o">.</span> <span class="s">"<o:p></o:p></p>"</span>
</span><span> <span class="n">if</span> <span class="p">(</span><span class="n">attachments</span><span class="o">.</span><span class="n">Length</span><span class="p">()</span> <span class="o">==</span> <span class="mi">1</span><span class="p">)</span>
</span><span> <span class="n">sub</span> <span class="o">:=</span> <span class="s">"File: "</span> <span class="o">.</span> <span class="n">basename</span>
</span><span> <span class="nb">else</span> <span class="n">if</span> <span class="p">(</span><span class="n">index</span> <span class="o">==</span> <span class="n">attachments</span><span class="o">.</span><span class="n">_MaxIndex</span><span class="p">())</span>
</span><span> <span class="n">sub</span> <span class="o">:=</span> <span class="n">sub</span> <span class="o">.</span> <span class="s">" & "</span> <span class="o">.</span> <span class="n">basename</span>
</span><span> <span class="nb">else</span> <span class="n">if</span> <span class="p">(</span><span class="n">index</span> <span class="o">==</span> <span class="n">attachments</span><span class="o">.</span><span class="n">_MinIndex</span><span class="p">())</span>
</span><span> <span class="n">sub</span> <span class="o">:=</span> <span class="n">sub</span> <span class="o">.</span> <span class="n">basename</span>
</span><span> <span class="nb">else</span>
</span><span> <span class="n">sub</span> <span class="o">:=</span> <span class="n">sub</span> <span class="o">.</span> <span class="s">", "</span> <span class="o">.</span> <span class="n">basename</span>
</span><span class=collapse> <span class="p">}</span>
</span><span class=collapse>
</span><span class=collapse> <span class="nb">FileRead</span><span class="p">,</span> <span class="n">emailTpl</span><span class="p">,</span> <span class="n">email</span><span class="o">.</span><span class="n">tpl</span><span class="o">.</span><span class="n">txt</span>
</span><span class=collapse> <span class="n">mail</span><span class="o">.</span><span class="n">HTMLBody</span> <span class="o">:=</span> <span class="n">StrReplace</span><span class="p">(</span><span class="n">emailTpl</span><span class="p">,</span> <span class="s">"$$MESSAGE$$"</span><span class="p">,</span> <span class="n">msg</span> <span class="o">.</span> <span class="s">"</ul>"</span><span class="p">)</span>
</span><span class=collapse> <span class="n">mail</span><span class="o">.</span><span class="n">Subject</span> <span class="o">:=</span> <span class="n">sub</span>
</span><span class=collapse> <span class="p">}</span>
</span><span class=collapse>
</span><span class=collapse> <span class="n">mail</span><span class="o">.</span><span class="n">Display</span>
</span><span class=collapse><span class="p">}</span>
</span></code></pre></div>
<p>AutoHotkey supports connecting to COM (OLE Automation) objects, which means we can create hotkeys that
interact richly with Office applications like Outlook. The above function leverages this.</p>
<p>All I have to do now is fill in the “To:” field and hit <kbd>Ctrl+Enter</kbd>. I’ve been loving
this ever since.</p>
<p class="note">Note, of course, that since this connects to the running Outlook instance (via
<code>ComObjActive</code>), Outlook needs to be running for this to work.</p>
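<p>The subject line that <code>OutlookNewMail</code> builds follows a small rule: a single attachment gives <code>File: name</code>, while several give <code>Files: a, b &amp; c</code>. Here is a minimal Python sketch of just that rule, handy for checking the edge cases without Outlook. The name <code>build_subject</code> is made up, and <code>ntpath.basename</code> stands in for <code>SplitPath</code> so that Windows paths are handled on any OS.</p>

```python
import ntpath


def build_subject(paths):
    """Mirror the subject line that OutlookNewMail builds from file paths."""
    names = [ntpath.basename(p) for p in paths]  # like SplitPath's file name
    if len(names) == 1:
        return "File: " + names[0]
    if not names:
        return "Files: "  # the AutoHotkey loop leaves the prefix untouched
    # First name, then ", " before each middle name and " & " before the last.
    return "Files: " + ", ".join(names[:-1]) + " & " + names[-1]
```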
<h3 id="global-hotkey-for-new-mail">Global Hotkey for New Mail<a class="headerlink" href="#global-hotkey-for-new-mail" title="Permanent link">¶</a></h3>
<p>You may have noticed that the above function’s <code>attachments</code> argument has a default value. If this argument
is not provided, a blank new mail window opens up. This is convenient on its own, so I have
it as a <em>global</em> hotkey:</p>
<div class="hl"><pre class=content><code><span><span class="nl">#c::</span><span class="n">OutlookNewMail</span><span class="p">()</span>
</span></code></pre></div>
<p>This works really well, since the new mail window opens up with my signature already filled in and
the focus already in the “To:” field, so I can start working on my email right away.</p>
<h2 id="conclusion">Conclusion<a class="headerlink" href="#conclusion" title="Permanent link">¶</a></h2>
<p>AutoHotkey is a powerful tool for automating all sorts of workflows on Windows. If you can get past
the quirks in the language itself, the underlying engine is very powerful. I know that over the few
years I’ve used it, I’ve only made use of a small portion of its potential. In addition, the help
file that is shipped with AutoHotkey (right-click on the tray icon and click on “Help”) is very
good. It’s exhaustive, very detailed and contains lots of examples. I encourage going over it
occasionally to find interesting things to add to your workflow. Good luck!</p>Automating the Vim workplace — Chapter Ⅲ2020-03-15T00:00:00+05:302020-03-15T00:00:00+05:30Shrikant Sharat Kandulatag:sharats.me,2020-03-15:/posts/automating-the-vim-workplace-3/<p>This is the third installment of my <a href="../automating-the-vim-workplace/">Automate the Vim workplace</a> article series. As
always, feel free to grab the ideas in this article or, better yet, take inspiration and inspect
your workflow to identify such opportunities.</p>
<p>This article is part of a series:</p>
<ol>
<li><a href="../automating-the-vim-workplace/">Chapter Ⅰ</a>.</li>
<li><a href="../automating-the-vim-workplace-2/">Chapter Ⅱ</a>.</li>
<li>Chapter Ⅲ (this article).</li>
</ol>
<div class="toc"><span class="toctitle">Table of Contents</span><ul>
<li><a href="#copy-file-full-path">Copy file full path</a></li>
<li><a href="#squeeze-expand-contiguous-blank-lines">Squeeze / Expand contiguous blank lines</a></li>
<li><a href="#duplicate-text-in-motion">Duplicate Text in Motion</a></li>
<li><a href="#transpose">Transpose</a></li>
<li><a href="#using-vartabstop-to-line-up">Using vartabstop to Line Up</a></li>
<li><a href="#strip-trailing-spaces">Strip Trailing Spaces</a></li>
<li><a href="#append-character-over-motion">Append character over motion</a></li>
<li><a href="#conclusion">Conclusion</a></li>
</ul>
</div>
<p class="note">Please note that everything I share below is what I use with Vim (more specifically, GVim on
Windows). I don’t use Neovim (yet), so I can’t vouch for any of it working in Neovim.</p>
<h2 id="copy-file-full-path">Copy file full path<a class="headerlink" href="#copy-file-full-path" title="Permanent link">¶</a></h2>
<p>I work with CSV files quite a bit. I spend a lot of time grooming and fixing them in Vim, and once
they’re ready, I need to upload them to an internal tool. For that, the following command has
proven to be super useful.</p>
<div class="hl"><pre class=content><code><span><span class="c">" Command to copy the current file's full absolute path.</span>
</span><span>command CopyFilePath <span class="k">let</span> @<span class="p">+</span> <span class="p">=</span> expand<span class="p">(</span>has<span class="p">(</span><span class="s1">'win32'</span><span class="p">)</span> ? <span class="s1">'%:p:gs?/?\\?'</span> : <span class="s1">'%:p'</span><span class="p">)</span>
</span></code></pre></div>
<p>This is one of those commands that feel super-simple and super-obvious once we add it to our
workflow. Running this command places the full path of the current buffer’s file into the system
clipboard (on Windows, with forward slashes turned into backslashes). Then, I just go to my browser,
click on the upload button and paste the file location. This is much quicker than navigating to the
folder and selecting the file. It also helps me avoid selecting the wrong file (which has happened to
me more than once).</p>
<h2 id="squeeze-expand-contiguous-blank-lines">Squeeze / Expand contiguous blank lines<a class="headerlink" href="#squeeze-expand-contiguous-blank-lines" title="Permanent link">¶</a></h2>
<p>When building or editing large CSV files, I often end up with several (read: hundreds of) blank
lines. This is usually because I select those lines in visual block mode, cut them, and then paste
them as a new column onto some existing rows. Solving that problem is for another day, I suppose.</p>
<p>Nonetheless, I needed a quick way to condense several blank lines into a single blank line. The
following is the result of that:</p>
<div class="hl"><pre class=content><code><span><span class="nb">nnoremap</span> <span class="p"><</span><span class="k">silent</span><span class="p">></span> dc :<span class="p"><</span>C<span class="p">-</span><span class="k">u</span><span class="p">></span><span class="k">call</span> <span class="p"><</span>SID<span class="p">></span>CleanupBlanks<span class="p">()<</span>CR<span class="p">></span>
</span><span><span class="k">fun</span> s:CleanupBlanks<span class="p">()</span> abort
</span><span> <span class="k">if</span> <span class="p">!</span>empty<span class="p">(</span>getline<span class="p">(</span><span class="s1">'.'</span><span class="p">))</span>
</span><span> <span class="k">return</span>
</span><span> <span class="k">endif</span>
</span><span> <span class="k">let</span> <span class="k">l</span>:curr <span class="p">=</span> line<span class="p">(</span><span class="s1">'.'</span><span class="p">)</span>
</span><span>
</span><span> <span class="k">let</span> <span class="k">l</span>:<span class="k">start</span> <span class="p">=</span> <span class="k">l</span>:curr
</span><span> <span class="k">while</span> <span class="k">l</span>:<span class="k">start</span> <span class="p">></span> <span class="m">1</span> && empty<span class="p">(</span>getline<span class="p">(</span><span class="k">l</span>:<span class="k">start</span> <span class="p">-</span> <span class="m">1</span><span class="p">))</span>
</span><span> <span class="k">let</span> <span class="k">l</span>:<span class="k">start</span> <span class="p">-=</span> <span class="m">1</span>
</span><span> <span class="k">endwhile</span>
</span><span>
</span><span> <span class="k">let</span> <span class="k">l</span>:<span class="k">end</span> <span class="p">=</span> <span class="k">l</span>:curr
</span><span> <span class="k">let</span> <span class="k">l</span>:last_line_num <span class="p">=</span> line<span class="p">(</span><span class="s1">'$'</span><span class="p">)</span>
</span><span> <span class="k">while</span> <span class="k">l</span>:<span class="k">end</span> <span class="p"><</span> <span class="k">l</span>:last_line_num && empty<span class="p">(</span>getline<span class="p">(</span><span class="k">l</span>:<span class="k">end</span> <span class="p">+</span> <span class="m">1</span><span class="p">))</span>
</span><span> <span class="k">let</span> <span class="k">l</span>:<span class="k">end</span> <span class="p">+=</span> <span class="m">1</span>
</span><span> <span class="k">endwhile</span>
</span><span>
</span><span> <span class="k">if</span> <span class="k">l</span>:<span class="k">end</span> <span class="p">>=</span> <span class="k">l</span>:<span class="k">start</span> <span class="p">+</span> <span class="k">v</span>:count1
</span><span> exe <span class="k">l</span>:<span class="k">start</span> . <span class="s1">'+'</span> . <span class="k">v</span>:count1 . <span class="s1">','</span> . <span class="k">l</span>:<span class="k">end</span> . <span class="s1">'d_'</span>
</span><span> <span class="k">else</span>
</span><span> <span class="k">call</span> append<span class="p">(</span><span class="k">l</span>:<span class="k">end</span><span class="p">,</span> repeat<span class="p">(</span>[<span class="s1">''</span>]<span class="p">,</span> <span class="k">v</span>:count1 <span class="p">-</span> <span class="p">(</span><span class="k">l</span>:<span class="k">end</span> <span class="p">-</span> <span class="k">l</span>:<span class="k">start</span><span class="p">)</span> <span class="p">-</span> <span class="m">1</span><span class="p">))</span>
</span><span> <span class="k">endif</span>
</span><span> <span class="k">call</span> cursor<span class="p">(</span><span class="k">l</span>:<span class="k">start</span><span class="p">,</span> <span class="m">1</span><span class="p">)</span>
</span><span><span class="k">endfun</span>
</span></code></pre></div>
<p>This defines the <kbd>dc</kbd> mapping, which condenses the run of blank lines around the cursor
into a single blank line.</p>
<p>Then, on a weekend when I was feeling particularly silly, I extended it to accept a count in
front of <kbd>dc</kbd>, which specifies the number of blank lines to end up with. For example,
<kbd>3dc</kbd> leaves exactly three blank lines. So now, this mapping can both condense and expand
vertical blank space to any size I want! Yay, silly weekends!</p>
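<p>To make the line arithmetic in <code>s:CleanupBlanks</code> easier to follow, here’s the same idea as a plain Python function over a list of lines. This is only an illustrative sketch: <code>resize_blank_run</code> is a hypothetical name, the cursor is a 0-based index (unlike Vim’s 1-based line numbers), <code>count</code> stands in for <code>v:count1</code>, and, as in the Vimscript, only truly empty lines count as blank.</p>

```python
def resize_blank_run(lines, cursor, count=1):
    """Resize the run of empty lines containing lines[cursor] to exactly `count` lines.

    Hypothetical helper mirroring the dc mapping: if the cursor line isn't empty,
    nothing changes. `cursor` is 0-based; `count` (>= 1) plays the role of v:count1.
    """
    if lines[cursor] != "":
        return list(lines)

    # Walk up and down to find the first and last empty line of the run.
    start = cursor
    while start > 0 and lines[start - 1] == "":
        start -= 1
    end = cursor
    while end < len(lines) - 1 and lines[end + 1] == "":
        end += 1

    # Replace the whole run with exactly `count` empty lines. The Vimscript
    # reaches the same result by deleting the surplus or appending more.
    return lines[:start] + [""] * count + lines[end + 1:]


doc = ["a", "", "", "", "b"]
print(resize_blank_run(doc, 2))     # squeeze: ['a', '', 'b']
print(resize_blank_run(doc, 1, 5))  # expand: 'a', five empty strings, then 'b'
```

<p>Rebuilding the run from scratch keeps the sketch short; the Vimscript edits the buffer in place instead, which avoids rewriting unrelated lines.</p>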
<h2 id="duplicate-text-in-motion">Duplicate Text in Motion<a class="headerlink" href="#duplicate-text-in-motion" title="Permanent link">¶</a></h2>
<p>Copy-pasta is a legitimate writing and coding technique. But I do it so often and so mindlessly
that I started to think of <em>duplicating</em> as a distinct operation, not as a combination of <em>yanking</em>
and then <em>pasting</em>. And if that is so, <em>duplicating</em> some text should not mess with my registers.
The fact that it did was muddying the nice semantic pool my thoughts were swimming in (!).</p>
<p>So I built a mapping that lets me duplicate the text over any motion (including text objects)
without touching the registers. Here’s how it’s built:</p>
<div class="hl"><pre class=content><code><span><span class="c">" Duplicate text, selected or over motion.</span>
</span><span><span class="nb">nnoremap</span> <span class="p"><</span><span class="k">silent</span><span class="p">></span> <span class="p"><</span>Leader<span class="p">></span>uu :<span class="k">t</span>.\<span class="p">|</span><span class="k">silent</span><span class="p">!</span> <span class="k">call</span> repeat#<span class="k">set</span><span class="p">(</span><span class="s1">'duu'</span><span class="p">,</span> <span class="k">v</span>:count<span class="p">)<</span>CR<span class="p">></span>
</span><span><span class="nb">nnoremap</span> <span class="p"><</span><span class="k">silent</span><span class="p">></span> <span class="p"><</span>Leader<span class="p">></span><span class="k">u</span> :<span class="k">set</span> <span class="nb">opfunc</span><span class="p">=</span>DuplicateText<span class="p"><</span>CR<span class="p">></span><span class="k">g</span>@
</span><span><span class="nb">vnoremap</span> <span class="p"><</span><span class="k">silent</span><span class="p">></span> <span class="p"><</span>Leader<span class="p">></span><span class="k">u</span> :<span class="p"><</span>C<span class="p">-</span><span class="k">u</span><span class="p">></span><span class="k">call</span> DuplicateText<span class="p">(</span><span class="s1">'vis'</span><span class="p">)<</span>CR<span class="p">></span>
</span><span><span class="k">fun</span> DuplicateText<span class="p">(</span>type<span class="p">)</span> abort
</span><span> <span class="k">let</span> <span class="k">marks</span> <span class="p">=</span> <span class="k">a</span>:type <span class="p">==</span>? <span class="s1">'vis'</span> ? <span class="s1">'<>'</span> : <span class="s1">'[]'</span>
</span><span> <span class="k">let</span> [_<span class="p">,</span> l1<span class="p">,</span> c1<span class="p">,</span> _] <span class="p">=</span> getpos<span class="p">(</span><span class="s2">"'"</span> . <span class="k">marks</span>[<span class="m">0</span>]<span class="p">)</span>
</span><span> <span class="k">let</span> [_<span class="p">,</span> l2<span class="p">,</span> c2<span class="p">,</span> _] <span class="p">=</span> getpos<span class="p">(</span><span class="s2">"'"</span> . <span class="k">marks</span>[<span class="m">1</span>]<span class="p">)</span>
</span><span>
</span><span> <span class="k">if</span> l1 <span class="p">==</span> l2
</span><span> <span class="k">let</span> text <span class="p">=</span> getline<span class="p">(</span>l1<span class="p">)</span>
</span><span> <span class="k">call</span> setline<span class="p">(</span>l1<span class="p">,</span> text[:c2 <span class="p">-</span> <span class="m">1</span>] . text[c1 <span class="p">-</span> <span class="m">1</span>:c2] . text[c2 <span class="p">+</span> <span class="m">1</span>:]<span class="p">)</span>
</span><span> <span class="k">call</span> cursor<span class="p">(</span>l2<span class="p">,</span> c2 <span class="p">+</span> <span class="m">1</span><span class="p">)</span>
</span><span> <span class="k">if</span> <span class="k">a</span>:type <span class="p">==</span>? <span class="s1">'vis'</span>
</span><span> exe <span class="s1">'normal! v'</span> . <span class="p">(</span>c2 <span class="p">-</span> c1<span class="p">)</span> . <span class="s1">'l'</span>
</span><span> <span class="k">endif</span>
</span><span>
</span><span> <span class="k">else</span>
</span><span> <span class="k">call</span> append<span class="p">(</span>l2<span class="p">,</span> getline<span class="p">(</span>l1<span class="p">,</span> l2<span class="p">))</span>
</span><span> <span class="k">call</span> cursor<span class="p">(</span>l2 <span class="p">+</span> <span class="m">1</span><span class="p">,</span> c1<span class="p">)</span>
</span><span> <span class="k">if</span> <span class="k">a</span>:type <span class="p">==</span>? <span class="s1">'vis'</span>
</span><span> exe <span class="s1">'normal! V'</span> . <span class="p">(</span>l2 <span class="p">-</span> l1<span class="p">)</span> . <span class="s1">'j'</span>
</span><span> <span class="k">endif</span>
</span><span>
</span><span> <span class="k">endif</span>
</span><span><span class="k">endfun</span>
</span></code></pre></div>
<p>Now, what used to be <kbd>yap}p</kbd> has become <kbd>,uap</kbd>. That’s just one key fewer, but
reducing keystrokes is not what I’m aiming for here. It’s about the cognitive load of “duplicate this
text” versus “copy this text, go to the end of it, paste it”. This works in visual mode as well,
though I don’t use it as often.</p>
<p>Additionally, when triggered in visual mode, the duplicated text is selected again in visual mode.
This quickly highlights the newly inserted text, so I can continue operating on it.</p>
<p>Now, if you’re aware of the <code>:t</code> (or <code>:copy</code>) command, then what I’m doing above may seem
pointlessly elaborate. To an extent, I agree. In fact, I’m using the <code>:t</code> command for the
<kbd>,uu</kbd> mapping, which duplicates a single line. The difference is that <code>:t</code> only
works linewise, while my implementation above works characterwise as well as linewise. For
example, <kbd>,uaw</kbd> (or just <kbd>,uw</kbd>) will duplicate a single word, just like
<kbd>,uap</kbd> will duplicate a paragraph.</p>
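<p>The trickiest bit of <code>DuplicateText</code> is the characterwise <code>setline()</code> call, because Vim string slices include their end index while Python’s exclude it. Here’s a sketch of the same splicing in Python, using hypothetical function names and the 1-based, inclusive positions that <code>getpos()</code> returns. Since <code>getpos()</code> columns are byte offsets, the correspondence only holds exactly for single-byte (ASCII) text.</p>

```python
def duplicate_chars(text, c1, c2):
    """Insert a copy of columns c1..c2 (1-based, inclusive) right after the original."""
    # Python's end index is exclusive, so text[c1 - 1:c2] is exactly columns c1..c2.
    selected = text[c1 - 1:c2]
    return text[:c2] + selected + text[c2:]


def duplicate_lines(lines, l1, l2):
    """Insert a copy of lines l1..l2 (1-based, inclusive) right after line l2."""
    return lines[:l2] + lines[l1 - 1:l2] + lines[l2:]


print(duplicate_chars("foo bar", 1, 3))        # foofoo bar
print(duplicate_lines(["a", "b", "c"], 1, 2))  # ['a', 'b', 'a', 'b', 'c']
```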
<h2 id="transpose">Transpose<a class="headerlink" href="#transpose" title="Permanent link">¶</a></h2>
<p>This is another mapping I created to help me with CSV files. Specifically, this one works with
tab-separated files, which are even more awesome to edit in Vim, thanks to the <a href="https://vimhelp.org/options.txt.html#%27vartabstop%27" rel="noopener noreferrer" target="_blank">vartabstop</a>
option. The next section describes how I use that option when editing tab-separated files.</p>
<p>This mapping, when applied over lines of tab-separated values, transposes the matrix formed by the
lines (rows) and the tab-separated fields (columns). Check out the GIF below to get a better
understanding of how this works.</p>
<div class="hl"><pre class=content><code><span><span class="c">" Transpose tab separated values in selection or over motion.</span>
</span><span><span class="nb">nnoremap</span> <span class="p"><</span><span class="k">silent</span><span class="p">></span> gt :<span class="k">set</span> <span class="nb">opfunc</span><span class="p">=</span>Transpose<span class="p"><</span>CR<span class="p">></span><span class="k">g</span>@
</span><span><span class="nb">vnoremap</span> <span class="p"><</span><span class="k">silent</span><span class="p">></span> gt :<span class="p"><</span>C<span class="p">-</span><span class="k">u</span><span class="p">></span><span class="k">call</span> Transpose<span class="p">(</span><span class="m">1</span><span class="p">)<</span>CR<span class="p">></span>
</span><span><span class="k">fun</span> Transpose<span class="p">(</span>...<span class="p">)</span> abort
</span><span> <span class="k">let</span> vis <span class="p">=</span> get<span class="p">(</span><span class="k">a</span>:<span class="m">000</span><span class="p">,</span> <span class="m">0</span><span class="p">,</span> <span class="m">0</span><span class="p">)</span>
</span><span> <span class="k">let</span> <span class="k">marks</span> <span class="p">=</span> vis ? <span class="s1">'<>'</span> : <span class="s1">'[]'</span>
</span><span> <span class="k">let</span> [_<span class="p">,</span> l1<span class="p">,</span> c1<span class="p">,</span> _] <span class="p">=</span> getpos<span class="p">(</span><span class="s2">"'"</span> . <span class="k">marks</span>[<span class="m">0</span>]<span class="p">)</span>
</span><span> <span class="k">let</span> [_<span class="p">,</span> l2<span class="p">,</span> c2<span class="p">,</span> _] <span class="p">=</span> getpos<span class="p">(</span><span class="s2">"'"</span> . <span class="k">marks</span>[<span class="m">1</span>]<span class="p">)</span>
</span><span> <span class="k">let</span> <span class="k">l</span>:<span class="nb">lines</span> <span class="p">=</span> map<span class="p">(</span>getline<span class="p">(</span>l1<span class="p">,</span> l2<span class="p">),</span> <span class="s1">'split(v:val, "\t", 1)'</span><span class="p">)</span>
</span><span> <span class="k">py3</span> <span class="p"><<</span>EOPYTHON
</span><span>import <span class="k">vim</span>
</span><span>from itertools import zip_longest
</span><span>out <span class="p">=</span> <span class="nb">list</span><span class="p">(</span>zip_longest<span class="p">(</span>*<span class="k">vim</span>.eval<span class="p">(</span><span class="s1">'l:lines'</span><span class="p">),</span> fillvalue<span class="p">=</span><span class="s1">''</span><span class="p">))</span>
</span><span>EOPYTHON
</span><span> <span class="k">let</span> out <span class="p">=</span> map<span class="p">(</span>py3eval<span class="p">(</span><span class="s1">'out'</span><span class="p">),</span> <span class="s1">'join(v:val, "\t")'</span><span class="p">)</span>
</span><span> <span class="k">call</span> append<span class="p">(</span>l2<span class="p">,</span> out<span class="p">)</span>
</span><span> exe l1 . <span class="s1">','</span> . l2 . <span class="s1">'delete _'</span>
</span><span><span class="k">endfun</span>
</span></code></pre></div>
<p class="note">Needs <code>+python3</code>.</p>
<p class="img"><a href="https://sharats.me/static/vim-transpose.gif"><img alt="Demo of transpose mapping" src="https://sharats.me/static/vim-transpose.gif"></a></p>
<p>The keys I’m hitting in the GIF are <kbd>gtip</kbd>, which transposes the lines in the inner
paragraph.</p>
<p>Since this uses <code>:py3</code>, it needs a Vim built with <code>+python3</code>. Hopefully, I’ll port
it to plain Vimscript one of these days.</p>
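<p>Since the actual transposing happens in the embedded Python, the same step can be tried outside Vim. Here’s a standalone sketch of it (the <code>transpose_tsv</code> name is made up for illustration): split each line on tabs, transpose with <code>itertools.zip_longest</code> so short rows are padded with empty fields, and join the result back with tabs. Note that Python’s <code>str.split</code> keeps empty leading and trailing fields, which Vim’s <code>split()</code> only does when its <code>keepempty</code> argument is non-zero.</p>

```python
from itertools import zip_longest


def transpose_tsv(lines):
    """Transpose tab-separated lines, padding missing fields with ''."""
    rows = [line.split("\t") for line in lines]
    # zip_longest(*rows) yields one tuple per column, filling gaps in short rows.
    columns = zip_longest(*rows, fillvalue="")
    return ["\t".join(column) for column in columns]


print(transpose_tsv(["name\tage", "alice\t30", "bob"]))
# ['name\talice\tbob', 'age\t30\t']
```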
<h2 id="using-vartabstop-to-line-up">Using <code>vartabstop</code> to Line Up<a class="headerlink" href="#using-vartabstop-to-line-up" title="Permanent link">¶</a></h2>
<p>The moment I learnt about the <code>vartabstop</code> option, I jumped on it, considering I work
with tab-separated files a lot. I created the following command, which scans the file’s contents
and sets the value of this option so that all the columns line up perfectly, almost like a
spreadsheet.</p>
<p class="note">The <code>vartabstop</code> option is not available in Neovim, which is one of the reasons I don’t
use Neovim yet. I’ve just gotten too used to <code>vartabstop</code>.</p>
<div class="hl"><pre class=content><code><span>command TabsLineUp <span class="k">call</span> <span class="p"><</span>SID<span class="p">></span>TabsLineUp<span class="p">()</span>
</span><span><span class="k">fun</span> s:TabsLineUp<span class="p">()</span> abort
</span><span> <span class="k">py3</span> <span class="p"><<</span>EOPYTHON
</span><span>import <span class="k">vim</span>
</span><span>lengths <span class="p">=</span> []
</span><span><span class="k">for</span> parts <span class="k">in</span> <span class="p">(</span><span class="k">l</span>.split<span class="p">(</span><span class="s1">'\t'</span><span class="p">)</span> <span class="k">for</span> <span class="k">l</span> <span class="k">in</span> <span class="k">vim</span>.current.buffer <span class="k">if</span> <span class="s1">'\t'</span> <span class="k">in</span> <span class="k">l</span><span class="p">)</span>:
</span><span> lengths.append<span class="p">(</span>[len<span class="p">(</span><span class="k">c</span><span class="p">)</span> <span class="k">for</span> <span class="k">c</span> <span class="k">in</span> parts]<span class="p">)</span>
</span><span><span class="k">vim</span>.current.buffer.<span class="k">options</span>[<span class="s1">'vartabstop'</span>] <span class="p">=</span> <span class="s1">','</span>.<span class="k">join</span><span class="p">(</span>str<span class="p">(</span>max<span class="p">(</span><span class="k">ls</span><span class="p">)</span> <span class="p">+</span> <span class="m">3</span><span class="p">)</span> <span class="k">for</span> <span class="k">ls</span> <span class="k">in</span> zip<span class="p">(</span>*lengths<span class="p">))</span>
</span><span>EOPYTHON
</span><span><span class="k">endfun</span>
</span></code></pre></div>
<p class="note">Needs <code>+python3</code>.</p>
<p>Here’s a nice GIF showing this off! Note that although it looks like we’re just adding a lot of
white space to align stuff, <em>no new space characters are inserted</em>. The document remains unchanged;
only the display width of the tab characters changes with <code>vartabstop</code>.</p>
<p class="img"><a href="https://sharats.me/static/vim-tabs-line-up-demo.gif"><img alt="Tabs line up demo" src="https://sharats.me/static/vim-tabs-line-up-demo.gif"></a></p>
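<p>The <code>vartabstop</code> value itself is just a comma-separated list of column widths. Here’s a standalone sketch of computing it; the <code>vartabstop_for</code> name and its <code>padding</code> parameter are made up for illustration (the command above hard-codes a padding of 3). Unlike the command, this sketch uses <code>zip_longest</code> instead of <code>zip</code>, so a line with fewer fields doesn’t truncate the list of widths; that’s my variation, not the command’s behaviour.</p>

```python
from itertools import zip_longest


def vartabstop_for(lines, padding=3):
    """Compute a 'vartabstop' value that lines up tab-separated columns.

    Each tab stop is the widest field in that column plus `padding`.
    Lines without tabs are ignored, as in the TabsLineUp command.
    """
    widths = [[len(field) for field in line.split("\t")] for line in lines if "\t" in line]
    if not widths:
        return ""
    # zip_longest (rather than zip) keeps columns that only some lines have.
    return ",".join(
        str(max(column) + padding) for column in zip_longest(*widths, fillvalue=0)
    )


print(vartabstop_for(["id\tname\tcity", "1\tAlice\tBerlin", "no tabs here"]))  # 5,8,9
```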
<p>All in all, tab-separated files are easier to deal with than comma-separated ones.</p>
<p>Also, if you’re into CSV and tab separated files, I recommend checking out the amazing <a href="https://github.com/chrisbra/csv.vim" rel="noopener noreferrer" target="_blank">csv.vim</a>
plugin. It makes similar use of the <code>vartabstop</code> option.</p>
<h2 id="strip-trailing-spaces">Strip Trailing Spaces<a class="headerlink" href="#strip-trailing-spaces" title="Permanent link">¶</a></h2>
<p>I know trailing whitespace doesn’t bother a lot of people, but it does bother me. Most of the
solutions I found online to remove trailing whitespace operate on the whole file. I wanted one that
works on the lines over a motion, like the inner paragraph. Of course, I could just visually select
the text object and then do a <code>:s/\s\+$//</code>, but that’s too much effort!</p>
<div class="hl"><pre class=content><code><span><span class="c">" Strip all trailing spaces in the selection, or over motion.</span>
</span><span><span class="nb">nnoremap</span> <span class="p"><</span><span class="k">silent</span><span class="p">></span> <span class="p"><</span>Leader<span class="p">></span><span class="k">x</span> :<span class="k">set</span> <span class="nb">opfunc</span><span class="p">=</span>StripRight<span class="p"><</span>CR<span class="p">></span><span class="k">g</span>@
</span><span><span class="nb">vnoremap</span> <span class="p"><</span><span class="k">silent</span><span class="p">></span> <span class="p"><</span>Leader<span class="p">></span><span class="k">x</span> :<span class="p"><</span>C<span class="p">-</span><span class="k">u</span><span class="p">></span><span class="k">call</span> StripRight<span class="p">(</span><span class="m">1</span><span class="p">)<</span>CR<span class="p">></span>
</span><span><span class="k">fun</span> StripRight<span class="p">(</span>...<span class="p">)</span> abort
</span><span> <span class="k">let</span> <span class="k">cp</span> <span class="p">=</span> getcurpos<span class="p">()</span>
</span><span> <span class="k">let</span> <span class="k">marks</span> <span class="p">=</span> get<span class="p">(</span><span class="k">a</span>:<span class="m">000</span><span class="p">,</span> <span class="m">0</span><span class="p">,</span> <span class="m">0</span><span class="p">)</span> ? <span class="s1">'<>'</span> : <span class="s1">'[]'</span>
</span><span> <span class="k">let</span> [_<span class="p">,</span> l1<span class="p">,</span> c1<span class="p">,</span> _] <span class="p">=</span> getpos<span class="p">(</span><span class="s2">"'"</span> . <span class="k">marks</span>[<span class="m">0</span>]<span class="p">)</span>
</span><span> <span class="k">let</span> [_<span class="p">,</span> l2<span class="p">,</span> c2<span class="p">,</span> _] <span class="p">=</span> getpos<span class="p">(</span><span class="s2">"'"</span> . <span class="k">marks</span>[<span class="m">1</span>]<span class="p">)</span>
</span><span> exe <span class="s1">'keepjumps '</span> . l1 . <span class="s1">','</span> . l2 . <span class="s1">'s/\s\+$//e'</span>
</span><span> <span class="k">call</span> setpos<span class="p">(</span><span class="s1">'.'</span><span class="p">,</span> <span class="k">cp</span><span class="p">)</span>
</span><span><span class="k">endfun</span>
</span></code></pre></div>
<p>The snippet above defines a mapping, <kbd>,x</kbd>, which removes trailing whitespace from the
lines over a motion. It has a couple of nice touches: it works in visual mode as well, and the
cursor doesn’t move as a result of the operation.</p>
<p>Removing trailing whitespace inside the current paragraph is now <kbd>,xip</kbd>!</p>
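<p>For comparison, here’s the effect of <kbd>,x</kbd> over a line range as a small Python sketch (<code>strip_right</code> is a hypothetical helper, not part of the mapping). Vim’s <code>\s</code> matches a space or a tab, so stripping just those two characters from lines <code>l1</code> to <code>l2</code> (1-based, inclusive) mirrors the <code>:s/\s\+$//e</code> substitution.</p>

```python
def strip_right(lines, l1, l2):
    """Strip trailing spaces and tabs from lines l1..l2 (1-based, inclusive)."""
    return [
        line.rstrip(" \t") if l1 <= lnum <= l2 else line
        for lnum, line in enumerate(lines, start=1)
    ]


print(strip_right(["a  ", "b\t", "c "], 1, 2))  # ['a', 'b', 'c ']
```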
<h2 id="append-character-over-motion">Append character over motion<a class="headerlink" href="#append-character-over-motion" title="Permanent link">¶</a></h2>
<p>This mapping lets me add a character at the end of all lines over a motion. So, for example,
<kbd>ga;ip</kbd> would add a semicolon to every line inside the paragraph.</p>
<p>I use this mostly to add commas or tab characters when working with CSV (or tab-separated) files.</p>
<div class="hl"><pre class=content><code><span><span class="c">" Append a letter to all lines in motion.</span>
</span><span><span class="nb">nnoremap</span> <span class="p"><</span><span class="k">silent</span><span class="p">></span> <span class="p"><</span>expr<span class="p">></span> ga <span class="p"><</span>SID<span class="p">></span>AppendToLines<span class="p">(</span><span class="s1">'n'</span><span class="p">)</span>
</span><span>xnoremap <span class="p"><</span><span class="k">silent</span><span class="p">></span> ga :<span class="p"><</span>C<span class="p">-</span><span class="k">u</span><span class="p">></span><span class="k">call</span> <span class="p"><</span>SID<span class="p">></span>AppendToLines<span class="p">(</span>visualmode<span class="p">())<</span>CR<span class="p">></span>
</span><span>
</span><span><span class="k">fun</span> s:AppendToLines<span class="p">(</span><span class="k">mode</span><span class="p">)</span> abort
</span><span> <span class="k">let</span> <span class="k">c</span> <span class="p">=</span> getchar<span class="p">()</span>
</span><span> <span class="k">while</span> <span class="k">c</span> <span class="p">==</span> <span class="s2">"\<CursorHold>"</span> <span class="p">|</span> <span class="k">let</span> <span class="k">c</span> <span class="p">=</span> getchar<span class="p">()</span> <span class="p">|</span> <span class="k">endwhile</span>
</span><span> <span class="k">let</span> <span class="k">g</span>:_append_to_lines <span class="p">=</span> nr2char<span class="p">(</span><span class="k">c</span><span class="p">)</span>
</span><span> <span class="k">if</span> <span class="k">a</span>:<span class="k">mode</span> <span class="p">==</span>? <span class="s1">'n'</span>
</span><span> exe <span class="s1">'set opfunc='</span> . s:SID<span class="p">()</span> . <span class="s1">'AppendToLinesOpFunc'</span>
</span><span> <span class="k">return</span> <span class="s1">'g@'</span>
</span><span> <span class="k">else</span>
</span><span> <span class="k">call</span> s:AppendToLinesOpFunc<span class="p">(</span><span class="s1">'v'</span><span class="p">)</span>
</span><span> <span class="k">endif</span>
</span><span><span class="k">endfun</span>
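</span><span>
</span><span><span class="c">" Script-local prefix (like <SNR>12_), needed to name an s: function in 'opfunc'.</span>
</span><span><span class="k">fun</span> s:SID<span class="p">()</span> abort
</span><span> <span class="k">return</span> matchstr<span class="p">(</span>expand<span class="p">(</span><span class="s1">'<sfile>'</span><span class="p">),</span> <span class="s1">'<SNR>\d\+_\zeSID$'</span><span class="p">)</span>
</span><span><span class="k">endfun</span>
</span><span>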
</span><span><span class="k">fun</span> s:AppendToLinesOpFunc<span class="p">(</span>type<span class="p">)</span> abort
</span><span> <span class="k">let</span> <span class="k">marks</span> <span class="p">=</span> <span class="k">a</span>:type <span class="p">==</span>? <span class="s1">'v'</span> ? <span class="s1">'<>'</span> : <span class="s1">'[]'</span>
</span><span> <span class="k">for</span> <span class="k">l</span> <span class="k">in</span> range<span class="p">(</span>line<span class="p">(</span><span class="s2">"'"</span> . <span class="k">marks</span>[<span class="m">0</span>]<span class="p">),</span> line<span class="p">(</span><span class="s2">"'"</span> . <span class="k">marks</span>[<span class="m">1</span>]<span class="p">))</span>
</span><span> <span class="k">call</span> setline<span class="p">(</span><span class="k">l</span><span class="p">,</span> getline<span class="p">(</span><span class="k">l</span><span class="p">)</span> . <span class="k">g</span>:_append_to_lines<span class="p">)</span>
</span><span> <span class="k">endfor</span>
</span><span> unlet <span class="k">g</span>:_append_to_lines
</span><span><span class="k">endfun</span>
</span></code></pre></div>
<p>This may seem pointless, since it’s not very hard to do this with visual block mode. Sure. But by
that logic, even <kbd>A</kbd> is pretty pointless, since it can be done with just <kbd>$a</kbd>,
right? No. The point here is not a shorter key sequence, but a more semantic one. Just like
<kbd>A</kbd> spells “append at end of line”, to me, <kbd>ga;ip</kbd> spells “add a semicolon to
every line in the paragraph”. Personally, I think better this way.</p>
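<p>One detail worth noting in <code>s:AppendToLinesOpFunc</code>: Vim’s two-argument <code>range()</code> includes its end value, so the loop covers the last line of the motion too. Python’s <code>range()</code> excludes it, so a Python sketch of the same operation (with a hypothetical <code>append_to_lines</code> helper) needs a <code>+ 1</code>:</p>

```python
def append_to_lines(lines, l1, l2, char):
    """Append `char` to lines l1..l2 (1-based, inclusive), like ga{char}{motion}."""
    result = list(lines)
    # Vim's range(l1, l2) already includes l2; Python's range needs l2 + 1.
    for lnum in range(l1, l2 + 1):
        result[lnum - 1] += char
    return result


print(append_to_lines(["int a = 1", "int b = 2"], 1, 2, ";"))
# ['int a = 1;', 'int b = 2;']
```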
<h2 id="conclusion">Conclusion<a class="headerlink" href="#conclusion" title="Permanent link">¶</a></h2>
<p>Text objects in Vim (and motions, for the most part) have effectively solved the problem of being
able to expressively select a piece of text to work on. However, in my opinion, the kind of work that
can be done on such text is equally (if not more) important. Try to identify what you often do after
selecting text with text objects and see if you can turn it into an operator mapping like those in
this write-up.</p>
<p>This one is shorter than usual, and that’s not for lack of content; it’s more because of
terrible planning on my part. Nevertheless, stay tuned for more in this series!</p>
<p class="note">Read the <a href="../automating-the-vim-workplace-2/">previous article</a> in this series.</p>The Weird `global`2020-03-08T00:00:00+05:302020-03-08T00:00:00+05:30Shrikant Sharat Kandulatag:sharats.me,2020-03-08:/posts/the-weird-global/<p>Python’s <code>global</code> keyword allows us to change the value of module-level variables inside functions.
Sounds so simple and useful, doesn’t it? Well, yeah. I’m going to show you how it’s useful in the
simple cases, and then some situations where it can drive people nuts.</p>
<h2 id="simple-usage">Simple Usage<a class="headerlink" href="#simple-usage" title="Permanent link">¶</a></h2>
<p>Consider the following <code>top.py</code> script. We have a single module-level (aka <code>global</code>) variable here,
and we change its value in the function <code>mark_done</code>.</p>
<div class="hl"><div class=filename><span>top.py</span></div><pre class=linenos><span>1
</span><span>2
</span><span>3
</span><span>4
</span><span>5
</span><span>6
</span><span>7
</span><span>8
</span><span>9
</span><span>10
</span><span>11
</span></pre><pre class=content><code><span><span class="n">are_we_done</span> <span class="o">=</span> <span class="kc">False</span>
</span><span>
</span><span>
</span><span><span class="k">def</span> <span class="nf">mark_done</span><span class="p">():</span>
</span><span> <span class="k">global</span> <span class="n">are_we_done</span>
</span><span> <span class="n">are_we_done</span> <span class="o">=</span> <span class="kc">True</span>
</span><span>
</span><span>
</span><span><span class="nb">print</span><span class="p">(</span><span class="s2">"Done?"</span><span class="p">,</span> <span class="n">are_we_done</span><span class="p">)</span>
</span><span><span class="n">mark_done</span><span class="p">()</span>
</span><span><span class="nb">print</span><span class="p">(</span><span class="s2">"Done?"</span><span class="p">,</span> <span class="n">are_we_done</span><span class="p">)</span>
</span></code></pre></div>
<p>Running this, we get the following output:</p>
<div class="hl"><pre class=content><code><span>Done? False
</span><span>Done? True
</span></code></pre></div>
<p>We were able to change the value of the global variable <code>are_we_done</code> from inside the
<code>mark_done</code> function because we declared it <code>global</code> on line 5. Without that declaration,
we’d just be defining a new <em>function-level variable</em> called <code>are_we_done</code> inside the
<code>mark_done</code> function, which is not what we wanted.</p>
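<p>To see what the missing declaration would do, here’s a variant of <code>top.py</code> with the <code>global</code> line left out (the function is renamed to <code>mark_done_without_global</code> only to tell the two apart):</p>

```python
are_we_done = False


def mark_done_without_global():
    # No `global` declaration: this assignment creates a new local variable
    # that disappears when the function returns.
    are_we_done = True
    return are_we_done


print("Local value:", mark_done_without_global())  # Local value: True
print("Done?", are_we_done)                         # Done? False
```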
<h2 id="refer-directly">Refer Directly<a class="headerlink" href="#refer-directly" title="Permanent link">¶</a></h2>
<p>Note that declaring a variable as <code>global</code> is needed only when we’re <em>modifying the value of the
variable</em>. That means if we are only accessing the variable, we don’t need to declare it as
<code>global</code>. This is how capitalized constants work in most Python scripts:</p>
<div class="hl"><pre class=content><code><span><span class="n">CURRENT_PLANET</span> <span class="o">=</span> <span class="s2">"Earth"</span>
</span><span>
</span><span>
</span><span><span class="k">def</span> <span class="nf">get_moon_count</span><span class="p">():</span>
</span><span>    <span class="k">if</span> <span class="n">CURRENT_PLANET</span> <span class="o">==</span> <span class="s2">"Earth"</span><span class="p">:</span>
</span><span>        <span class="k">return</span> <span class="mi">1</span>
</span><span>    <span class="k">else</span><span class="p">:</span>
</span><span>        <span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="s2">"No idea!"</span><span class="p">)</span>
</span><span>
</span><span>
</span><span><span class="nb">print</span><span class="p">(</span><span class="n">get_moon_count</span><span class="p">())</span>
</span></code></pre></div>
<p>This, of course, prints out <code>1</code>. Here, we are using the <code>CURRENT_PLANET</code> global variable inside the
function without declaring it as global. Accessing doesn’t <em>require</em> explicitly declaring as
<code>global</code>.</p>
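<p>Reading a global also happens at the moment the function runs, not when it is defined. The small sketch below shows this. <code>describe_planet</code> is a hypothetical helper. Unlike a real constant, <code>CURRENT_PLANET</code> is reassigned at module level here, but only to show when the lookup happens:</p>
<div class="hl"><pre class=content><code>CURRENT_PLANET = "Earth"


def describe_planet():
    # Only reads the global name, so no `global` declaration is needed.
    # The name is looked up each time the function runs.
    return "We are on " + CURRENT_PLANET


print(describe_planet())
CURRENT_PLANET = "Mars"  # Module-level reassignment, outside any function.
print(describe_planet())</code></pre></div>
<p>This prints <code>We are on Earth</code> and then <code>We are on Mars</code>.</p>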
<h3 id="modifying-the-referred-object">Modifying the Referred Object<a class="headerlink" href="#modifying-the-referred-object" title="Permanent link">¶</a></h3>
<p>A small note on the terms we’ve been using here. Accessing doesn’t require <code>global</code> declaration, but
modifying does. Now look at the following code snippet:</p>
<div class="hl"><pre class=content><code><span><span class="n">CALLS</span> <span class="o">=</span> <span class="p">[]</span>
</span><span>
</span><span>
</span><span><span class="k">def</span> <span class="nf">record_call</span><span class="p">(</span><span class="n">phone_number</span><span class="p">):</span>
</span><span>    <span class="n">CALLS</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">phone_number</span><span class="p">)</span>
</span><span>
</span><span>
</span><span><span class="n">record_call</span><span class="p">(</span><span class="s2">"123-45-678"</span><span class="p">)</span>
</span><span><span class="n">record_call</span><span class="p">(</span><span class="s2">"987-65-432"</span><span class="p">)</span>
</span><span><span class="nb">print</span><span class="p">(</span><span class="n">CALLS</span><span class="p">)</span>
</span></code></pre></div>
<p>Since we are appending to the <code>CALLS</code> list here, is that considered modifying the global variable?
The answer is <em>no</em>. We are merely <em>accessing</em> the <code>CALLS</code> variable’s value, which happens to be a
<code>list</code>, and calling its <code>.append</code> method. The variable itself is never modified. The
<code>.append</code> method does change the <em>state</em> of the <code>list</code> object, but as far as the
<code>CALLS</code> variable is concerned, we are only accessing it. So, we don’t need to declare it as
<code>global</code>.</p>
<p>So what <em>does</em> modifying mean? Simply put, if we assign a new value to the name anywhere in the function
(with <code>=</code>, or with an augmented assignment such as <code>+=</code>), that counts as modifying.</p>
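<p>The sketch below contrasts mutating the object a global refers to with rebinding the global name itself. <code>reset_calls_wrong</code>, <code>reset_calls</code> and <code>record_call_augmented</code> are hypothetical helpers added for illustration:</p>
<div class="hl"><pre class=content><code>CALLS = []


def record_call(phone_number):
    # Mutates the list object that CALLS refers to: no `global` needed.
    CALLS.append(phone_number)


def reset_calls_wrong():
    # Assigning to CALLS makes it a local name in this function, so only
    # a new local list is created; the global list is left untouched.
    CALLS = []
    return CALLS


def reset_calls():
    # Rebinding the module-level name requires the declaration.
    global CALLS
    CALLS = []


def record_call_augmented(phone_number):
    # `+=` also rebinds the name (even though, for lists, it extends the
    # list in place), so CALLS is local here and reading it fails.
    CALLS += [phone_number]


record_call("123-45-678")
reset_calls_wrong()
print(CALLS)
reset_calls()
print(CALLS)

try:
    record_call_augmented("987-65-432")
except UnboundLocalError as error:
    print("UnboundLocalError:", error)</code></pre></div>
<p>The first <code>print</code> shows <code>['123-45-678']</code>, because <code>reset_calls_wrong</code> only created a local list. The second shows <code>[]</code>. The exact wording of the <code>UnboundLocalError</code> message differs between Python versions.</p>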
<h2 id="assigning-without-declaring">Assigning without Declaring<a class="headerlink" href="#assigning-without-declaring" title="Permanent link">¶</a></h2>
<p>This behaviour of global variables causes some slightly unintuitive situations. For example,
consider the following piece of code:</p>
<div class="hl"><pre class=linenos><span>1
</span><span>2
</span><span>3
</span><span>4
</span><span>5
</span><span>6
</span><span>7
</span><span>8
</span></pre><pre class=content><code><span><span class="n">is_server_up</span> <span class="o">=</span> <span class="kc">False</span>
</span><span>
</span><span>
</span><span><span class="k">def</span> <span class="nf">mark_server_up</span><span class="p">():</span>
</span><span>    <span class="nb">print</span><span class="p">(</span><span class="n">is_server_up</span><span class="p">)</span>
</span><span>
</span><span>
</span><span><span class="n">mark_server_up</span><span class="p">()</span>
</span></code></pre></div>
<p>In this script, we are using the global variable <code>is_server_up</code> on line 5, without declaring it as
<code>global</code>, and it works fine. Now, we add another line to this function:</p>
<div class="hl"><pre class=linenos><span>1
</span><span>2
</span><span>3
</span><span>4
</span><span>5
</span><span>6
</span><span>7
</span><span>8
</span><span>9
</span></pre><pre class=content><code><span><span class="n">is_server_up</span> <span class="o">=</span> <span class="kc">False</span>
</span><span>
</span><span>
</span><span><span class="k">def</span> <span class="nf">mark_server_up</span><span class="p">():</span>
</span><span>    <span class="nb">print</span><span class="p">(</span><span class="n">is_server_up</span><span class="p">)</span>
</span><span>    <span class="n">is_server_up</span> <span class="o">=</span> <span class="kc">True</span>
</span><span>
</span><span>
</span><span><span class="n">mark_server_up</span><span class="p">()</span>
</span></code></pre></div>
<p>If we run this script, we get the following error:</p>
<div class="hl"><pre class=content><code><span>Traceback (most recent call last):
</span><span>  File "/check.py", line 9, in &lt;module&gt;
</span><span>    mark_server_up()
</span><span>  File "/check.py", line 5, in mark_server_up
</span><span>    print(is_server_up)
</span><span>UnboundLocalError: local variable 'is_server_up' referenced before assignment
</span></code></pre></div>
<p>Okay, we kind of expected an error because we are trying to modify a global variable without
declaring it. But note that the error comes from <strong>line 5</strong>, not from <strong>line 6</strong>, where we are
modifying the variable. The error message hints at what’s happening:</p>
<div class="hl"><pre class=content><code><span>local variable 'is_server_up' referenced before assignment
</span></code></pre></div>
<p>Since we didn’t declare <code>is_server_up</code> as global, and since we assign a value to
<code>is_server_up</code> in the function, Python decided that we want a local variable in our function with the same name.
Python makes this decision for the whole function body when the function is compiled, not line by line as it runs. So
even the <code>print</code> on line 5, which runs before the assignment, refers to the <em>local variable</em>. That local
variable has no value yet at that point, which is the error we see here.</p>
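<p>We can observe this compile-time decision directly. The sketch below relies on the <code>__code__.co_varnames</code> attribute of CPython function objects, which lists a function’s local variable names. <code>mark_server_up_broken</code> is simply the failing version from above under a new name. The sketch also shows the fix, which is declaring the variable <code>global</code>:</p>
<div class="hl"><pre class=content><code>is_server_up = False


def mark_server_up_broken():
    print(is_server_up)  # Raises UnboundLocalError when called.
    is_server_up = True


def mark_server_up():
    global is_server_up  # Both lines below now use the module-level name.
    print(is_server_up)
    is_server_up = True


# The compiler has already classified the name as local in the broken
# version, before either function has run.
print(mark_server_up_broken.__code__.co_varnames)
print(mark_server_up.__code__.co_varnames)

mark_server_up()
print(is_server_up)</code></pre></div>
<p>This prints <code>('is_server_up',)</code> and <code>()</code>, then <code>False</code> and <code>True</code>. The broken version treats the name as local before it ever runs. The declared version reads the global and then updates it.</p>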
<h2 id="conclusion">Conclusion<a class="headerlink" href="#conclusion" title="Permanent link">¶</a></h2>
<p>Global variables have their place, but apart from constant-like values, I’d recommend against
using them at all. They might make sense in small one-off scripts, and when they do, keep
the above small details in mind.</p>The Magic of AutoHotkey2020-03-01T00:00:00+05:302020-03-01T00:00:00+05:30Shrikant Sharat Kandulatag:sharats.me,2020-03-01:/posts/the-magic-of-autohotkey/<p>For the past several years, my primary work station has been Windows 7. After the initial swearing
at how things work differently (coming from Linux), I got used to it and started to really like it,
in some ways. A big part of the reason for that on Windows is <a href="https://www.autohotkey.com/" rel="noopener noreferrer" target="_blank">AutoHotkey</a>.</p>
<p>I will document my automations and experiences over the years in this two-part article series.</p>
<ol>
<li>Part 1 (this article).</li>
<li><a href="../the-magic-of-autohotkey-2/">Part 2</a></li>
</ol>
<div class="toc"><span class="toctitle">Table of Contents</span><ul>
<li><a href="#the-setup">The Setup</a></li>
<li><a href="#the-common-magic">The Common Magic</a><ul>
<li><a href="#reload-autohotkey-script">Reload AutoHotkey Script</a></li>
<li><a href="#open-the-toolbar-calendar">Open the Toolbar Calendar</a></li>
<li><a href="#hide-the-show-desktop-button">Hide the Show Desktop Button</a></li>
<li><a href="#type-clipboard-contents">Type Clipboard Contents</a></li>
</ul>
</li>
<li><a href="#close-on-escape-key">Close on Escape Key</a></li>
<li><a href="#the-caps-lock-story">The Caps Lock Story</a></li>
<li><a href="#inserting-snippets">Inserting Snippets</a></li>
<li><a href="#window-watcher">Window Watcher</a></li>
<li><a href="#mess-with-images-in-clipboard">Mess with Images in Clipboard</a></li>
<li><a href="#periodic-time-display">Periodic Time Display</a></li>
<li><a href="#vim-keys-for-sumatra-pdf">Vim Keys for Sumatra PDF</a></li>
<li><a href="#conclusion">Conclusion</a></li>
</ul>
</div>
<p><a href="https://www.autohotkey.com/" rel="noopener noreferrer" target="_blank">AutoHotkey</a> is an open-source programming language for Windows that lends itself extremely well
to GUI scripting and automation tasks. Its hotkey functionality is particularly
good, something I haven’t found in any other general-purpose programming language (<a href="https://www.autoitscript.com/site/" rel="noopener noreferrer" target="_blank">AutoIt</a> most
likely comes close, but I’ve never tried it, so I can’t speak for it).</p>
<p>The language itself may seem a bit flaky around the syntax and some of its constructs, but once we
get used to them, we can leverage the powerful engine underneath it. That, combined with the
well-written documentation, makes AutoHotkey a must-have tool for any Windows power user.</p>
<p>Some hotkeys I use (a few of which I can’t show off here) are so well integrated into my daily
workflow that my fingers just flow on the keyboard, and things happen on screen that others find hard to
follow.</p>
<blockquote>
<p>Any sufficiently advanced technology is indistinguishable from magic.</p>
<p>– <em>Clarke’s Third Law, 1973</em></p>
</blockquote>
<p>In these articles, I’ll share some of the hotkeys I use, how I came up with them and how they improve
my workflow. This is not a beginner’s AutoHotkey tutorial. For that, see the official documentation or
the many other resources available online.</p>
<p class="note">A lot of the stuff in this article was made possible by help from all over the internet, and
especially the AutoHotkey forums. Since most of it is at least a few years old, I don’t have the
exact source links. So, thank you everyone!</p>
<h2 id="the-setup">The Setup<a class="headerlink" href="#the-setup" title="Permanent link">¶</a></h2>
<p>I usually have one AutoHotkey script running at all times (called <code>master.ahk</code>). I <code>#Include</code> other
scripts into this so that all my hotkeys and automations aren’t just dumped into one large
<code>master.ahk</code>. It starts off with the following:</p>
<div class="hl"><pre class=content><code><span><span class="nb">#NoEnv</span>
</span><span><span class="nb">#SingleInstance</span> <span class="n">force</span>
</span><span><span class="n">#Warn</span>
</span><span>
</span><span><span class="nb">SendMode</span> <span class="n">Input</span>
</span><span><span class="nb">SetWorkingDir</span> <span class="nv">%A_ScriptDir%</span><span class="c1"> ; Default in autohotkey v2.</span>
</span><span><span class="nb">AutoTrim</span><span class="p">,</span> <span class="n">Off</span><span class="c1"> ; Default in autohotkey v2.</span>
</span><span><span class="nb">SetTitleMatchMode</span> <span class="n">RegEx</span>
</span><span><span class="nb">SetNumlockState</span><span class="p">,</span> <span class="n">AlwaysOn</span>
</span><span>
</span><span><span class="nb">EnvGet</span><span class="p">,</span> <span class="n">homedir</span><span class="p">,</span> <span class="n">USERPROFILE</span>
</span></code></pre></div>
<p>I picked up most of these as best practices from the documentation and the forums. Please look
up the documentation for the individual directives. I won’t repeat it here.</p>
<h2 id="the-common-magic">The Common Magic<a class="headerlink" href="#the-common-magic" title="Permanent link">¶</a></h2>
<p>These are essentials that are general enough that I believe everyone using AutoHotkey should have them.
Let’s quickly run through them, so we can move on to more exciting stuff.</p>
<h3 id="reload-autohotkey-script">Reload AutoHotkey Script<a class="headerlink" href="#reload-autohotkey-script" title="Permanent link">¶</a></h3>
<p>The script <code>master.ahk</code> that is running in the background at all times contains some of my hotkeys,
and the rest are <code>#Include</code>-ed from other AutoHotkey scripts. I include the snippet below in this
script, and when I hit <kbd>#+r</kbd>, the changes in <code>master.ahk</code> and any included scripts are
reloaded.</p>
<div class="hl"><pre class=content><code><span><span class="nl">#+r::</span><span class="n">Reload</span>
</span></code></pre></div>
<p class="note">Any snippet discussed here, once added to your master script, starts working after a Reload
like the one above. There’s no need to quit the script and start it again.</p>
<h3 id="open-the-toolbar-calendar">Open the Toolbar Calendar<a class="headerlink" href="#open-the-toolbar-calendar" title="Permanent link">¶</a></h3>
<p>It’s really sad that there’s no default hotkey to pop open a calendar on Windows. Clicking on
the time displayed at the right of the toolbar does show a handy calendar, but there’s no hotkey for
it. The following snippet solves this exact problem with a <kbd>#i</kbd> hotkey. It sends <kbd>#b</kbd>, which gives focus to
the system tray, then navigates to the time with <kbd>Left</kbd> and hits the <kbd>Enter</kbd> key.</p>
<div class="hl"><pre class=content><code><span><span class="nl">#i::</span><span class="n">Send</span> <span class="n">#b</span><span class="p">{</span><span class="n">Left</span><span class="p">}{</span><span class="n">Enter</span><span class="p">}</span>
</span></code></pre></div>
<p>There’s a problem with this though. Once the calendar opens up, and we close it by hitting the
<kbd>Escape</kbd> key, the focus is not returned to the window that had focus originally. The
workaround for me has been to do <kbd>Alt+Tab</kbd> a couple of times, and we’re back to work.</p>
<p class="note">It’s still arguable how useful this solution is. The pop-up Calendar has very limited functionality.
The most annoying thing is probably that I spend a few seconds selecting the month I want to look at,
then accidentally click on another window, and the Calendar is gone! After a lot of swearing, I
attempted to solve this problem and built <a href="https://justacalendar.app" rel="noopener noreferrer" target="_blank">justacalendar.app</a>. It’s
super-quick, no-login-required, lightweight, just a calendar to look at, and mark dates to top. Do
check it out! Thanks.</p>
<h3 id="hide-the-show-desktop-button">Hide the Show Desktop Button<a class="headerlink" href="#hide-the-show-desktop-button" title="Permanent link">¶</a></h3>
<p>Every time my mouse moves to the bottom right corner, all my windows go transparent, which <em>almost</em>
reduces me to swearing again. Now, I know we can turn this behaviour off by disabling Aero or some
other setting, and I can even agree that this feature can be useful. But for me, firstly, I hardly
keep anything on my Desktop, so the button is quite useless to me. Secondly, even if I
wanted to look at the desktop, it’s a quick <kbd>#d</kbd> away, which is much faster considering my
fingers are almost always on the keyboard.</p>
<p>So I decided to hide the “Show Desktop” button with the following snippet:</p>
<div class="hl"><pre class=content><code><span><span class="nb">Control</span><span class="p">,</span> <span class="n">Hide</span><span class="p">,</span> <span class="p">,</span> <span class="n">TrayShowDesktopButtonWClass1</span>
</span><span> <span class="p">,</span> <span class="n">ahk_class</span> <span class="n">Shell_TrayWnd</span> <span class="n">ahk_exe</span> <span class="n">explorer</span><span class="o">.</span><span class="n">exe</span>
</span></code></pre></div>
<p>This doesn’t reclaim the space occupied by the button, but the button disappears and the above
problem goes away, so I’m not complaining.</p>
<h3 id="type-clipboard-contents">Type Clipboard Contents<a class="headerlink" href="#type-clipboard-contents" title="Permanent link">¶</a></h3>
<p>Some websites (especially bank websites) disallow pasting values into inputs. This is
extremely annoying when using a password manager, or when I just want to paste something. I’ve tried
several solutions to this, and my current AutoHotkey solution has served me the best.</p>
<div class="hl"><pre class=content><code><span><span class="nl">#v::</span><span class="n">SendInput</span><span class="p">,</span> <span class="p">{</span><span class="n">Raw</span><span class="p">}</span><span class="nv">%Clipboard%</span>
</span></code></pre></div>
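<p>The idea is that instead of sending a paste operation, we have AutoHotkey <em>type out the contents of
the clipboard</em>. The <code>{Raw}</code> mode makes sure characters such as <code>!</code>, <code>#</code>, <code>^</code> and <code>+</code> are
typed literally instead of being interpreted as modifier keys. This also strips any formatting from the text in the
clipboard, for instance if we’ve copied something from a website or a Word document with heavy
formatting.</p>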
<h2 id="close-on-escape-key">Close on Escape Key<a class="headerlink" href="#close-on-escape-key" title="Permanent link">¶</a></h2>
<p>There are some windows that I’d love to close with just a tap on the <kbd>Escape</kbd> key, but they
don’t close. A few examples where I (instinctively) expect this are the photo viewer, the font viewer and the
playlist in VLC. Then there’s another set of windows that I found myself trying to close by
hitting <kbd>^w</kbd> (this intuition likely comes from how that key behaves in Firefox and Chrome).
Either way, I needed these keys to act the way I was expecting them to.</p>
<p>There are two parts to the solution. First, we define hotkeys that close windows belonging to certain
<strong>window groups</strong>:</p>
<div class="hl"><pre class=content><code><span><span class="nb">#IfWinActive</span> <span class="n">ahk_group</span> <span class="n">CloseOnEsc</span>
</span><span><span class="nl">Escape::</span><span class="n">PostMessage</span> <span class="mh">0x112</span><span class="p">,</span> <span class="mh">0xF060</span>
</span><span><span class="nb">#IfWinActive</span> <span class="n">ahk_group</span> <span class="n">CloseOnCW</span>
</span><span><span class="o">^</span><span class="n">w</span><span class="o">::</span><span class="n">PostMessage</span> <span class="mh">0x112</span><span class="p">,</span> <span class="mh">0xF060</span>
</span><span><span class="nb">#IfWinActive</span>
</span></code></pre></div>
<p>Here we define two hotkeys. First, for all windows in the group called
<code>CloseOnEsc</code>, the hotkey <kbd>Escape</kbd> closes the window (the <code>PostMessage</code> part, which
we’ll get to in a bit). Second, a similar hotkey on <kbd>^w</kbd> does the same for windows in the group
<code>CloseOnCW</code>.</p>
<p>Now, you might’ve noticed that we don’t use the <code>WinClose</code> command to close the window. The reason
is that for some applications (such as Lync), the <code>WinClose</code> command <em>quits</em> the application instead
of just sending it back to the tray. The <code>PostMessage</code> command above posts a <code>WM_SYSCOMMAND</code> message
(<code>0x112</code>) with the <code>SC_CLOSE</code> command (<code>0xF060</code>) to the window. That is the same request the window gets
when we click the red close button at the top right, so it behaves exactly like that button.</p>
<p>In the second part of this exercise, we add windows to the groups:</p>
<div class="hl"><pre class=content><code><span><span class="c1">; Windows that should just disappear on ESC, but don't already.</span>
</span><span><span class="nb">GroupAdd</span><span class="p">,</span> <span class="n">CloseOnEsc</span><span class="p">,</span> <span class="n">ahk_class</span> <span class="n">Photo_Lightweight_Viewer</span>
</span><span><span class="nb">GroupAdd</span><span class="p">,</span> <span class="n">CloseOnEsc</span><span class="p">,</span> <span class="n">ahk_class</span> <span class="n">ConsoleWindowClass</span>
</span><span><span class="nb">GroupAdd</span><span class="p">,</span> <span class="n">CloseOnEsc</span><span class="p">,</span> <span class="n">Skype</span> <span class="n">for</span> <span class="n">Business</span>
</span><span><span class="nb">GroupAdd</span><span class="p">,</span> <span class="n">CloseOnEsc</span><span class="p">,</span> <span class="n">Vivaldi</span> <span class="n">Settings</span> <span class="n">ahk_exe</span> <span class="n">vivaldi</span><span class="o">.</span><span class="n">exe</span>
</span><span><span class="nb">GroupAdd</span><span class="p">,</span> <span class="n">CloseOnEsc</span><span class="p">,</span> <span class="n">ahk_class</span> <span class="n">FontViewWClass</span> <span class="n">ahk_exe</span> <span class="n">fontview</span><span class="o">.</span><span class="n">exe</span>
</span><span><span class="nb">GroupAdd</span><span class="p">,</span> <span class="n">CloseOnEsc</span><span class="p">,</span> <span class="n">Playlist</span> <span class="n">ahk_exe</span> <span class="n">vlc</span><span class="o">.</span><span class="n">exe</span>
</span><span>
</span><span><span class="c1">; Windows that should close with C-w.</span>
</span><span><span class="nb">GroupAdd</span><span class="p">,</span> <span class="n">CloseOnCW</span><span class="p">,</span> <span class="n">ahk_class</span> <span class="n">Notepad</span> <span class="n">ahk_exe</span> <span class="n">notepad</span><span class="o">.</span><span class="n">exe</span>
</span><span><span class="nb">GroupAdd</span><span class="p">,</span> <span class="n">CloseOnCW</span><span class="p">,</span> <span class="n">ahk_class</span> <span class="n">FM</span> <span class="n">ahk_exe</span> <span class="mi">7</span><span class="n">zFM</span><span class="o">.</span><span class="n">exe</span>
</span></code></pre></div>
<p>This should be fairly self-explanatory. We identify certain windows with <code>WinTitle</code>-style
criteria and add them to the two groups using the <code>GroupAdd</code> command.</p>
<p>There’s one special case here: the stock Windows Calculator app. It clears the display on
hitting the <kbd>Escape</kbd> key, but I wanted it to close on <kbd>Escape</kbd> <strong>if</strong> the display is
already cleared.</p>
<p>So, instead of including Calculator in the above group(s), I use the following snippet to handle
this special case.</p>
<div class="hl"><pre class=content><code><span><span class="nb">#IfWinActive</span> <span class="n">ahk_class</span> <span class="n">CalcFrame</span>
</span><span><span class="nl">$Escape::</span>
</span><span><span class="n">CloseOrClearCalculator</span><span class="p">()</span> <span class="p">{</span>
</span><span>    <span class="nb">ControlGetText</span><span class="p">,</span> <span class="n">display</span><span class="p">,</span> <span class="n">Static4</span>
</span><span>    <span class="n">if</span> <span class="p">(</span><span class="n">display</span> <span class="o">==</span> <span class="s">"0"</span><span class="p">)</span>
</span><span>        <span class="nb">WinClose</span>
</span><span>    <span class="nb">else</span>
</span><span>        <span class="nb">SendInput</span><span class="p">,</span> <span class="p">{</span><span class="n">Escape</span><span class="p">}</span>
</span><span><span class="p">}</span>
</span><span><span class="nb">#IfWinActive</span>
</span></code></pre></div>
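<p>This will close the Calculator if the display is already <code>"0"</code>, and pass the <kbd>Escape</kbd> key
through otherwise. The <code>$</code> prefix on the hotkey makes sure the <kbd>Escape</kbd> sent by the hotkey can’t trigger the hotkey itself.</p>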
<h2 id="the-caps-lock-story">The Caps Lock Story<a class="headerlink" href="#the-caps-lock-story" title="Permanent link">¶</a></h2>
<p>I use <a href="https://github.com/randyrants/sharpkeys" rel="noopener noreferrer" target="_blank">SharpKeys</a> to turn my <kbd>Caps Lock</kbd> key into an additional <kbd>Ctrl</kbd> key. This
works wonders, considering that the <kbd>Ctrl</kbd> key is used a lot more often than <kbd>Caps
Lock</kbd>, but the <kbd>Caps Lock</kbd> key is a lot easier to hit than either of the <kbd>Ctrl</kbd>
keys.</p>
<p>If you’re wondering why I don’t do this with AutoHotkey, the reason is that if I did, the remapping
would be active <em>only while the script is running</em>. That means it wouldn’t be
active in the lock screen (where I hit <kbd>Ctrl+A</kbd> often). Since SharpKeys modifies the
registry to achieve what it does, the remapping works even in the lock screen.</p>
<p>Yet, sometimes I miss the original functionality of the <kbd>Caps Lock</kbd> key. So I created the
following hotkey on <kbd>#q</kbd>, which turns Caps Lock on and shows an annoying
always-on-top splash window alerting me to that fact. To turn it back off, it’s <kbd>#q</kbd> again.</p>
<div class="hl"><pre class=content><code><span><span class="nl">#q::</span>
</span><span><span class="n">ToggleCapsLock</span><span class="p">()</span> <span class="p">{</span>
</span><span>    <span class="nb">if </span><span class="nf">GetKeyState</span><span class="p">(</span><span class="s">"Capslock"</span><span class="p">,</span> <span class="s">"T"</span><span class="p">)</span> <span class="p">{</span>
</span><span>        <span class="nb">SetCapsLockState</span><span class="p">,</span> <span class="n">Off</span>
</span><span>        <span class="nb">SplashTextOff</span>
</span><span>    <span class="p">}</span> <span class="n">else</span> <span class="p">{</span>
</span><span>        <span class="nb">SetCapsLockState</span><span class="p">,</span> <span class="n">On</span>
</span><span>        <span class="nb">SplashTextOn</span><span class="p">,</span> <span class="mi">300</span><span class="p">,</span> <span class="p">,</span> <span class="o"><<</span> <span class="n">CAPS</span> <span class="n">LOCK</span> <span class="n">ON</span> <span class="o">>></span> <span class="p">(</span><span class="n">Win</span><span class="o">+</span><span class="n">q</span> <span class="n">to</span> <span class="n">turn</span> <span class="n">off</span><span class="p">)</span>
</span><span>        <span class="nb">WinSet</span><span class="p">,</span> <span class="n">Transparent</span><span class="p">,</span> <span class="mi">200</span><span class="p">,</span> <span class="o"><<</span> <span class="n">CAPS</span> <span class="n">LOCK</span> <span class="n">ON</span> <span class="o">>></span>
</span><span>    <span class="p">}</span>
</span><span><span class="p">}</span>
</span></code></pre></div>
<p>This actually works surprisingly well. I use it more often than I’d like to admit. It feels better
than using the original <kbd>Caps Lock</kbd> key, because I get a (hard-to-ignore) overlay that
alerts me that Caps Lock is turned on.</p>
<h2 id="inserting-snippets">Inserting Snippets<a class="headerlink" href="#inserting-snippets" title="Permanent link">¶</a></h2>
<p>Snippet insertion is the idea of inserting a long, often-used string by typing a rather short
sequence of keys. In AutoHotkey, this is <em>usually</em> done using <a href="https://www.autohotkey.com/docs/Hotstrings.htm" rel="noopener noreferrer" target="_blank">hotstrings</a>. Hotstrings actually work <em>okay</em>
for this, but they don’t work in every application. In particular, I needed them to
work in GVim (which is where I write most of my prose), and they didn’t. So, with a lot
of help from the Internet, I came up with a solution.</p>
<p>Instead of hotstrings, I use a hotkey that summons an OSD (on-screen display) with a list of keys
and their expansions. While this window is shown, I can hit one of those keys, and the window is
immediately hidden and the corresponding expansion is typed out. If I don’t press a key within three seconds, the
window just goes away. This has been working unchanged for over four years and has never failed me.</p>
<div class="hl"><input type=checkbox id=co-8><label for=co-8><span class='btn show-full-code-btn'>Show remaining 9 lines</span></label><div class=filename><span>snippets.ahk</span></div><pre class=linenos><span>1
</span><span>2
</span><span>3
</span><span>4
</span><span>5
</span><span>6
</span><span>7
</span><span>8
</span><span>9
</span><span>10
</span><span>11
</span><span>12
</span><span>13
</span><span>14
</span><span>15
</span><span>16
</span><span>17
</span><span>18
</span><span>19
</span><span>20
</span><span class=collapse>21
</span><span class=collapse>22
</span><span class=collapse>23
</span><span class=collapse>24
</span><span class=collapse>25
</span><span class=collapse>26
</span><span class=collapse>27
</span><span class=collapse>28
</span><span class=collapse>29
</span></pre><pre class=content><code><span><span class="n">SnippetsInit</span><span class="p">()</span> <span class="p">{</span>
</span><span>    <span class="nb">Gui</span><span class="p">,</span> <span class="n">Snips</span><span class="o">:</span> <span class="n">Default</span>
</span><span>    <span class="nb">Gui</span><span class="p">,</span> <span class="n">Font</span><span class="p">,</span> <span class="n">s18</span> <span class="n">q5</span><span class="p">,</span> <span class="n">Consolas</span>
</span><span>    <span class="nb">Gui</span><span class="p">,</span> <span class="n">Color</span><span class="p">,</span> <span class="n">FF0000</span>
</span><span>    <span class="nb">Gui</span><span class="p">,</span> <span class="n">Margin</span><span class="p">,</span> <span class="mi">6</span><span class="p">,</span> <span class="mi">6</span>
</span><span>    <span class="nb">Gui</span><span class="p">,</span> <span class="o">+</span><span class="n">AlwaysOnTop</span> <span class="o">+</span><span class="n">Owner</span> <span class="o">+</span><span class="n">ToolWindow</span> <span class="o">-</span><span class="n">Caption</span> <span class="o">+</span><span class="n">HwndSnippetsHwnd</span>
</span><span>    <span class="nb">Gui</span><span class="p">,</span> <span class="n">Add</span><span class="p">,</span> <span class="n">ListView</span><span class="p">,</span> <span class="n">r8</span> <span class="n">w900</span><span class="p">,</span> <span class="n">Hotkey</span><span class="o">|</span><span class="n">Text</span>
</span><span>
</span><span>    <span class="nb">IniRead</span><span class="p">,</span> <span class="n">configText</span><span class="p">,</span> <span class="n">snippets</span><span class="o">.</span><span class="n">ini</span><span class="p">,</span> <span class="n">master</span>
</span><span>    <span class="nb">Loop</span><span class="p">,</span> <span class="n">Parse</span><span class="p">,</span> <span class="n">configText</span><span class="p">,</span> <span class="se">`n</span><span class="p">,</span> <span class="se">`r</span>
</span><span>    <span class="p">{</span>
</span><span>        <span class="n">parts</span> <span class="o">:=</span> <span class="n">StrSplit</span><span class="p">(</span><span class="nv">A_LoopField</span><span class="p">,</span> <span class="s">"="</span><span class="p">,</span> <span class="s">" `t"</span><span class="p">)</span>
</span><span>        <span class="nf">LV_Add</span><span class="p">(</span><span class="s">""</span><span class="p">,</span> <span class="n">parts</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">parts</span><span class="p">[</span><span class="mi">2</span><span class="p">])</span>
</span><span>    <span class="p">}</span>
</span><span><span class="p">}</span>
</span><span>
</span><span><span class="n">SnippetsShow</span><span class="p">()</span> <span class="p">{</span>
</span><span>    <span class="nb">global</span> <span class="n">SnippetsMap</span>
</span><span>    <span class="nb">Gui</span><span class="p">,</span> <span class="n">Snips</span><span class="o">:</span> <span class="n">Show</span><span class="p">,</span> <span class="n">NoActivate</span>
</span><span>    <span class="nb">Input</span><span class="p">,</span> <span class="n">key</span><span class="p">,</span> <span class="n">L1</span> <span class="n">T3</span>
</span><span class=collapse>    <span class="nb">Gui</span><span class="p">,</span> <span class="n">Snips</span><span class="o">:</span> <span class="n">Hide</span>
</span><span class=collapse>    <span class="n">if</span> <span class="p">(</span><span class="nv">ErrorLevel</span> <span class="o">!=</span> <span class="s">"Timeout"</span><span class="p">)</span> <span class="p">{</span>
</span><span class=collapse>        <span class="nb">IniRead</span><span class="p">,</span> <span class="n">value</span><span class="p">,</span> <span class="n">snippets</span><span class="o">.</span><span class="n">ini</span><span class="p">,</span> <span class="n">master</span><span class="p">,</span> <span class="nv">%key%</span><span class="p">,</span> <span class="n">__SNIPPETS_KEY_NOT_FOUND__</span>
</span><span class=collapse>        <span class="n">if</span> <span class="p">(</span><span class="n">value</span> <span class="o">!=</span> <span class="s">"__SNIPPETS_KEY_NOT_FOUND__"</span><span class="p">)</span>
</span><span class=collapse>            <span class="nb">SendInput</span><span class="p">,</span> <span class="nv">%value%</span>
</span><span class=collapse>        <span class="nb">else</span>
</span><span class=collapse>            <span class="nb">MsgBox</span><span class="p">,</span> <span class="n">No</span> <span class="n">snippet</span> <span class="n">found</span> <span class="n">for</span> <span class="nv">%key%</span><span class="o">.</span>
</span><span class=collapse>    <span class="p">}</span>
</span><span class=collapse><span class="p">}</span>
</span></code></pre></div>
<p>I have the above in a module called <code>snippets.ahk</code>, which I include in my master script. To use it,
I first need a <code>snippets.ini</code> file in the same directory, holding the expansions. Mine has entries like the
following:</p>
<div class="hl"><pre class=content><code><span><span class="k">[master]</span>
</span><span><span class="na">u</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">sharat87</span>
</span><span><span class="na">m</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">yeahhereismyaddress@gmail.com</span>
</span><span><span class="na">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">{+}91 AND MY PHONE NUMBER</span>
</span><span><span class="na">s</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">https://sharats.me/</span>
</span></code></pre></div>
<p>There are more snippets on my system, of course, duh! This is just a preview.</p>
<p>The next step is to include this module in our master script:</p>
<div class="hl"><pre class=content><code><span><span class="nb">#Include</span> <span class="n">snippets</span><span class="o">.</span><span class="n">ahk</span>
</span><span><span class="n">SnippetsInit</span><span class="p">()</span>
</span></code></pre></div>
<p>Finally, we define a hotkey to summon the snippets window. I use <kbd>^;</kbd> (that is, <kbd>Ctrl</kbd>+<kbd>;</kbd>).</p>
<div class="hl"><pre class=content><code><span><span class="o">^</span><span class="p">;</span><span class="o">::</span><span class="n">SnippetsShow</span><span class="p">()</span>
</span></code></pre></div>
<p>That’s it! Here it is in action:</p>
<p class="img"><a href="https://sharats.me/static/autohotkey-snippets.gif"><img alt="Snippets tool demo" src="https://sharats.me/static/autohotkey-snippets.gif"></a></p>
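<p>One detail in the <code>snippets.ini</code> example: the phone number is written as <code>{+}91</code> because <code>SendInput</code> treats <code>+</code> as the Shift modifier (<code>^</code>, <code>!</code>, <code>#</code> and braces are special too). If escaping gets tedious, the value could be sent in text mode instead. Below is a minimal sketch, assuming AutoHotkey v1.1.27 or later (for <code>{Text}</code>). <code>SnippetsSendText</code> is a hypothetical helper, not part of my module, and it reads the ini next to the master script via <code>A_ScriptDir</code> rather than relying on the working directory.</p>
<div class="hl"><pre class=content><code>; Hypothetical variant of the lookup-and-send step in SnippetsShow().
; {Text} mode (AutoHotkey v1.1.27+) types the value literally, so an
; entry could be written as "+91 ..." instead of "{+}91 ...".
SnippetsSendText(key) {
    ; An empty key would make IniRead return the whole section, so bail out.
    if (key = "")
        return false
    ; Assumes snippets.ini sits next to the master script.
    iniFile := A_ScriptDir . "\snippets.ini"
    IniRead, value, %iniFile%, master, %key%, __SNIPPETS_KEY_NOT_FOUND__
    if (value = "__SNIPPETS_KEY_NOT_FOUND__")
        return false
    SendInput, {Text}%value%
    return true
}
</code></pre></div>
<p>With this variant, existing entries like <code>{+}91</code> would need to be rewritten as plain <code>+91</code>, since the braces would otherwise be typed out literally.</p>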
<h2 id="window-watcher">Window Watcher<a class="headerlink" href="#window-watcher" title="Permanent link">¶</a></h2>
<p>My window watcher module (in <code>window-watcher.ahk</code>) lets me define actions to be taken when
new windows with certain properties show up.</p>
<p>For example, I want all command line windows to always be moved to the top right corner of the
screen. As another example, some windows open with a size equal to the whole
screen but are not maximized. These are particularly annoying since I have a habit of throwing my
mouse to the top right corner and clicking to close the window. But since such a window is not
maximized, I end up accidentally closing the window behind it. So, I want such windows to be maximized
as soon as they open.</p>
<p>To address this, I have a <code>window-watcher.ahk</code> module that constantly polls the list of
visible windows and detects whether anything was opened or closed. Since its <code>SetTimer</code> call
passes no period, the poll runs every 250 ms, AutoHotkey’s default. The module defines the function
<code>WindowWatcherInit</code> (among others), which needs to be called once to start the polling timer.</p>
<div class="hl"><input type=checkbox id=co-9><label for=co-9><span class='btn show-full-code-btn'>Show remaining 17 lines</span></label><div class=filename><span>window-watcher.ahk</span></div><pre class=content><code><span><span class="n">WindowWatcherInit</span><span class="p">()</span> <span class="p">{</span>
</span><span> <span class="nb">static</span> <span class="n">initDone</span> <span class="o">:=</span> <span class="nv">false</span>
</span><span>
</span><span> <span class="n">if</span> <span class="p">(</span><span class="n">initDone</span><span class="p">)</span>
</span><span> <span class="nb">return</span>
</span><span> <span class="n">initDone</span> <span class="o">:=</span> <span class="nv">true</span>
</span><span>
</span><span> <span class="nb">SetTimer</span><span class="p">,</span> <span class="n">WindowWatcherPollForNewWindows</span>
</span><span><span class="p">}</span>
</span><span>
</span><span><span class="n">WindowWatcherTrigger</span><span class="p">(</span><span class="n">wParam</span><span class="p">,</span> <span class="n">hwnd</span><span class="p">)</span> <span class="p">{</span>
</span><span> <span class="n">if</span> <span class="p">(</span><span class="n">wParam</span> <span class="o">==</span> <span class="s">"Created"</span><span class="p">)</span> <span class="p">{</span>
</span><span> <span class="n">OnWindowCreated</span><span class="p">(</span><span class="n">hwnd</span><span class="p">)</span>
</span><span><span class="c1"> ; } else if (wParam == "Destroyed") {</span>
</span><span> <span class="p">}</span>
</span><span><span class="p">}</span>
</span><span>
</span><span><span class="n">WindowWatcherPollForNewWindows</span><span class="p">()</span> <span class="p">{</span>
</span><span> <span class="nb">static</span> <span class="n">windows</span> <span class="o">:=</span> <span class="s">""</span>
</span><span> <span class="nb">WinGet</span><span class="p">,</span> <span class="n">wins</span><span class="p">,</span> <span class="n">List</span><span class="p">,</span> <span class="p">,</span> <span class="p">,</span> <span class="p">,</span>
</span><span class=collapse> <span class="n">newWindows</span> <span class="o">:=</span> <span class="nf">Object</span><span class="p">()</span>
</span><span class=collapse>
</span><span class=collapse> <span class="nb">Loop</span><span class="p">,</span> <span class="nv">%wins%</span>
</span><span class=collapse> <span class="p">{</span>
</span><span class=collapse> <span class="n">this_id</span> <span class="o">:=</span> <span class="n">wins</span><span class="nv">%A_Index%</span>
</span><span class=collapse> <span class="n">newWindows</span><span class="p">[</span><span class="n">this_id</span><span class="p">]</span> <span class="o">:=</span> <span class="mi">1</span>
</span><span class=collapse> <span class="n">if</span> <span class="p">(</span><span class="n">windows</span> <span class="o">&&</span> <span class="o">!</span><span class="n">windows</span><span class="p">[</span><span class="n">this_id</span><span class="p">])</span>
</span><span class=collapse> <span class="n">WindowWatcherTrigger</span><span class="p">(</span><span class="s">"Created"</span><span class="p">,</span> <span class="n">this_id</span><span class="p">)</span>
</span><span class=collapse> <span class="p">}</span>
</span><span class=collapse>
</span><span class=collapse> <span class="n">for</span> <span class="n">wid</span><span class="p">,</span> <span class="n">p</span> <span class="ow">in</span> <span class="n">windows</span> <span class="p">{</span>
</span><span class=collapse> <span class="n">if</span> <span class="p">(</span><span class="o">!</span><span class="n">newWindows</span><span class="p">[</span><span class="n">wid</span><span class="p">])</span>
</span><span class=collapse> <span class="n">WindowWatcherTrigger</span><span class="p">(</span><span class="s">"Destroyed"</span><span class="p">,</span> <span class="n">wid</span><span class="p">)</span>
</span><span class=collapse> <span class="p">}</span>
</span><span class=collapse>
</span><span class=collapse> <span class="n">windows</span> <span class="o">:=</span> <span class="n">newWindows</span>
</span><span class=collapse><span class="p">}</span>
</span></code></pre></div>
<p>From then on, any time a new window is detected, the <code>OnWindowCreated</code> function is called with the
new window’s <code>hwnd</code> as its only argument. In this function, I match the window against
various kinds of windows and take whatever action each one needs. Here’s a short preview of that function (in
reality, it is 81 lines long in my master script).</p>
<div class="hl"><input type=checkbox id=co-10><label for=co-10><span class='btn show-full-code-btn'>Show remaining 9 lines</span></label><pre class=content><code><span><span class="n">OnWindowCreated</span><span class="p">(</span><span class="n">hwnd</span><span class="p">)</span> <span class="p">{</span>
</span><span> <span class="nb">global</span> <span class="n">homedir</span>
</span><span>
</span><span><span class="c1"> ; Close "Illegal IP Address" alerts.</span>
</span><span> <span class="n">if</span> <span class="p">(</span><span class="nf">WinExist</span><span class="p">(</span><span class="s">"Application Error ahk_exe jweblauncher.exe ahk_id "</span> <span class="o">.</span> <span class="n">hwnd</span><span class="p">))</span> <span class="p">{</span>
</span><span> <span class="nb">PostMessage</span><span class="p">,</span> <span class="mh">0x112</span><span class="p">,</span> <span class="mh">0xF060</span><span class="p">,</span> <span class="p">,</span> <span class="n">ahk_id</span> <span class="nv">%hwnd%</span>
</span><span>
</span><span><span class="c1"> ; Close "Keyboard History Utility" alerts.</span>
</span><span> <span class="p">}</span> <span class="n">else</span> <span class="n">if</span> <span class="p">(</span><span class="nf">WinExist</span><span class="p">(</span><span class="s">"Keyboard History Utility ahk_exe WerFault.exe ahk_id "</span> <span class="o">.</span> <span class="n">hwnd</span><span class="p">))</span> <span class="p">{</span>
</span><span> <span class="nb">ControlClick</span><span class="p">,</span> <span class="nf">Close</span> <span class="n">the</span> <span class="n">program</span><span class="p">,</span> <span class="n">ahk_id</span> <span class="nv">%hwnd%</span>
</span><span>
</span><span><span class="c1"> ; When a command window opens, move it to top-right.</span>
</span><span> <span class="p">}</span> <span class="n">else</span> <span class="n">if</span> <span class="p">(</span><span class="nf">WinExist</span><span class="p">(</span><span class="s">"ahk_class ConsoleWindowClass ahk_id "</span> <span class="o">.</span> <span class="n">hwnd</span><span class="p">))</span> <span class="p">{</span>
</span><span> <span class="nb">WinGetPos</span><span class="p">,</span> <span class="p">,</span> <span class="p">,</span> <span class="n">w</span><span class="p">,</span> <span class="p">,</span> <span class="n">ahk_id</span> <span class="nv">%hwnd%</span>
</span><span> <span class="n">x</span> <span class="o">:=</span> <span class="nv">A_ScreenWidth</span> <span class="o">-</span> <span class="n">w</span>
</span><span> <span class="nb">WinMove</span><span class="p">,</span> <span class="n">ahk_id</span> <span class="nv">%hwnd%</span><span class="p">,</span> <span class="p">,</span> <span class="nv">%x%</span><span class="p">,</span> <span class="mi">0</span>
</span><span>
</span><span><span class="c1"> ; Maximize windows that open unmaximized but occupy almost-entire screen.</span>
</span><span> <span class="p">}</span> <span class="n">else</span> <span class="n">if</span> <span class="p">(</span><span class="nf">WinExist</span><span class="p">(</span><span class="s">"ahk_id "</span> <span class="o">.</span> <span class="n">hwnd</span> <span class="o">.</span> <span class="s">" ahk_group MaximizeOnOpen"</span><span class="p">))</span> <span class="p">{</span>
</span><span> <span class="nb">WinMaximize</span><span class="p">,</span> <span class="n">ahk_id</span> <span class="nv">%hwnd%</span>
</span><span class=collapse>
</span><span class=collapse> <span class="p">}</span> <span class="n">else</span> <span class="p">{</span>
</span><span class=collapse> <span class="nb">WinGetPos</span><span class="p">,</span> <span class="p">,</span> <span class="p">,</span> <span class="n">width</span><span class="p">,</span> <span class="n">height</span><span class="p">,</span> <span class="n">ahk_id</span> <span class="nv">%hwnd%</span>
</span><span class=collapse> <span class="n">if</span> <span class="p">(</span><span class="n">width</span> <span class="o">>=</span> <span class="nv">A_ScreenWidth</span> <span class="o">&&</span> <span class="n">height</span> <span class="o">></span> <span class="o">.</span><span class="mi">9</span> <span class="o">*</span> <span class="nv">A_ScreenHeight</span><span class="p">)</span>
</span><span class=collapse> <span class="nb">WinMaximize</span><span class="p">,</span> <span class="n">ahk_id</span> <span class="nv">%hwnd%</span>
</span><span class=collapse>
</span><span class=collapse> <span class="p">}</span>
</span><span class=collapse>
</span><span class=collapse><span class="p">}</span>
</span></code></pre></div>
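<p>The <code>MaximizeOnOpen</code> branch matches against a window group, and <code>ahk_group</code> only matches groups created with <code>GroupAdd</code>, so the master script presumably defines that group somewhere, typically in its auto-execute section. A sketch of such a definition follows; the executable names are placeholders, not the programs from my setup.</p>
<div class="hl"><pre class=content><code>; In the auto-execute section of the master script.
; Placeholder executables: list the apps that open full-size but unmaximized.
GroupAdd, MaximizeOnOpen, ahk_exe placeholder-app.exe
GroupAdd, MaximizeOnOpen, ahk_exe another-placeholder.exe
</code></pre></div>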
<p>There are other ways to watch for new windows without polling, such as using <code>RegisterShellHookWindow</code>,
and I encourage you to try them out if you’re not comfortable with polling. In my
experience, though, such solutions missed some windows and caught only a limited
subset of the windows that were opening. So I went with polling, which is less efficient but has
been more reliable for me.</p>
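<p>For comparison, here is a minimal sketch of the shell-hook approach. It assumes the <code>OnWindowCreated(hwnd)</code> function above and AutoHotkey v1.1+ (for <code>A_ScriptHwnd</code>). <code>ShellHookInit</code> and <code>ShellHookMessage</code> are hypothetical names. Windows sends <code>HSHELL_WINDOWCREATED</code> only for top-level, unowned windows, which is one plausible reason such hooks report fewer windows than polling does.</p>
<div class="hl"><pre class=content><code>; Hypothetical event-driven alternative to WindowWatcherInit().
; Assumes OnWindowCreated(hwnd) from above and AutoHotkey v1.1+ (A_ScriptHwnd).
ShellHookInit() {
    ; Ask Windows to send shell notifications to the script's hidden main window.
    DllCall("RegisterShellHookWindow", "Ptr", A_ScriptHwnd)
    ; Shell notifications arrive as this registered message number.
    msgNum := DllCall("RegisterWindowMessage", "Str", "SHELLHOOK")
    OnMessage(msgNum, "ShellHookMessage")
}

ShellHookMessage(wParam, lParam) {
    static HSHELL_WINDOWCREATED := 1
    ; For HSHELL_WINDOWCREATED, lParam is the new window's handle.
    if (wParam = HSHELL_WINDOWCREATED)
        OnWindowCreated(lParam)
}
</code></pre></div>
<p>Calling <code>ShellHookInit()</code> once from the master script would then take the place of the <code>WindowWatcherInit()</code> call.</p>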
<h2 id="mess-with-images-in-clipboard">Mess with Images in Clipboard<a class="headerlink" href="#mess-with-images-in-clipboard" title="Permanent link">¶</a></h2>
<p>This is a little trick that’s powered by <a href="https://imagemagick.org/index.php" rel="noopener noreferrer" target="_blank">ImageMagick</a>. I add a menu item in the tray icon’s
context menu called <code>"Add border to image in clipboard"</code>, which is quite self-explanatory!</p>
<div class="hl"><pre class=content><code><span><span class="nb">Menu</span><span class="p">,</span> <span class="n">Tray</span><span class="p">,</span> <span class="n">Add</span><span class="p">,</span> <span class="n">Add</span> <span class="n">border</span> <span class="n">to</span> <span class="n">image</span> <span class="ow">in</span> <span class="nv">clipboard</span><span class="p">,</span> <span class="n">AddBorderToImageInCb</span>
</span></code></pre></div>
<p>The menu item calls the following function, which runs the appropriate ImageMagick command and
shows a little dialog when it’s done, so we can go ahead and paste the
bordered image.</p>
<div class="hl"><pre class=content><code><span><span class="n">AddBorderToImageInCb</span><span class="p">()</span> <span class="p">{</span>
</span><span> <span class="nb">RunWait</span><span class="p">,</span> <span class="n">C</span><span class="o">:</span>\<span class="n">tools</span>\<span class="n">ImageMagick</span>\<span class="n">magick</span><span class="o">.</span><span class="n">exe</span> <span class="n">convert</span> <span class="nv">clipboard</span><span class="o">:</span><span class="n">myimage</span> <span class="o">-</span><span class="n">bordercolor</span> <span class="s">"#0099FF"</span> <span class="o">-</span><span class="n">border</span> <span class="mi">6</span><span class="n">x6</span> <span class="nv">clipboard</span><span class="o">:</span><span class="p">,</span> <span class="p">,</span> <span class="n">Hide</span>
</span><span> <span class="nb">MsgBox</span><span class="p">,</span> <span class="n">Added</span> <span class="n">border</span> <span class="n">to</span> <span class="n">image</span> <span class="ow">in</span> <span class="nv">clipboard</span><span class="o">.</span>
</span><span><span class="p">}</span>
</span></code></pre></div>
<p>I use this a lot with screenshot snips (taken with the Snipping Tool or copied from Paint) before
pasting them into an email. A border makes images in emails stand out and gives them a
distinct look.</p>
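<p>As written, the function reports success even if ImageMagick fails, for example because the clipboard holds no image. Below is a minimal sketch of a drop-in replacement that checks the exit code, assuming <code>magick</code> exits with a non-zero code on such failures. The <code>UseErrorLevel</code> option makes a launch failure (such as a wrong path) set <code>ErrorLevel</code> to <code>ERROR</code> instead of showing AutoHotkey’s own error dialog.</p>
<div class="hl"><pre class=content><code>AddBorderToImageInCb() {
    ; Hide: no console window. UseErrorLevel: report launch failures via ErrorLevel.
    RunWait, C:\tools\ImageMagick\magick.exe convert clipboard:myimage -bordercolor "#0099FF" -border 6x6 clipboard:, , Hide UseErrorLevel
    ; After RunWait, ErrorLevel holds magick's exit code (0 on success).
    if (ErrorLevel = 0)
        MsgBox, Added border to image in clipboard.
    else
        MsgBox, 16, Add border, ImageMagick failed (ErrorLevel: %ErrorLevel%). Is there an image in the clipboard?
}
</code></pre></div>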
<h2 id="periodic-time-display">Periodic Time Display<a class="headerlink" href="#periodic-time-display" title="Permanent link">¶</a></h2>
<p>As an alternative to the popular <a href="https://en.wikipedia.org/wiki/Pomodoro_Technique" rel="noopener noreferrer" target="_blank">Pomodoro Technique</a>, I have a small, non-intrusive on-screen display (OSD)
with the current time show up at the bottom of my screen every 20 minutes. That is, I get a small blue OSD
at <code>:00</code>, a small green OSD at <code>:20</code> and a small orange OSD at <code>:40</code>. Here’s
a preview of how this looks:</p>
<p class="img"><a href="https://sharats.me/static/autohotkey-time-osd.png"><img alt="Time OSD example view" src="https://sharats.me/static/autohotkey-time-osd.png"></a></p>
<p>Again, for this, I have a separate module called <code>time-osd.ahk</code>, which I <code>#Include</code> in the master
script before calling its init function. (This init-function-in-a-separate-module pattern is something I came up
with and it has worked well enough; I have no idea if it’s a best practice.)</p>
<div class="hl"><input type=checkbox id=co-11><label for=co-11><span class='btn show-full-code-btn'>Show remaining 23 lines</span></label><div class=filename><span>time-osd.ahk</span></div><pre class=content><code><span><span class="n">TimeOSDInit</span><span class="p">()</span> <span class="p">{</span>
</span><span> <span class="nb">global</span> <span class="n">TimeOSDLabel</span>
</span><span> <span class="nb">SetTimer</span><span class="p">,</span> <span class="n">TimeOSDPulse</span><span class="p">,</span> <span class="mi">1000</span>
</span><span> <span class="nb">Gui</span><span class="p">,</span> <span class="n">TimeOSD</span><span class="o">:</span><span class="n">Default</span>
</span><span> <span class="nb">Gui</span><span class="p">,</span> <span class="o">+</span><span class="n">LastFound</span> <span class="o">+</span><span class="n">AlwaysOnTop</span> <span class="o">+</span><span class="n">ToolWindow</span> <span class="o">-</span><span class="n">Caption</span>
</span><span> <span class="nb">Gui</span><span class="p">,</span> <span class="n">Font</span><span class="p">,</span> <span class="n">s18</span><span class="p">,</span> <span class="n">Calibri</span>
</span><span> <span class="nb">Gui</span><span class="p">,</span> <span class="n">Margin</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span>
</span><span> <span class="nb">Gui</span><span class="p">,</span> <span class="n">Add</span><span class="p">,</span> <span class="n">Text</span><span class="p">,</span> <span class="n">cWhite</span> <span class="n">vTimeOSDLabel</span> <span class="n">gTimeOSDClose</span> <span class="n">w250</span> <span class="n">h36</span> <span class="n">Center</span>
</span><span><span class="p">}</span>
</span><span>
</span><span><span class="n">TimeOSDPulse</span><span class="p">()</span> <span class="p">{</span>
</span><span> <span class="nb">static</span> <span class="n">lastTime</span> <span class="o">:=</span> <span class="s">""</span>
</span><span>
</span><span> <span class="n">if</span> <span class="p">(</span><span class="nf">IsFunc</span><span class="p">(</span><span class="s">"IsWindowFullScreen"</span><span class="p">)</span> <span class="o">&&</span> <span class="n">IsWindowFullScreen</span><span class="p">(</span><span class="s">"A"</span><span class="p">))</span>
</span><span> <span class="nb">Return</span>
</span><span>
</span><span> <span class="nb">FormatTime</span><span class="p">,</span> <span class="n">currTime</span><span class="p">,</span> <span class="p">,</span> <span class="n">h</span><span class="o">:</span><span class="n">mm</span> <span class="n">tt</span>
</span><span>
</span><span> <span class="n">if</span> <span class="p">(</span><span class="n">lastTime</span> <span class="o">==</span> <span class="n">currTime</span> <span class="o">||</span> <span class="nv">A_TimeIdlePhysical</span> <span class="o">></span> <span class="mi">600000</span><span class="p">)</span>
</span><span> <span class="nb">Return</span>
</span><span class=collapse>
</span><span class=collapse> <span class="n">if</span> <span class="p">(</span><span class="nf">RegExMatch</span><span class="p">(</span><span class="n">currTime</span><span class="p">,</span> <span class="s">":00"</span><span class="p">))</span>
</span><span class=collapse> <span class="n">TimeOSDShow</span><span class="p">(</span><span class="n">currTime</span><span class="p">,</span> <span class="s">"268BD2"</span><span class="p">)</span>
</span><span class=collapse> <span class="nb">else</span> <span class="n">if</span> <span class="p">(</span><span class="nf">RegExMatch</span><span class="p">(</span><span class="n">currTime</span><span class="p">,</span> <span class="s">":20"</span><span class="p">))</span>
</span><span class=collapse> <span class="n">TimeOSDShow</span><span class="p">(</span><span class="n">currTime</span><span class="p">,</span> <span class="s">"859900"</span><span class="p">)</span>
</span><span class=collapse> <span class="nb">else</span> <span class="n">if</span> <span class="p">(</span><span class="nf">RegExMatch</span><span class="p">(</span><span class="n">currTime</span><span class="p">,</span> <span class="s">":40"</span><span class="p">))</span>
</span><span class=collapse> <span class="n">TimeOSDShow</span><span class="p">(</span><span class="n">currTime</span><span class="p">,</span> <span class="s">"CB4B16"</span><span class="p">)</span>
</span><span class=collapse>
</span><span class=collapse> <span class="n">lastTime</span> <span class="o">:=</span> <span class="n">currTime</span>
</span><span class=collapse><span class="p">}</span>
</span><span class=collapse>
</span><span class=collapse><span class="n">TimeOSDShow</span><span class="p">(</span><span class="n">timeText</span><span class="p">,</span> <span class="n">bg</span><span class="p">)</span> <span class="p">{</span>
</span><span class=collapse> <span class="nb">Gui</span><span class="p">,</span> <span class="n">TimeOSD</span><span class="o">:</span><span class="n">Default</span>
</span><span class=collapse> <span class="nb">Gui</span><span class="p">,</span> <span class="n">Color</span><span class="p">,</span> <span class="nv">%bg%</span>
</span><span class=collapse> <span class="nb">GuiControl</span><span class="p">,</span> <span class="n">Text</span><span class="p">,</span> <span class="n">TimeOSDLabel</span><span class="p">,</span> <span class="n">It</span>'<span class="n">s</span> <span class="nv">%timeText%</span> <span class="n">already</span><span class="o">!</span>
</span><span class=collapse> <span class="n">y</span> <span class="o">:=</span> <span class="nv">A_ScreenHeight</span> <span class="o">-</span> <span class="mi">120</span>
</span><span class=collapse> <span class="nb">Gui</span><span class="p">,</span> <span class="n">Show</span><span class="p">,</span> <span class="n">xCenter</span> <span class="n">y</span><span class="nv">%y%</span> <span class="n">NoActivate</span>
</span><span class=collapse> <span class="nb">SetTimer</span><span class="p">,</span> <span class="n">TimeOSDClose</span><span class="p">,</span> <span class="o">-</span><span class="mi">10000</span>
</span><span class=collapse><span class="p">}</span>
</span><span class=collapse>
</span><span class=collapse><span class="n">TimeOSDClose</span><span class="p">()</span> <span class="p">{</span>
</span><span class=collapse> <span class="nb">Gui</span><span class="p">,</span> <span class="n">TimeOSD</span><span class="o">:</span><span class="n">Cancel</span>
</span><span class=collapse><span class="p">}</span>
</span></code></pre></div>
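<p>With this, clicking an OSD closes it; otherwise, it disappears after 10 seconds. No OSD is shown if I’ve been physically idle for more than 10 minutes, or if an <code>IsWindowFullScreen</code> function is defined and reports the active window as full screen.</p>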
<p>To use this, I just include the following in my master script.</p>
<div class="hl"><pre class=content><code><span><span class="nb">#Include</span> <span class="n">time</span><span class="o">-</span><span class="n">osd</span><span class="o">.</span><span class="n">ahk</span>
</span><span><span class="n">TimeOSDInit</span><span class="p">()</span>
</span></code></pre></div>
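<p><code>TimeOSDPulse</code> picks the color by regex-matching the formatted time string. An alternative is to decide from the minute alone, which keeps the 20-minute schedule in one place. A minimal sketch; <code>TimeOSDColorFor</code> is a hypothetical helper, and the colors are the same ones used above.</p>
<div class="hl"><pre class=content><code>; Hypothetical helper: return the OSD background for a given minute,
; or "" if no OSD is due. Accepts A_Min, which is zero-padded ("05").
TimeOSDColorFor(minute) {
    if (minute = 0)
        return "268BD2"  ; :00, blue
    if (minute = 20)
        return "859900"  ; :20, green
    if (minute = 40)
        return "CB4B16"  ; :40, orange
    return ""
}

; Possible use inside TimeOSDPulse(), replacing the three RegExMatch checks:
;     color := TimeOSDColorFor(A_Min)
;     if (color != "")
;         TimeOSDShow(currTime, color)
</code></pre></div>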
<h2 id="vim-keys-for-sumatra-pdf">Vim Keys for Sumatra PDF<a class="headerlink" href="#vim-keys-for-sumatra-pdf" title="Permanent link">¶</a></h2>
<p>This one probably only makes sense if your fingers are used to hitting <a href="https://www.vim.org/" rel="noopener noreferrer" target="_blank">Vim</a>’s hotkeys. I wanted
some of Vim’s simple hotkeys for navigating documents in Sumatra PDF (my PDF reader of choice on
Windows). The following snippet, which I currently use, gets me <kbd>d</kbd> (like Vim’s
<kbd>&lt;C-d&gt;</kbd>), <kbd>e</kbd> (like Vim’s <kbd>&lt;C-u&gt;</kbd>), <kbd>n</kbd> and
<kbd>+n</kbd> (like Vim’s <kbd>n</kbd> and <kbd>N</kbd>, for the next and previous search match), <kbd>x</kbd> (to close a tab), and <kbd>g</kbd> and
<kbd>+g</kbd> (like Vim’s <kbd>gg</kbd> and <kbd>G</kbd>).</p>
<div class="hl"><input type=checkbox id=co-12><label for=co-12><span class='btn show-full-code-btn'>Show remaining 23 lines</span></label><pre class=content><code><span><span class="nb">#IfWinActive</span> <span class="n">ahk_exe</span> <span class="n">SumatraPDF</span><span class="o">.</span><span class="n">exe</span> <span class="n">ahk_class</span> <span class="n">SUMATRA_PDF_FRAME</span>
</span><span><span class="nl">$d::</span>
</span><span><span class="nl">$e::</span>
</span><span> <span class="n">SumatraKeys</span> <span class="o">:=</span> <span class="p">{</span><span class="n">d</span><span class="o">:</span> <span class="s">"j"</span><span class="p">,</span> <span class="n">e</span><span class="o">:</span> <span class="s">"k"</span><span class="p">}</span>
</span><span> <span class="nb">ControlGetFocus</span><span class="p">,</span> <span class="n">ctrl</span>
</span><span> <span class="n">if</span> <span class="p">(</span><span class="n">ctrl</span> <span class="o">==</span> <span class="s">"Edit1"</span> <span class="ow">or</span> <span class="n">ctrl</span> <span class="o">==</span> <span class="s">"Edit2"</span><span class="p">)</span> <span class="p">{</span>
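</span><span> <span class="nb">Send</span> <span class="o">%</span> <span class="n">StrReplace</span><span class="p">(</span><span class="nv">A_ThisHotkey</span><span class="p">,</span> <span class="s">"$"</span><span class="p">,</span> <span class="s">""</span><span class="p">)</span> <span class="c1">; Strip the "$" prefix so only the key is typed.</span>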
</span><span> <span class="p">}</span> <span class="n">else</span> <span class="p">{</span>
</span><span> <span class="n">k</span> <span class="o">:=</span> <span class="n">SumatraKeys</span><span class="p">[</span><span class="n">StrReplace</span><span class="p">(</span><span class="nv">A_ThisHotkey</span><span class="p">,</span> <span class="s">"$"</span><span class="p">,</span> <span class="s">""</span><span class="p">)]</span>
</span><span> <span class="nb">Send</span> <span class="p">{</span><span class="nv">%k%</span> <span class="mi">22</span><span class="p">}</span>
</span><span> <span class="p">}</span>
</span><span> <span class="nb">Return</span>
</span><span>
</span><span><span class="nl">$n::</span>
</span><span> <span class="nb">ControlGetFocus</span><span class="p">,</span> <span class="n">ctrl</span>
</span><span> <span class="n">if</span> <span class="p">(</span><span class="n">ctrl</span> <span class="o">==</span> <span class="s">"Edit1"</span> <span class="ow">or</span> <span class="n">ctrl</span> <span class="o">==</span> <span class="s">"Edit2"</span><span class="p">)</span>
</span><span> <span class="nb">Send</span><span class="p">,</span> <span class="n">n</span>
</span><span> <span class="nb">else</span>
</span><span> <span class="nb">Send</span><span class="p">,</span> <span class="p">{</span><span class="n">F3</span><span class="p">}</span>
</span><span> <span class="nb">Return</span>
</span><span class=collapse>
</span><span class=collapse><span class="nl">$+n::</span>
</span><span class=collapse> <span class="nb">ControlGetFocus</span><span class="p">,</span> <span class="n">ctrl</span>
</span><span class=collapse> <span class="n">if</span> <span class="p">(</span><span class="n">ctrl</span> <span class="o">==</span> <span class="s">"Edit1"</span> <span class="ow">or</span> <span class="n">ctrl</span> <span class="o">==</span> <span class="s">"Edit2"</span><span class="p">)</span>
</span><span class=collapse> <span class="nb">Send</span><span class="p">,</span> <span class="n">N</span>
</span><span class=collapse> <span class="nb">else</span>
</span><span class=collapse> <span class="nb">Send</span><span class="p">,</span> <span class="o">+</span><span class="p">{</span><span class="n">F3</span><span class="p">}</span>
</span><span class=collapse> <span class="nb">Return</span>
</span><span class=collapse>
</span><span class=collapse><span class="nl">$x::</span>
</span><span class=collapse> <span class="nb">ControlGetFocus</span><span class="p">,</span> <span class="n">ctrl</span>
</span><span class=collapse> <span class="n">if</span> <span class="p">(</span><span class="n">ctrl</span> <span class="o">==</span> <span class="s">"Edit1"</span> <span class="ow">or</span> <span class="n">ctrl</span> <span class="o">==</span> <span class="s">"Edit2"</span><span class="p">)</span>
</span><span class=collapse> <span class="nb">Send</span><span class="p">,</span> <span class="n">x</span>
</span><span class=collapse> <span class="nb">else</span>
</span><span class=collapse> <span class="nb">Send</span><span class="p">,</span> <span class="o">^</span><span class="n">w</span>
</span><span class=collapse> <span class="nb">Return</span>
</span><span class=collapse>
</span><span class=collapse><span class="nl">+g::</span><span class="n">Send</span><span class="p">,</span> <span class="p">{</span><span class="n">End</span> <span class="mi">2</span><span class="p">}</span>
</span><span class=collapse>
</span><span class=collapse><span class="nb">#IfWinActive</span> <span class="n">Go</span> <span class="n">to</span> <span class="n">page</span> <span class="n">ahk_exe</span> <span class="n">SumatraPDF</span><span class="o">.</span><span class="n">exe</span> <span class="n">ahk_class</span> <span class="n">#32770</span>
</span><span class=collapse><span class="nl">g::</span><span class="n">Send</span><span class="p">,</span> <span class="p">{</span><span class="n">Escape</span><span class="p">}{</span><span class="n">Home</span><span class="p">}</span>
</span><span class=collapse>
</span><span class=collapse><span class="nb">#IfWinActive</span>
</span></code></pre></div>
<p>This looks like a sad, long, hairy piece of code (probably because it is), but it works, so I let it
be. That sentiment shows up a lot when dealing with AutoHotkey code, but the scripts do their job, and
they do it really well.</p>
<h2 id="conclusion">Conclusion<a class="headerlink" href="#conclusion" title="Permanent link">¶</a></h2>
<p>AutoHotkey’s language may have its quirks, but it’s a very powerful tool when it comes to hotkeys.
I have come to the point where working on Windows without AutoHotkey (and my scripts, of course) is
practically nerve-wracking for me. I encourage you to check it out and explore the
possibilities.</p>
<p class="note">You can read the <a href="../the-magic-of-autohotkey-2/">Part 2</a> of this now!</p>Guide to Comprehensions in Python2020-02-23T00:00:00+05:302020-02-23T00:00:00+05:30Shrikant Sharat Kandulatag:sharats.me,2020-02-23:/posts/guide-to-comprehensions-in-python/<p>Comprehensions are a syntax construct used for applying some form of transformations and filtering
over streams of data. The problems comprehensions solve can also be solved without them, using plain old
<code>for</code>-loops, but where they fit, comprehensions can improve readability and express intent very
well.</p>
<p>This article assumes some familiarity with Python (and comprehensions as well). I will go over the
basics of comprehensions quickly and then jump into the meat of the article. Unless otherwise
specified, this article applies to Python 3.</p>
<p class="note">If you’re here for the live converter or comprehension ⇔ <code>for</code>-loop code, <a href="#live-code-converter">it’s further down in
the page</a>.</p>
<div class="toc"><span class="toctitle">Table of Contents</span><ul>
<li><a href="#basic-syntax">Basic Syntax</a></li>
<li><a href="#different-collectors">Different Collectors</a></li>
<li><a href="#multiple-looping-constructs">Multiple Looping Constructs</a><ul>
<li><a href="#zipping-instead-of-cross-product">Zipping instead of Cross Product</a></li>
</ul>
</li>
<li><a href="#rewriting-comprehensions-map-filter-builtins">Rewriting Comprehensions with map & filter Builtins</a></li>
<li><a href="#reducing-with-assignment-expressions">Reducing with Assignment Expressions</a></li>
<li><a href="#set-operations-with-comprehensions">Set Operations with Comprehensions</a></li>
<li><a href="#generator-expressions">Generator Expressions</a></li>
<li><a href="#the-key-argument-for-sorted">The key Argument for sorted</a></li>
<li><a href="#no-side-effects-please">No Side Effects Please</a></li>
<li><a href="#looking-inside">Looking Inside</a></li>
<li><a href="#live-code-converter">Live Code Converter</a></li>
<li><a href="#conclusion">Conclusion</a></li>
</ul>
</div>
<h2 id="basic-syntax">Basic Syntax<a class="headerlink" href="#basic-syntax" title="Permanent link">¶</a></h2>
<p>Let’s go over the basic syntax for starters. It can be divided into three parts: the result
expression, the looping construct(s) and the filter expression. Of these, the filter expression is
optional, but the other two are required. Let’s look at a simple example to get an idea:</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="p">[</span><span class="n">n</span> <span class="o">**</span> <span class="mi">2</span> <span class="k">for</span> <span class="n">n</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">4</span><span class="p">)]</span>
</span><span><span class="go">[0, 1, 4, 9]</span>
</span></code></pre></div>
<p>This is a <em>list comprehension</em> with no filtering (<em>i.e.,</em> no <code>if</code> clause). Here, the <code>n ** 2</code> part
is the result expression and the <code>for n in range(4)</code> is the looping construct. This comprehension
expression is the same as the following piece of code, written without comprehensions:</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="n">squares</span> <span class="o">=</span> <span class="p">[]</span>
</span><span><span class="gp">>>> </span><span class="k">for</span> <span class="n">n</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">4</span><span class="p">):</span>
</span><span><span class="gp">... </span> <span class="n">squares</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">n</span> <span class="o">**</span> <span class="mi">2</span><span class="p">)</span>
</span><span><span class="gp">...</span>
</span><span><span class="gp">>>> </span><span class="n">squares</span>
</span><span><span class="go">[0, 1, 4, 9]</span>
</span></code></pre></div>
<p>Comprehensions also support conditions on the looping variables. For instance, in the example above,
if we only wanted squares of even numbers, we could do:</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="p">[</span><span class="n">n</span> <span class="o">**</span> <span class="mi">2</span> <span class="k">for</span> <span class="n">n</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">4</span><span class="p">)</span> <span class="k">if</span> <span class="n">n</span> <span class="o">%</span> <span class="mi">2</span> <span class="o">==</span> <span class="mi">0</span><span class="p">]</span>
</span><span><span class="go">[0, 4]</span>
</span></code></pre></div>
<p>In this case, the result expression is not evaluated for values of <code>n</code> where <code>n % 2 == 0</code> turns out to be <code>False</code>.</p>
<p class="note">The keen Pythonista might note that this can be accomplished more simply by using the <code>step</code>
argument of the <code>range</code> builtin, but please excuse my lack of creativity with the examples!</p>
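<p>To see where the filter fits, here’s a sketch of the same filtered comprehension written as plain
loops, followed by a check that the result expression really is skipped for filtered-out items. The
<code>square</code> helper and the <code>calls</code> list are made up purely for this illustration.</p>
<div class="hl"><pre class=content><code># Equivalent of [n ** 2 for n in range(4) if n % 2 == 0] as plain loops:
# the filter becomes an `if` nested inside the `for`.
even_squares = []
for n in range(4):
    if n % 2 == 0:
        even_squares.append(n ** 2)
print(even_squares)  # [0, 4]

# The result expression only runs for items that pass the filter.
# `square` is a hypothetical helper that records every argument it receives.
calls = []

def square(n):
    calls.append(n)
    return n ** 2

result = [square(n) for n in range(4) if n % 2 == 0]
print(result)  # [0, 4]
print(calls)   # [0, 2]
</code></pre></div>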
<h2 id="different-collectors">Different Collectors<a class="headerlink" href="#different-collectors" title="Permanent link">¶</a></h2>
<p>In addition to <code>list</code> comprehensions, Python supports <code>set</code> and <code>dict</code> comprehensions as well. Where
<code>list</code> comprehensions collect the result values in a <code>list</code>, the latter two collect them in <code>set</code>s
and <code>dict</code>s respectively.</p>
<p>The syntax is almost exactly the same as that of list comprehensions. The only difference is that we
use braces for set and dict comprehensions, where we use square brackets for list comprehensions.
The looping and filtering constructs behave the same way. The result expression behaves the same way
for set comprehensions, but for dict comprehensions, we have to provide two expressions, the key and
the value, separated by a colon. Let’s look at some examples:</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="p">[</span><span class="n">color</span><span class="o">.</span><span class="n">lower</span><span class="p">()</span> <span class="k">for</span> <span class="n">color</span> <span class="ow">in</span> <span class="p">[</span><span class="s1">'Blue'</span><span class="p">,</span> <span class="s1">'Red'</span><span class="p">,</span> <span class="s1">'blue'</span><span class="p">,</span> <span class="s1">'yellow'</span><span class="p">]]</span>
</span><span><span class="go">['blue', 'red', 'blue', 'yellow']</span>
</span><span>
</span><span><span class="gp">>>> </span><span class="p">{</span><span class="n">color</span><span class="o">.</span><span class="n">lower</span><span class="p">()</span> <span class="k">for</span> <span class="n">color</span> <span class="ow">in</span> <span class="p">[</span><span class="s1">'Blue'</span><span class="p">,</span> <span class="s1">'Red'</span><span class="p">,</span> <span class="s1">'blue'</span><span class="p">,</span> <span class="s1">'yellow'</span><span class="p">]}</span>
</span><span><span class="go">{'blue', 'red', 'yellow'}</span>
</span></code></pre></div>
<p>The first expression in the above REPL session is a list comprehension and the second is a set
comprehension. Notice that the only difference between the two input lines is the surrounding
bracket type.</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="p">{</span><span class="n">color</span><span class="o">.</span><span class="n">lower</span><span class="p">():</span> <span class="nb">len</span><span class="p">(</span><span class="n">color</span><span class="p">)</span> <span class="k">for</span> <span class="n">color</span> <span class="ow">in</span> <span class="p">[</span><span class="s1">'Blue'</span><span class="p">,</span> <span class="s1">'Red'</span><span class="p">,</span> <span class="s1">'blue'</span><span class="p">,</span> <span class="s1">'yellow'</span><span class="p">]}</span>
</span><span><span class="go">{'blue': 4, 'red': 3, 'yellow': 6}</span>
</span></code></pre></div>
<p>This is a dictionary comprehension. Notice here, the result expression is a <em>key-value pair of
expressions</em>, as opposed to a single expression for list and set comprehensions.</p>
<p>Note that these two forms of comprehensions were introduced in Python 3.0 and backported to 2.7. In
earlier versions, we could replicate them by calling the <code>set</code> and <code>dict</code> builtins over list comprehensions.
Here’s an example:</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="nb">set</span><span class="p">([</span><span class="n">color</span><span class="o">.</span><span class="n">lower</span><span class="p">()</span> <span class="k">for</span> <span class="n">color</span> <span class="ow">in</span> <span class="p">[</span><span class="s1">'Blue'</span><span class="p">,</span> <span class="s1">'Red'</span><span class="p">,</span> <span class="s1">'blue'</span><span class="p">,</span> <span class="s1">'yellow'</span><span class="p">]])</span>
</span><span><span class="go">{'blue', 'red', 'yellow'}</span>
</span><span>
</span><span><span class="gp">>>> </span><span class="nb">dict</span><span class="p">([(</span><span class="n">color</span><span class="o">.</span><span class="n">lower</span><span class="p">(),</span> <span class="nb">len</span><span class="p">(</span><span class="n">color</span><span class="p">))</span> <span class="k">for</span> <span class="n">color</span> <span class="ow">in</span> <span class="p">[</span><span class="s1">'Blue'</span><span class="p">,</span> <span class="s1">'Red'</span><span class="p">,</span> <span class="s1">'blue'</span><span class="p">,</span> <span class="s1">'yellow'</span><span class="p">]])</span>
</span><span><span class="go">{'blue': 4, 'red': 3, 'yellow': 6}</span>
</span></code></pre></div>
<p>For dictionaries, we create a list of 2-tuples (key-value pairs) and pass that to <code>dict</code>.</p>
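<p>The color examples above hide one detail: when two items produce the same key, only one entry
survives. Here’s a small sketch that uses <code>enumerate</code> as the value so the overwrite is visible (the
<code>last_index</code> name is just for illustration). It assumes Python 3.7 or later, where dicts keep
insertion order, for the printed output.</p>
<div class="hl"><pre class=content><code>colors = ['Blue', 'Red', 'blue', 'yellow']

# 'Blue' and 'blue' both lower-case to the key 'blue'. A dict holds one entry
# per key, so the value from the later item replaces the earlier one, while
# the key keeps its original (first-inserted) position.
last_index = {color.lower(): i for i, color in enumerate(colors)}
print(last_index)  # {'blue': 2, 'red': 1, 'yellow': 3}

# The version built with the dict builtin behaves the same way.
print(dict([(color.lower(), i) for i, color in enumerate(colors)]) == last_index)  # True
</code></pre></div>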
<h2 id="multiple-looping-constructs">Multiple Looping Constructs<a class="headerlink" href="#multiple-looping-constructs" title="Permanent link">¶</a></h2>
<p>In the previous examples, we’ve only used one looping construct. However, it is possible to use
more than one looping construct. This works very similarly to a nested <code>for</code>-loop. Let’s look at an
example:</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="p">[(</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span> <span class="k">for</span> <span class="n">j</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">13</span><span class="p">)]</span>
</span><span><span class="go">[(0, 10), (0, 11), (0, 12), (1, 10), (1, 11), (1, 12), (2, 10), (2, 11), (2, 12)]</span>
</span></code></pre></div>
<p>This output is easy to visualize if you see the two <code>for</code>-loops nested. The following is a
reproduction of the above, without comprehensions:</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="n">result</span> <span class="o">=</span> <span class="p">[]</span>
</span><span><span class="gp">>>> </span><span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">3</span><span class="p">):</span>
</span><span><span class="gp">... </span> <span class="k">for</span> <span class="n">j</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">13</span><span class="p">):</span>
</span><span><span class="gp">... </span> <span class="n">result</span><span class="o">.</span><span class="n">append</span><span class="p">((</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">))</span>
</span><span><span class="gp">...</span>
</span><span><span class="gp">>>> </span><span class="n">result</span>
</span><span><span class="go">[(0, 10), (0, 11), (0, 12), (1, 10), (1, 11), (1, 12), (2, 10), (2, 11), (2, 12)]</span>
</span></code></pre></div>
<p>This can go to further levels of nesting, although if you have comprehensions with more than three
levels of nesting, you should probably rethink your data structures or the way you’re working with them.</p>
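<p>Because the clauses nest in the same order as the loops, a later <code>for</code> clause can use a variable
bound by an earlier one. A common case is flattening a list of lists. The <code>matrix</code> below is just
sample data for this sketch.</p>
<div class="hl"><pre class=content><code>matrix = [[1, 2, 3], [4, 5, 6]]

# The clauses nest left to right like for-loops, so `row` is already bound by
# the outer clause when the inner clause iterates over it.
flat = [x for row in matrix for x in row]
print(flat)  # [1, 2, 3, 4, 5, 6]

# The same thing with explicit nested loops.
flat_loop = []
for row in matrix:
    for x in row:
        flat_loop.append(x)
print(flat_loop == flat)  # True
</code></pre></div>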
<p>Multiple looping constructs work just fine for set and dict comprehensions as well. Here are some
examples with set comprehensions, the second of which also uses a condition expression:</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="p">{(</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span> <span class="k">for</span> <span class="n">j</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">13</span><span class="p">)}</span>
</span><span><span class="go">{(1, 12), (2, 11), (0, 12), (2, 10), (0, 11), (0, 10), (2, 12), (1, 10), (1, 11)}</span>
</span><span>
</span><span><span class="gp">>>> </span><span class="p">{(</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span> <span class="k">for</span> <span class="n">j</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">13</span><span class="p">)</span> <span class="k">if</span> <span class="n">j</span> <span class="o">-</span> <span class="n">i</span> <span class="o">></span> <span class="mi">10</span><span class="p">}</span>
</span><span><span class="go">{(0, 11), (1, 12), (0, 12)}</span>
</span></code></pre></div>
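<p>The condition doesn’t have to be the last clause. An <code>if</code> can follow any <code>for</code> clause, and it
applies at that level of nesting, just as it would between two nested loops. This sketch uses a list
comprehension so the output order is predictable.</p>
<div class="hl"><pre class=content><code># An `if` placed right after the outer clause filters `i` before the inner
# loop runs, just like an `if` between two nested for-loops would.
pairs = [(i, j) for i in range(0, 3) if i != 1 for j in range(10, 12)]
print(pairs)  # [(0, 10), (0, 11), (2, 10), (2, 11)]

# The equivalent nested loops.
pairs_loop = []
for i in range(0, 3):
    if i != 1:
        for j in range(10, 12):
            pairs_loop.append((i, j))
print(pairs_loop == pairs)  # True
</code></pre></div>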
<p>A subtle point here that’s not easy to notice in the comprehensions is that the <code>range(10, 13)</code> call
in the above examples is evaluated <em>three</em> times, whereas <code>range(0, 3)</code> is evaluated only <em>once</em>. This
becomes obvious if you visualize this as the nested <code>for</code>-loop illustrated above. This matters
when the inner iterable is a single-pass iterator, like a generator, a <code>map</code> object, or a file object
(which would need a <code>.seek(0)</code> before it can be read again). Check out the following example to see what I mean:</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="n">range_for_i</span> <span class="o">=</span> <span class="nb">map</span><span class="p">(</span><span class="nb">str</span><span class="p">,</span> <span class="nb">range</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">3</span><span class="p">))</span>
</span><span><span class="gp">>>> </span><span class="n">range_for_j</span> <span class="o">=</span> <span class="nb">map</span><span class="p">(</span><span class="nb">str</span><span class="p">,</span> <span class="nb">range</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">13</span><span class="p">))</span>
</span><span>
</span><span><span class="gp">>>> </span><span class="p">[(</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">range_for_i</span> <span class="k">for</span> <span class="n">j</span> <span class="ow">in</span> <span class="n">range_for_j</span><span class="p">]</span>
</span><span><span class="go">[('0', '10'), ('0', '11'), ('0', '12')]</span>
</span></code></pre></div>
<p>In this example, the <code>map</code> objects are exhausted once they have yielded all their results. They don’t
start over when iterated again; they simply yield nothing more. That is why <code>range_for_j</code> produced its
three values only once, which were enough to pair with just <code>'0'</code>, leaving nothing to pair with
<code>'1'</code> and <code>'2'</code>.</p>
<p>You’re not likely to encounter this in real-world code, but it’s good to know lest we end up facing
it.</p>
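<p>If the inner iterable has to start out as a single-pass iterator, one straightforward fix is to
materialize it into a list before the comprehension runs, at the cost of holding all its items in
memory. A sketch based on the example above:</p>
<div class="hl"><pre class=content><code>range_for_i = map(str, range(0, 3))
# Turning the inner iterable into a list makes it reusable, so every value
# of `i` is paired with all three values of `j`.
range_for_j = list(map(str, range(10, 13)))

pairs = [(i, j) for i in range_for_i for j in range_for_j]
print(len(pairs))  # 9
print(pairs[3:6])  # [('1', '10'), ('1', '11'), ('1', '12')]
</code></pre></div>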
<h3 id="zipping-instead-of-cross-product">Zipping instead of Cross Product<a class="headerlink" href="#zipping-instead-of-cross-product" title="Permanent link">¶</a></h3>
<p>Using multiple <code>for</code> loops like above creates a sort-of cross-product. This is by the nature of the
nested loop structure. But what if we’re looking for a pairwise, sort-of dot-product like result? Python
provides the <code>zip</code> builtin for this purpose. It is so specific to this problem that using a
comprehension looks like unnecessary ceremony:</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="p">[(</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">j</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">3</span><span class="p">),</span> <span class="nb">range</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">13</span><span class="p">))]</span>
</span><span><span class="go">[(0, 10), (1, 11), (2, 12)]</span>
</span><span>
</span><span><span class="gp">>>> </span><span class="nb">list</span><span class="p">(</span><span class="nb">zip</span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">3</span><span class="p">),</span> <span class="nb">range</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">13</span><span class="p">)))</span>
</span><span><span class="go">[(0, 10), (1, 11), (2, 12)]</span>
</span></code></pre></div>
<p>Of course, if we’re doing some operation with <code>i</code> and <code>j</code> instead of just creating tuples, the
comprehension would still be very useful.</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="p">[</span><span class="n">i</span> <span class="o">*</span> <span class="n">j</span> <span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">j</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">3</span><span class="p">),</span> <span class="nb">range</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">13</span><span class="p">))]</span>
</span><span><span class="go">[0, 11, 24]</span>
</span></code></pre></div>
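<p>For completeness, the cross-product has a standard library counterpart too: <code>itertools.product</code>.
The sketch below compares it with the nested comprehension and also shows that <code>zip</code> silently
stops at the shortest input, which is worth keeping in mind when the two sequences differ in length.</p>
<div class="hl"><pre class=content><code>import itertools

# itertools.product yields the same pairs, in the same order, as two
# nested for clauses.
cross = list(itertools.product(range(0, 3), range(10, 13)))
nested = [(i, j) for i in range(0, 3) for j in range(10, 13)]
print(cross == nested)  # True

# zip pairs items by position and stops at the shortest input.
zipped = list(zip(range(0, 3), range(10, 15)))
print(zipped)  # [(0, 10), (1, 11), (2, 12)]
</code></pre></div>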
<h2 id="rewriting-comprehensions-map-filter-builtins">Rewriting Comprehensions with <code>map</code> & <code>filter</code> Builtins<a class="headerlink" href="#rewriting-comprehensions-map-filter-builtins" title="Permanent link">¶</a></h2>
<p>Comprehensions can usually be a more-readable alternative to code written using <code>map</code> and/or
<code>filter</code> functions.</p>
<p>I’ve discussed the <code>map</code> builtin in more detail in <a href="../python-map-function/">a previous article</a>. Not all
features of a comprehension can be translated with just the <code>map</code> function. In particular, <code>map</code>
alone has no way to apply a condition the way a comprehension’s <code>if</code> clause does. That can be done
if we also make use of the <code>filter</code> builtin. Here’s an example of how such a
comprehension can be rewritten with <code>map</code> and <code>filter</code>.</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="p">[</span><span class="n">n</span> <span class="o">**</span> <span class="mi">2</span> <span class="k">for</span> <span class="n">n</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">10</span><span class="p">)</span> <span class="k">if</span> <span class="n">n</span> <span class="o">%</span> <span class="mi">2</span> <span class="o">==</span> <span class="mi">0</span><span class="p">]</span>
</span><span><span class="go">[0, 4, 16, 36, 64]</span>
</span><span>
</span><span><span class="gp">>>> </span><span class="nb">list</span><span class="p">(</span><span class="nb">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">n</span><span class="p">:</span> <span class="n">n</span> <span class="o">**</span> <span class="mi">2</span><span class="p">,</span> <span class="nb">filter</span><span class="p">(</span><span class="k">lambda</span> <span class="n">n</span><span class="p">:</span> <span class="n">n</span> <span class="o">%</span> <span class="mi">2</span> <span class="o">==</span> <span class="mi">0</span><span class="p">,</span> <span class="nb">range</span><span class="p">(</span><span class="mi">10</span><span class="p">))))</span>
</span><span><span class="go">[0, 4, 16, 36, 64]</span>
</span></code></pre></div>
<p>Obviously, the comprehension reads much better, but I’d urge you not to just throw away the <code>map</code>
and <code>filter</code> builtins. They have their place and sometimes, code using them can read much better
than comprehensions. Check out my <a href="../python-map-function/">article on the <code>map</code> function</a> for such examples and
other rationales.</p>
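<p>As a small illustration of where the builtins hold up well (these are my own examples, not taken
from that article): passing an existing function such as <code>str.lower</code> to <code>map</code>, or <code>None</code> to
<code>filter</code>, avoids writing a lambda or repeating the loop variable.</p>
<div class="hl"><pre class=content><code>colors = ['Blue', 'Red', 'blue', 'yellow']

# With an existing named function, map avoids repeating the loop variable.
lowered = list(map(str.lower, colors))
print(lowered)  # ['blue', 'red', 'blue', 'yellow']
print(lowered == [color.lower() for color in colors])  # True

# filter(None, ...) keeps only the truthy items.
values = [0, 1, '', 'a', None, [2]]
truthy = list(filter(None, values))
print(truthy)  # [1, 'a', [2]]
print(truthy == [v for v in values if v])  # True
</code></pre></div>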
<h2 id="reducing-with-assignment-expressions">Reducing with Assignment Expressions<a class="headerlink" href="#reducing-with-assignment-expressions" title="Permanent link">¶</a></h2>
<p>I’ve actually stumbled on a version of this idea on Reddit. Unfortunately I don’t have the source,
so, wherever you are, thank you!</p>
<p>The <code>functools</code> module from the standard library provides the <a href="https://docs.python.org/3/library/functools.html#functools.reduce" rel="noopener noreferrer" target="_blank"><code>reduce</code></a> callable
which can be used to systematically aggregate values in collections. I won’t go into details of how
this can be used, but I will show how such an effect can be reproduced with comprehensions.</p>
<p>Let’s look at an example of using the <code>functools.reduce</code>:</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">functools</span>
</span><span><span class="gp">>>> </span><span class="n">functools</span><span class="o">.</span><span class="n">reduce</span><span class="p">(</span><span class="k">lambda</span> <span class="n">acc</span><span class="p">,</span> <span class="n">item</span><span class="p">:</span> <span class="n">acc</span> <span class="o">*</span> <span class="n">item</span><span class="p">,</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">5</span><span class="p">),</span> <span class="mi">1</span><span class="p">)</span>
</span><span><span class="go">24</span>
</span></code></pre></div>
<p>A simple implementation of the <code>reduce</code> function is provided in the official documentation, and it’s
a better explanation than I can provide here. Instead, we’ll try to reproduce this with
comprehensions.</p>
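<p>For reference, the fold that <code>reduce</code> performs boils down to a loop like the following. This is a
simplified sketch in the spirit of the documentation’s version, not the actual standard library code;
the <code>my_reduce</code> name and the <code>_missing</code> sentinel are hypothetical.</p>
<div class="hl"><pre class=content><code>_missing = object()  # sentinel meaning "no initial value was given"

def my_reduce(function, iterable, initial=_missing):
    """A simplified sketch of functools.reduce, for illustration only."""
    it = iter(iterable)
    if initial is _missing:
        try:
            acc = next(it)
        except StopIteration:
            raise TypeError("reduce of empty iterable with no initial value") from None
    else:
        acc = initial
    # Fold each remaining item into the accumulator, left to right.
    for item in it:
        acc = function(acc, item)
    return acc

print(my_reduce(lambda acc, item: acc * item, range(1, 5), 1))  # 24
</code></pre></div>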
<p>For this, we first have to familiarize ourselves with the <a href="https://docs.python.org/3/faq/design.html#why-can-t-i-use-an-assignment-in-an-expression" rel="noopener noreferrer" target="_blank">walrus operator</a>. This is a new feature
in Python 3.8 that lets us do assignments in expressions. This means we’ll now be able to do
assignment operations in places where only expressions (and not statements) are allowed, like the
result expression spot in comprehensions.</p>
<p><em>By the power of the gray walrus</em>, we can reproduce <code>functools.reduce</code>:</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="n">acc</span> <span class="o">=</span> <span class="mi">1</span>
</span><span><span class="gp">>>> </span><span class="p">[</span><span class="n">acc</span> <span class="o">:=</span> <span class="n">acc</span> <span class="o">*</span> <span class="n">item</span> <span class="k">for</span> <span class="n">item</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">5</span><span class="p">)]</span>
</span><span><span class="go">[1, 2, 6, 24]</span>
</span><span><span class="gp">>>> </span><span class="n">_</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
</span><span><span class="go">24</span>
</span></code></pre></div>
<p>Although that works, and is quite nice, I’m not sure how readable that is. But I can attribute my
discomfort to the fact that this uses a new language feature and, like anything in life, needs
some getting used to. Also, since it’s new in version 3.8, it’s probably best to stay away from it in
production code for a little while.</p>
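<p>Two details are worth knowing if you do reach for this (both sketches below need Python 3.8 or
later): the walrus target is bound in the enclosing scope, so the final value is available as
<code>acc</code> without the <code>_[-1]</code> trick, and when the running values are what you actually want,
<code>itertools.accumulate</code> produces them without any assignment expression.</p>
<div class="hl"><pre class=content><code>import itertools
import operator

acc = 1
partial_products = [acc := acc * item for item in range(1, 5)]
print(partial_products)  # [1, 2, 6, 24]
# The walrus target binds in the enclosing scope, so `acc` now holds the
# final product and there's no need to index into the list.
print(acc)  # 24

# itertools.accumulate produces the same running products without a walrus.
print(list(itertools.accumulate(range(1, 5), operator.mul)))  # [1, 2, 6, 24]
</code></pre></div>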
<h2 id="set-operations-with-comprehensions">Set Operations with Comprehensions<a class="headerlink" href="#set-operations-with-comprehensions" title="Permanent link">¶</a></h2>
<p>Comprehensions lend themselves quite well to set operations like intersection and difference.
They’ll probably be less performant (and even less obvious to readers of such code), but
nonetheless, they make a nice example to play with:</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="n">rgb_colors</span> <span class="o">=</span> <span class="p">{</span><span class="s2">"red"</span><span class="p">,</span> <span class="s2">"green"</span><span class="p">,</span> <span class="s2">"blue"</span><span class="p">}</span>
</span><span><span class="gp">>>> </span><span class="n">ryb_colors</span> <span class="o">=</span> <span class="p">{</span><span class="s2">"red"</span><span class="p">,</span> <span class="s2">"yellow"</span><span class="p">,</span> <span class="s2">"blue"</span><span class="p">}</span>
</span><span>
</span><span><span class="gp">>>> </span><span class="n">intersection</span> <span class="o">=</span> <span class="p">{</span><span class="n">c</span> <span class="k">for</span> <span class="n">c</span> <span class="ow">in</span> <span class="n">rgb_colors</span> <span class="k">if</span> <span class="n">c</span> <span class="ow">in</span> <span class="n">ryb_colors</span><span class="p">}</span>
</span><span><span class="gp">>>> </span><span class="n">intersection</span>
</span><span><span class="go">{'red', 'blue'}</span>
</span><span>
</span><span><span class="gp">>>> </span><span class="n">difference</span> <span class="o">=</span> <span class="p">{</span><span class="n">c</span> <span class="k">for</span> <span class="n">c</span> <span class="ow">in</span> <span class="n">rgb_colors</span> <span class="k">if</span> <span class="n">c</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">ryb_colors</span><span class="p">}</span>
</span><span><span class="gp">>>> </span><span class="n">difference</span>
</span><span><span class="go">{'green'}</span>
</span></code></pre></div>
<p>These are the same results we’d get if we used the standard set operators / methods:</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="n">rgb_colors</span> <span class="o">&</span> <span class="n">ryb_colors</span>
</span><span><span class="go">{'red', 'blue'}</span>
</span><span>
</span><span><span class="gp">>>> </span><span class="n">rgb_colors</span> <span class="o">-</span> <span class="n">ryb_colors</span>
</span><span><span class="go">{'green'}</span>
</span></code></pre></div>
<p>Again, use the standard set operators or methods for this, not the comprehension-based approach I
illustrated above. If you do use the comprehension approach in production, don’t point
to me or this article as inspiration.</p>
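<p>When you do reach for the standard tools, one difference between the two forms is worth knowing: the operators need sets on both sides, while the named methods accept any iterable. A small sketch (the list <code>ryb_list</code> is made up for illustration):</p>

```python
rgb_colors = {"red", "green", "blue"}
ryb_list = ["red", "yellow", "blue"]  # a plain list, not a set

# The named methods accept any iterable...
common = rgb_colors.intersection(ryb_list)
only_rgb = rgb_colors.difference(ryb_list)
print(sorted(common), sorted(only_rgb))

# ...but the operators insist on sets on both sides.
try:
    rgb_colors & ryb_list
except TypeError as exc:
    print("TypeError:", exc)
```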
<h2 id="generator-expressions">Generator Expressions<a class="headerlink" href="#generator-expressions" title="Permanent link">¶</a></h2>
<p>When a comprehension is wrapped in square brackets or braces, the result is a fully realized
collection, like a list or a set. When it’s wrapped in parentheses instead, the result is a generator
expression, with none of the result items computed yet. The items are computed as needed, for
example, when the generator is consumed by a <code>for</code>-loop.</p>
<p>Consider the following example session:</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="p">[</span><span class="n">n</span> <span class="o">**</span> <span class="mi">2</span> <span class="k">for</span> <span class="n">n</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">4</span><span class="p">)]</span>
</span><span><span class="go">[0, 1, 4, 9]</span>
</span><span>
</span><span><span class="gp">>>> </span><span class="p">(</span><span class="n">n</span> <span class="o">**</span> <span class="mi">2</span> <span class="k">for</span> <span class="n">n</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">4</span><span class="p">))</span>
</span><span><span class="go"><generator object <genexpr> at 0x0000000005768DC8></span>
</span></code></pre></div>
<p>We can use this generator object in a <code>for</code>-loop or, perhaps more typically, in an aggregation
function, such as <code>sum</code> or <code>max</code>.</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="n">squares</span> <span class="o">=</span> <span class="p">(</span><span class="n">n</span> <span class="o">**</span> <span class="mi">2</span> <span class="k">for</span> <span class="n">n</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">4</span><span class="p">))</span>
</span><span><span class="gp">>>> </span><span class="nb">sum</span><span class="p">(</span><span class="n">squares</span><span class="p">)</span>
</span><span><span class="go">14</span>
</span></code></pre></div>
<p>Of course, since this is a generator expression, it can be iterated over <em>only once</em>. If you want to
iterate over it multiple times, just turn it into a list.</p>
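<p>A quick sketch of that one-shot behaviour: the second <code>sum</code> over the same generator finds nothing left to consume, while a list built from it can be summed any number of times.</p>

```python
squares = (n ** 2 for n in range(4))
first = sum(squares)   # consumes every item
second = sum(squares)  # the generator is already exhausted

squares_list = list(n ** 2 for n in range(4))  # realize once, reuse freely
print(first, second, sum(squares_list), sum(squares_list))  # 14 0 14 14
```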
<p>Generator expressions were introduced in <a href="https://www.python.org/dev/peps/pep-0289/" rel="noopener noreferrer" target="_blank">PEP-289</a>, which
contains a lot of examples. I recommend reviewing it for some cool use cases, which I won’t
reproduce here.</p>
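<p>One consequence of this laziness is that a generator expression can sit on top of an infinite iterator, which a list comprehension can’t do, since it would try to realize every item. A small sketch using <code>itertools.count</code> and <code>itertools.islice</code> from the standard library:</p>

```python
from itertools import count, islice

# count() yields 0, 1, 2, ... forever; the generator expression only
# computes squares as islice() asks for them.
even_squares = (n ** 2 for n in count() if n % 2 == 0)
first_five = list(islice(even_squares, 5))
print(first_five)  # [0, 4, 16, 36, 64]
```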
<p>One small note on passing generator expressions as arguments to functions: make it a habit to
always wrap them in their own parentheses. A generator expression can skip its own parentheses only
when it is the <em>only</em> argument to the call; if there are other arguments, Python raises a
<code>SyntaxError</code> saying the generator expression must be parenthesized. Check out the following example
if that doesn’t make sense:</p>
<p>In the following call to <code>sorted</code>, we pass in a generator expression as the sole argument, and we
get the expected result.</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="nb">sorted</span><span class="p">(</span><span class="n">word</span><span class="o">.</span><span class="n">lower</span><span class="p">()</span> <span class="k">for</span> <span class="n">word</span> <span class="ow">in</span> <span class="s2">"We are from planet Earth, what's up?"</span><span class="o">.</span><span class="n">split</span><span class="p">())</span>
</span><span><span class="go">['are', 'earth,', 'from', 'planet', 'up?', 'we', "what's"]</span>
</span></code></pre></div>
<p>Now, to the same call, we add the <code>key</code> argument, hoping to sort by string length. Instead, we
get a <code>SyntaxError</code> because our generator expression is not parenthesized.</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="nb">sorted</span><span class="p">(</span><span class="n">word</span><span class="o">.</span><span class="n">lower</span><span class="p">()</span> <span class="k">for</span> <span class="n">word</span> <span class="ow">in</span> <span class="s2">"We are from planet Earth, what's up?"</span><span class="o">.</span><span class="n">split</span><span class="p">(),</span> <span class="n">key</span><span class="o">=</span><span class="nb">len</span><span class="p">)</span>
</span><span> File <span class="nb">"<stdin>"</span>, line <span class="m">1</span>
</span><span><span class="gr">SyntaxError</span>: <span class="n">Generator expression must be parenthesized</span>
</span></code></pre></div>
<p>If we wrap the generator expression in its own parentheses, it works and we get the expected result.</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="nb">sorted</span><span class="p">((</span><span class="n">word</span><span class="o">.</span><span class="n">lower</span><span class="p">()</span> <span class="k">for</span> <span class="n">word</span> <span class="ow">in</span> <span class="s2">"We are from planet Earth, what's up?"</span><span class="o">.</span><span class="n">split</span><span class="p">()),</span> <span class="n">key</span><span class="o">=</span><span class="nb">len</span><span class="p">)</span>
</span><span><span class="go">['we', 'are', 'up?', 'from', 'planet', 'earth,', "what's"]</span>
</span></code></pre></div>
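<p>To check the rule without a REPL, we can hand each spelling to the built-in <code>compile</code> and see which one is rejected. This is only a sketch: the helper <code>parses</code> is made up for illustration, and since the exact error wording varies across Python versions, it only checks whether a <code>SyntaxError</code> is raised. Compiling doesn’t evaluate anything, so <code>words</code> doesn’t need to exist.</p>

```python
def parses(source):
    """Return True if `source` compiles as a Python expression."""
    try:
        compile(source, "<demo>", "eval")
        return True
    except SyntaxError:
        return False


sole_arg = "sorted(w.lower() for w in words)"
extra_arg = "sorted(w.lower() for w in words, key=len)"
wrapped = "sorted((w.lower() for w in words), key=len)"

print(parses(sole_arg), parses(extra_arg), parses(wrapped))  # True False True
```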
<h2 id="the-key-argument-for-sorted">The <code>key</code> Argument for <code>sorted</code><a class="headerlink" href="#the-key-argument-for-sorted" title="Permanent link">¶</a></h2>
<p>The <code>sorted</code> builtin provides the <code>key</code> argument, which can be set to a one-argument function. This function is
applied to each item in the given iterable, and the items are sorted by the results of these function
calls rather than by the items themselves. This is a very convenient feature of <code>sorted</code>.</p>
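<p>For instance, passing <code>str.lower</code> as the key gives a case-insensitive sort. The key is only used for comparisons, so the original strings come back unchanged (the list of fruits is just sample data):</p>

```python
fruits = ["banana", "Cherry", "apple"]

# Default ordering compares code points, so uppercase letters sort first.
print(sorted(fruits))                 # ['Cherry', 'apple', 'banana']

# With a key, items are ordered by str.lower(item) but returned as-is.
print(sorted(fruits, key=str.lower))  # ['apple', 'banana', 'Cherry']
```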
<p>While this is probably a horrible thing to do, we could use comprehensions to recreate this effect
without using the <code>key</code> argument. The idea is that we first create a sequence of 2-tuples, where the
first items are the results of the <code>key</code> function and the second items are the original list items.
We then sort this sequence of tuples, extract the second items in each tuple and return that. Here’s
an example implementation doing just that:</p>
<div class="hl"><pre class=content><code><span><span class="k">def</span> <span class="nf">sad_sorted_with_key</span><span class="p">(</span><span class="n">items</span><span class="p">,</span> <span class="n">key_fn</span><span class="p">):</span>
</span><span> <span class="k">return</span> <span class="p">[</span><span class="n">item</span> <span class="k">for</span> <span class="n">_</span><span class="p">,</span> <span class="n">item</span> <span class="ow">in</span> <span class="nb">sorted</span><span class="p">((</span><span class="n">key_fn</span><span class="p">(</span><span class="n">item</span><span class="p">),</span> <span class="n">item</span><span class="p">)</span> <span class="k">for</span> <span class="n">item</span> <span class="ow">in</span> <span class="n">items</span><span class="p">)]</span>
</span><span>
</span><span>
</span><span><span class="nb">print</span><span class="p">(</span><span class="n">sad_sorted_with_key</span><span class="p">(</span>
</span><span> <span class="p">(</span><span class="n">word</span><span class="o">.</span><span class="n">lower</span><span class="p">()</span> <span class="k">for</span> <span class="n">word</span> <span class="ow">in</span> <span class="s2">"We are from planet Earth, what's up?"</span><span class="o">.</span><span class="n">split</span><span class="p">()),</span>
</span><span> <span class="nb">len</span><span class="p">,</span>
</span><span><span class="p">))</span>
</span></code></pre></div>
<p>This script would produce the following output:</p>
<div class="hl"><pre class=content><code><span>['we', 'are', 'up?', 'from', 'earth,', 'planet', "what's"]
</span></code></pre></div>
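<p>Note that this isn’t quite the same as the <code>sorted(..., key=len)</code> result from earlier: <code>'earth,'</code> now
comes before <code>'planet'</code>. When two keys are equal, tuple comparison falls back to comparing the items
themselves, instead of keeping their original order as <code>sorted</code> does. As usual, don’t do this in
production. This is just a sad experiment.</p>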
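<p>The classic form of this trick, known as decorate-sort-undecorate, puts each item’s original position between the key and the item. Ties then fall back to input order (matching <code>sorted</code>, which is stable), and the items themselves are never compared. Still a learning sketch; the name <code>dsu_sorted</code> is hypothetical:</p>

```python
def dsu_sorted(items, key_fn):
    # Decorate: (key, original index, item). The unique index breaks ties,
    # so equal keys keep their input order and items are never compared.
    decorated = [(key_fn(item), i, item) for i, item in enumerate(items)]
    decorated.sort()
    # Undecorate: keep just the items.
    return [item for _, _, item in decorated]


words = [word.lower() for word in "We are from planet Earth, what's up?".split()]
print(dsu_sorted(words, len))
```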
<h2 id="no-side-effects-please">No Side Effects Please<a class="headerlink" href="#no-side-effects-please" title="Permanent link">¶</a></h2>
<p>As a best practice, strive to keep side effects out of your comprehension result expressions.
Check out the following example to see what I mean:</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="p">[</span><span class="nb">print</span><span class="p">(</span><span class="n">n</span> <span class="o">**</span> <span class="mi">2</span><span class="p">)</span> <span class="k">for</span> <span class="n">n</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">4</span><span class="p">)]</span>
</span><span><span class="go">0</span>
</span><span><span class="go">1</span>
</span><span><span class="go">4</span>
</span><span><span class="go">9</span>
</span><span><span class="go">[None, None, None, None]</span>
</span></code></pre></div>
<p>While this serves the purpose of printing the squares one per line, it also builds a list of
<code>None</code>s. It’s also counter-intuitive when we think of comprehensions as applying a <em>transformation</em>
to each item in a collection. Calling <code>print</code> is not a transformation; it’s a side effect.</p>
<p>For use cases like this, it’s best to use a traditional <code>for</code>-loop:</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="k">for</span> <span class="n">n</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">4</span><span class="p">):</span>
</span><span><span class="gp">... </span> <span class="nb">print</span><span class="p">(</span><span class="n">n</span> <span class="o">**</span> <span class="mi">2</span><span class="p">)</span>
</span><span><span class="go">0</span>
</span><span><span class="go">1</span>
</span><span><span class="go">4</span>
</span><span><span class="go">9</span>
</span></code></pre></div>
<p>The intent here is clearer, which is to print each square, not to make a list of some results.</p>
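<p>Side effects get even more confusing in generator expressions, since nothing runs until the generator is consumed. In this sketch, <code>contextlib.redirect_stdout</code> captures what <code>print</code> writes, so we can see exactly when it happens:</p>

```python
import contextlib
import io

captured = io.StringIO()
with contextlib.redirect_stdout(captured):
    pending = (print(n ** 2) for n in range(4))  # only builds the generator
    printed_before = captured.getvalue()           # nothing printed yet
    results = list(pending)                        # the prints happen here
printed_after = captured.getvalue()

print(repr(printed_before), repr(printed_after), results)
```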
<h2 id="looking-inside">Looking Inside<a class="headerlink" href="#looking-inside" title="Permanent link">¶</a></h2>
<p>As another likely-pointless exercise, let’s look at these comprehensions as Python bytecode and
compare them with the same solution written using a traditional <code>for</code>-loop.</p>
<p>First, let’s define two functions that solve the same problem, but one uses comprehensions, and the
other doesn’t.</p>
<div class="hl"><pre class=content><code><span><span class="k">def</span> <span class="nf">loop_squares</span><span class="p">():</span>
</span><span> <span class="n">result</span> <span class="o">=</span> <span class="p">[]</span>
</span><span> <span class="k">for</span> <span class="n">n</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">4</span><span class="p">):</span>
</span><span> <span class="n">result</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">n</span> <span class="o">**</span> <span class="mi">2</span><span class="p">)</span>
</span><span> <span class="k">return</span> <span class="n">result</span>
</span><span>
</span><span>
</span><span><span class="k">def</span> <span class="nf">comp_squares</span><span class="p">():</span>
</span><span> <span class="k">return</span> <span class="p">[</span><span class="n">n</span> <span class="o">**</span> <span class="mi">2</span> <span class="k">for</span> <span class="n">n</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">4</span><span class="p">)]</span>
</span></code></pre></div>
<p>Let’s make sure they produce the same output:</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="n">loop_squares</span><span class="p">()</span>
</span><span><span class="go">[0, 1, 4, 9]</span>
</span><span><span class="gp">>>> </span><span class="n">comp_squares</span><span class="p">()</span>
</span><span><span class="go">[0, 1, 4, 9]</span>
</span></code></pre></div>
<p>Now let’s use the <a href="https://docs.python.org/3/library/dis.html" rel="noopener noreferrer" target="_blank"><code>dis</code></a> module to disassemble both of these functions:</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">dis</span>
</span><span><span class="gp">>>> </span><span class="n">dis</span><span class="o">.</span><span class="n">dis</span><span class="p">(</span><span class="n">loop_squares</span><span class="p">)</span>
</span><span><span class="go"> 2 0 BUILD_LIST 0</span>
</span><span><span class="go"> 2 STORE_FAST 0 (result)</span>
</span><span>
</span><span><span class="go"> 3 4 SETUP_LOOP 30 (to 36)</span>
</span><span><span class="go"> 6 LOAD_GLOBAL 0 (range)</span>
</span><span><span class="go"> 8 LOAD_CONST 1 (4)</span>
</span><span><span class="go"> 10 CALL_FUNCTION 1</span>
</span><span><span class="go"> 12 GET_ITER</span>
</span><span><span class="go"> >> 14 FOR_ITER 18 (to 34)</span>
</span><span><span class="go"> 16 STORE_FAST 1 (n)</span>
</span><span>
</span><span><span class="go"> 4 18 LOAD_FAST 0 (result)</span>
</span><span><span class="go"> 20 LOAD_METHOD 1 (append)</span>
</span><span><span class="go"> 22 LOAD_FAST 1 (n)</span>
</span><span><span class="go"> 24 LOAD_CONST 2 (2)</span>
</span><span><span class="go"> 26 BINARY_POWER</span>
</span><span><span class="go"> 28 CALL_METHOD 1</span>
</span><span><span class="go"> 30 POP_TOP</span>
</span><span class=collapse><span class="go"> 32 JUMP_ABSOLUTE 14</span>
</span><span class=collapse><span class="go"> >> 34 POP_BLOCK</span>
</span><span class=collapse>
</span><span class=collapse><span class="go"> 5 >> 36 LOAD_FAST 0 (result)</span>
</span><span class=collapse><span class="go"> 38 RETURN_VALUE</span>
</span><span class=collapse>
</span><span class=collapse><span class="gp">>>> </span><span class="n">dis</span><span class="o">.</span><span class="n">dis</span><span class="p">(</span><span class="n">comp_squares</span><span class="p">)</span>
</span><span class=collapse><span class="go"> 2 0 LOAD_CONST 1 (<code object <listcomp> at 0x7f3958a76c00, file "<stdin>", line 2>)</span>
</span><span class=collapse><span class="go"> 2 LOAD_CONST 2 ('comp_squares.<locals>.<listcomp>')</span>
</span><span class=collapse><span class="go"> 4 MAKE_FUNCTION 0</span>
</span><span class=collapse><span class="go"> 6 LOAD_GLOBAL 0 (range)</span>
</span><span class=collapse><span class="go"> 8 LOAD_CONST 3 (4)</span>
</span><span class=collapse><span class="go"> 10 CALL_FUNCTION 1</span>
</span><span class=collapse><span class="go"> 12 GET_ITER</span>
</span><span class=collapse><span class="go"> 14 CALL_FUNCTION 1</span>
</span><span class=collapse><span class="go"> 16 RETURN_VALUE</span>
</span><span class=collapse>
</span><span class=collapse><span class="go">Disassembly of <code object <listcomp> at 0x7f3958a76c00, file "<stdin>", line 2>:</span>
</span><span class=collapse><span class="go"> 2 0 BUILD_LIST 0</span>
</span><span class=collapse><span class="go"> 2 LOAD_FAST 0 (.0)</span>
</span><span class=collapse><span class="go"> >> 4 FOR_ITER 12 (to 18)</span>
</span><span class=collapse><span class="go"> 6 STORE_FAST 1 (n)</span>
</span><span class=collapse><span class="go"> 8 LOAD_FAST 1 (n)</span>
</span><span class=collapse><span class="go"> 10 LOAD_CONST 0 (2)</span>
</span><span class=collapse><span class="go"> 12 BINARY_POWER</span>
</span><span class=collapse><span class="go"> 14 LIST_APPEND 2</span>
</span><span class=collapse><span class="go"> 16 JUMP_ABSOLUTE 4</span>
</span><span class=collapse><span class="go"> >> 18 RETURN_VALUE</span>
</span></code></pre></div>
<p>I won’t discuss each instruction in the above outputs; check out the official documentation of the
<a href="https://docs.python.org/3/library/dis.html" rel="noopener noreferrer" target="_blank"><code>dis</code></a> module for that (the exact instructions also vary between Python versions). But just skimming over the above, we can see one striking difference.
The comprehension function loads a separate <code>code</code> object (<code><listcomp></code>), turns it into a function with
<code>MAKE_FUNCTION</code>, and calls it with an iterator over <code>range(4)</code>. That nested function does the work of the
comprehension and returns the result to our <code>comp_squares</code> function, which suggests that
<code>comp_squares</code> adds an extra frame to the call stack. We can confirm this by
changing the functions to the following:</p>
<div class="hl"><pre class=content><code><span><span class="kn">import</span> <span class="nn">traceback</span>
</span><span>
</span><span><span class="k">def</span> <span class="nf">loop_squares</span><span class="p">():</span>
</span><span> <span class="n">traceback</span><span class="o">.</span><span class="n">print_stack</span><span class="p">()</span>
</span><span> <span class="n">result</span> <span class="o">=</span> <span class="p">[]</span>
</span><span> <span class="k">for</span> <span class="n">n</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">4</span><span class="p">):</span>
</span><span> <span class="n">result</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">n</span> <span class="o">**</span> <span class="mi">2</span><span class="p">)</span>
</span><span> <span class="k">return</span> <span class="n">result</span>
</span><span>
</span><span>
</span><span><span class="k">def</span> <span class="nf">comp_squares</span><span class="p">():</span>
</span><span> <span class="k">return</span> <span class="p">[[</span><span class="n">traceback</span><span class="o">.</span><span class="n">print_stack</span><span class="p">()</span> <span class="k">if</span> <span class="n">n</span> <span class="o">==</span> <span class="mi">0</span> <span class="k">else</span> <span class="kc">None</span><span class="p">,</span> <span class="n">n</span> <span class="o">**</span> <span class="mi">2</span><span class="p">][</span><span class="mi">1</span><span class="p">]</span> <span class="k">for</span> <span class="n">n</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">4</span><span class="p">)]</span>
</span></code></pre></div>
<p>Let’s see the stack they print and make sure they still produce the same result:</p>
<div class="hl"><pre class=content><code><span>>>> loop_squares()
</span><span> File "<stdin>", line 1, in <module>
</span><span> File "<stdin>", line 2, in loop_squares
</span><span>[0, 1, 4, 9]
</span><span>>>> comp_squares()
</span><span> File "<stdin>", line 1, in <module>
</span><span> File "<stdin>", line 2, in comp_squares
</span><span> File "<stdin>", line 2, in <listcomp>
</span><span>[0, 1, 4, 9]
</span></code></pre></div>
<p class="note">The stack shows the file as <code>"<stdin>"</code> because I defined the functions within a REPL session. If
they were in an actual file, we’d obviously get the file name there.</p>
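<p>As we suspected, the comprehension function adds another frame to the stack, the <code><listcomp></code>,
which does the work of the comprehension. This is version-specific, though: since Python 3.12 (PEP 709),
CPython inlines list, set and dict comprehensions into the enclosing function, so neither the separate
code object nor the extra frame shows up there.</p>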
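<p>We can also look for that nested code object directly among a function’s constants (<code>__code__.co_consts</code>). In this small sketch, the helper <code>nested_code_names</code> and the function <code>gen_squares_sum</code> are made up for illustration. Because CPython 3.12 and later inline list comprehensions but not generator expressions, the list comprehension only shows a <code><listcomp></code> code object on older versions, while the generator expression keeps its <code><genexpr></code> everywhere:</p>

```python
import sys
import types


def comp_squares():
    return [n ** 2 for n in range(4)]


def gen_squares_sum():
    return sum(n ** 2 for n in range(4))


def nested_code_names(fn):
    """Names of the code objects stored among fn's constants."""
    return [c.co_name for c in fn.__code__.co_consts if isinstance(c, types.CodeType)]


print(sys.version_info[:2], nested_code_names(comp_squares), nested_code_names(gen_squares_sum))
```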
<h2 id="live-code-converter">Live Code Converter<a class="headerlink" href="#live-code-converter" title="Permanent link">¶</a></h2>
<p>Here’s a little tool that converts code written as a list/set/dict comprehension
into equivalent code written with traditional <code>for</code>-loops.</p>
<div id=converterBox>
<textarea id=compCodeEl onKeydown="setTimeout(updateLoopCode)">[n ** 2 for n in range(9) if n % 2 == 0]</textarea>
<textarea id=loopCodeEl readonly></textarea>
<style>
#converterBox {
display: flex;
flex-wrap: wrap;
}
#converterBox textarea {
flex-grow: 1;
margin: 6px;
height: 148px;
font-size: inherit;
font-variant-ligatures: none;
}
</style>
<script defer>
updateLoopCode();
function updateLoopCode() {
document.getElementById("loopCodeEl").value = computeLoopCode(document.getElementById("compCodeEl").value);
}
function computeLoopCode(code) {
code = code.trim();
const closers = {'"': '"', "'": "'", "(": ")", "[": "]", "{": "}"};
if (code[0] !== "[" && code[0] !== "{")
return "";
if (code[code.length - 1] !== closers[code[0]])
return "";
let type = code[0] == "[" ? "list" : "set";
let i = 1;
let expr = '';
const stack = [], parts = [];
for (; i < code.length - 1; ++i) {
const ch = code[i];
if (stack.length > 0 && ch === stack[stack.length - 1]) {
expr += stack.pop();
} else if (ch.match(/["'(\[{]/)) {
expr += ch;
stack.push(closers[ch]);
} else if (stack.length > 0) {
expr += ch;
} else if (stack.length === 0 && ch === ":") {
type = "dict";
parts.push(expr);
expr = '';
} else {
const match = code.substr(i).match(/^(for|if)\b/);
if (match) {
parts.push(expr);
expr = ch;
} else {
expr += ch;
}
}
}
if (expr.length)
parts.push(expr);
for (const i in parts)
parts[i] = parts[i].trim();
const loopCodeLines = [];
switch (type) {
case "list":
loopCodeLines.push("result = []")
break;
case "set":
loopCodeLines.push("result = set()")
break;
case "dict":
loopCodeLines.push("result = {}")
break;
}
const resultPart = parts.shift(), resultValuePart = type === "dict" ? parts.shift() : null;
let indentLevel = 0;
for (const part of parts) {
loopCodeLines.push(makeIndent(indentLevel) + part + ":");
++indentLevel;
}
switch (type) {
case "list":
loopCodeLines.push(makeIndent(indentLevel) + "result.append(" + resultPart + ")");
break;
case "set":
loopCodeLines.push(makeIndent(indentLevel) + "result.add(" + resultPart + ")");
break;
case "dict":
loopCodeLines.push(makeIndent(indentLevel) + "result[" + resultPart + "] = " + resultValuePart);
break;
}
return loopCodeLines.join('\n');
}
function makeIndent(level) {
level *= 4;
const spaces = [];
while (level--)
spaces.push(' ');
return spaces.join('');
}
</script>
</div>
<p>It’s powered by an extremely light parser (doesn’t even qualify to be called that), but it can help
illustrate the point. It can also be helpful for visualizing nested loops and comprehensions with
multiple <code>for</code> statements.</p>
<p>Here are some examples to try this with:</p>
<div class="table-wrapper">
<table>
<thead>
<tr>
<th>Comprehension Code (click to put in converter)</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>[n ** 2 for n in range(4)]</code></td>
</tr>
<tr>
<td><code>[n ** 2 for n in range(4) if n % 2 == 0]</code></td>
</tr>
<tr>
<td><code>{n ** 2 for n in range(4) if n % 2 == 0}</code></td>
</tr>
<tr>
<td><code>[r"abc def" for n in range(4)]</code></td>
</tr>
<tr>
<td><code>[(1, 2) for n in range(4)]</code></td>
</tr>
<tr>
<td><code>[n * m for n in range(4) for m in range(3) if n % 2 == 0]</code></td>
</tr>
<tr>
<td><code>{n * m for n in range(4) for m in range(3) if n % 2 == 0}</code></td>
</tr>
<tr>
<td><code>{n: n ** 2 for n in range(4) if n % 2 == 0}</code></td>
</tr>
</tbody>
</table>
</div>
<style>
#examplesTable a { text-decoration: none }
</style>
<script defer>
{
const table = document.evaluate(
"//th[starts-with(text(),'Comprehension Code')]", document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null)
.singleNodeValue.closest('table');
table.id = 'examplesTable';
table.addEventListener('click', (event) => {
if (event.target.tagName === 'A') {
document.getElementById('compCodeEl').value = event.target.innerText;
updateLoopCode();
}
});
for (const codeEl of table.getElementsByTagName('code')) {
codeEl.insertAdjacentHTML('afterBegin', '<a href="#"></a>');
codeEl.firstElementChild.append(codeEl.firstElementChild.nextSibling);
}
}
</script>
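<p>As a quick sanity check of what the converter produces, here’s one of the examples above written both ways in Python. The <code>for</code> clauses become nested loops in the order they appear, and the <code>if</code> clause becomes an <code>if</code> inside the innermost loop:</p>

```python
comp_result = [n * m for n in range(4) for m in range(3) if n % 2 == 0]

loop_result = []
for n in range(4):
    for m in range(3):
        if n % 2 == 0:
            loop_result.append(n * m)

print(comp_result == loop_result, comp_result)  # True [0, 0, 0, 0, 2, 4]
```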
<!-- TODO: Asynchronous comprehensions https://www.python.org/dev/peps/pep-0530/ -->
<h2 id="conclusion">Conclusion<a class="headerlink" href="#conclusion" title="Permanent link">¶</a></h2>
<p>Comprehensions are a powerful feature in Python that can create very readable code when used
correctly. However, like everything else, they have their time and place, and that’s not everywhere,
all the time. It’s important to understand them well if you’re doing anything more than a trivial list
comprehension.</p>
<p>Do check out the official documentation on <a href="https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions" rel="noopener noreferrer" target="_blank">List Comprehensions</a>, which contains
a lot of <em>good</em> examples and ideas I didn’t discuss here.</p>
<p>Additionally, at the risk of repeating myself, some experiments on this page are
only intended for learning. Please do <strong>not</strong> use them in production code. Have pity on your
future self.</p>Automating the Vim workplace — Chapter Ⅱ2020-02-16T00:00:00+05:302020-02-16T00:00:00+05:30Shrikant Sharat Kandulatag:sharats.me,2020-02-16:/posts/automating-the-vim-workplace-2/<p>This is a follow-up of the <a href="../automating-the-vim-workplace/">Automate the Vim workplace</a> article I published last
month. As promised, here’s more on how I identified and addressed things in Vim
that could be improved to speed me up. Feel free to grab the ideas in this article or, better yet,
take inspiration and inspect your workflow to identify such opportunities.</p>
<p>This article is part of a series:</p>
<ol>
<li><a href="../automating-the-vim-workplace/">Chapter Ⅰ</a>.</li>
<li>Chapter Ⅱ (this article).</li>
<li><a href="../automating-the-vim-workplace-3/">Chapter Ⅲ</a>.</li>
</ol>
<div class="toc"><span class="toctitle">Table of Contents</span><ul>
<li><a href="#easier-alternative-to">Easier Alternative to :</a></li>
<li><a href="#repeat-key-mappings">Repeat Key Mappings</a></li>
<li><a href="#ruler-vs-status-line">Ruler vs Status Line</a></li>
<li><a href="#opening-switching-buffers">Opening & Switching Buffers</a></li>
<li><a href="#change-cwd-smartly">Change CWD Smartly</a></li>
<li><a href="#jumping-over-paragraphs">Jumping over Paragraphs</a></li>
<li><a href="#vertical-line-selection">Vertical Line Selection</a></li>
<li><a href="#zoom-when-presenting">Zoom When Presenting</a></li>
<li><a href="#copy-lines-as-csv">Copy Lines as CSV</a></li>
<li><a href="#conclusion">Conclusion</a></li>
</ul>
</div>
<p class="note">Everything I share below is what I use with Vim (more specifically, GVim on
Windows). I don’t use Neovim (yet), so I can’t speak for how any of it behaves there.</p>
<h2 id="easier-alternative-to">Easier Alternative to <code>:</code><a class="headerlink" href="#easier-alternative-to" title="Permanent link">¶</a></h2>
<p>Entering command-line mode to type Ex commands is something I do very often, yet it requires hitting
<kbd>Shift</kbd> and <kbd>;</kbd> together. Meanwhile, there’s a giant key right under my
thumbs that has no unique and practical purpose in normal mode: the <kbd>Space</kbd> key.</p>
<div class="hl"><pre class=content><code><span><span class="nb">noremap</span> <span class="p"><</span>Space<span class="p">></span> :
</span></code></pre></div>
<p>This is likely my oldest mapping that survives even today. It’s also the one I miss the most when
working with Vim on servers.</p>
<p>Another popular alternative for this mapping is the <kbd>;</kbd> key. However, unlike the
<kbd>Space</kbd> key, <kbd>;</kbd> has a useful default function, which would be lost (look up <code>:h ;</code>
to find out what it is; I won’t repeat it here).</p>
<p class="note">Note that we use <code>noremap</code> here, not <code>nnoremap</code>, so this works in visual mode as well.</p>
<h2 id="repeat-key-mappings">Repeat Key Mappings<a class="headerlink" href="#repeat-key-mappings" title="Permanent link">¶</a></h2>
<p>There are some mappings, like <kbd>dd</kbd>, <kbd>cc</kbd> etc., that are made of the same key repeated
twice. While they appear convenient, hitting them usually takes slightly longer than hitting
two different keys in quick succession.</p>
<p>So, for all these kinds of bindings (and then some), I have a predictable alternative:</p>
<div class="hl"><pre class=content><code><span><span class="c">" Maps that repeat a key can instead use the `.` key.</span>
</span><span><span class="nb">nnoremap</span> <span class="k">d</span>. dd
</span><span><span class="nb">nnoremap</span> <span class="k">y</span>. yy
</span><span><span class="nb">nnoremap</span> <span class="k">c</span>. <span class="k">cc</span>
</span><span><span class="nb">nnoremap</span> <span class="k">g</span>. gg
</span><span><span class="nb">nnoremap</span> <span class="k">v</span>. V
</span></code></pre></div>
<p>These bindings are a lot more convenient once our fingers are used to them and the mnemonic of
the <kbd>.</kbd> key here has sunk in.</p>
<h2 id="ruler-vs-status-line">Ruler vs Status Line<a class="headerlink" href="#ruler-vs-status-line" title="Permanent link">¶</a></h2>
<p>This is another topic that gets a lot of attention when one is setting up their Vim working
environment. What with all the fancy status-line plugins in the wild, it is easy to get carried
away.</p>
<p>My recommendation (nothing unique; better people have said it before) is that you look at your
working style first. How often do you actually make it a point to look at the status line while working? Now
compare that with the fact that the status line costs you one line of vertical space. Judge for
yourself if it’s worth it.</p>
<p>If your question is “But what’s the alternative? Where do I see stuff like the current line number,
column number, file type, the git branch, wi-fi status of the coffee shop across the street etc.
etc.?”, my answer is the same again. <em>Firstly</em>, see what you need, identify what you’ll miss and
narrow it down to a minimal list. Whatever you don’t <strong>need</strong> is most likely just
a <strong>want</strong> and will end up being a distraction when you’re in deep thought (the worst kind of
distraction). <em>Secondly</em>, we have the following other options.</p>
<p><strong>One alternative</strong> is to use the <a href="http://vimdoc.sourceforge.net/htmldoc/options.html#'ruler'" rel="noopener noreferrer" target="_blank"><code>ruler</code></a> option. This is similar to the status line,
although not quite as flexible. But don’t let that discourage you: for showing minimal information
in the corner of your Vim, it’s plenty powerful. By default, it just shows the current cursor
position, but it can be configured to show anything with the <a href="http://vimdoc.sourceforge.net/htmldoc/options.html#'rulerformat'" rel="noopener noreferrer" target="_blank"><code>rulerformat</code></a> option. I
won’t go into detail on how to configure them (maybe in the future; besides, others have done it better
than I could).</p>
<p>First, turn on <code>ruler</code>.</p>
<div class="hl"><pre class=content><code><span><span class="k">set</span> <span class="nb">ruler</span>
</span></code></pre></div>
<p>Next, I set <code>rulerformat</code> with <code>let &rulerformat = ...</code> instead of <code>:set</code>, since assigning a
string expression avoids having to backslash-escape the spaces and other special characters.</p>
<div class="hl"><pre class=linenos><span>1
</span><span>2
</span><span>3
</span><span>4
</span><span>5
</span></pre><pre class=content><code><span><span class="k">let</span> &<span class="nb">rulerformat</span> <span class="p">=</span> <span class="s1">'%50(b%n %{&ff} %{&ft}'</span> .
</span><span> \ <span class="s1">'%( %{len(getqflist()) ? ("q" . len(getqflist())) : ""}%)'</span> .
</span><span> \ <span class="s1">'%( %{search("\\s$", "cnw", 0, 200) ? "∙$" : ""}%)'</span> .
</span><span> \ <span class="s1">'%( %{exists("b:stl_fn") ? call(b:stl_fn) : ""}%)'</span> .
</span><span> \ <span class="s1">'%= L%l,%c%V %P %*%)'</span>
</span></code></pre></div>
<p>Each line in the snippet above adds a small piece of information that I want to see at a glance.
Here’s a rundown:</p>
<ol>
<li>Buffer number, <a href="http://vimdoc.sourceforge.net/htmldoc/options.html#'fileformat'" rel="noopener noreferrer" target="_blank"><code>'fileformat'</code></a> (indicates line endings), <a href="http://vimdoc.sourceforge.net/htmldoc/options.html#'filetype'" rel="noopener noreferrer" target="_blank"><code>'filetype'</code></a>.</li>
<li>A count of items in the <a href="http://vimdoc.sourceforge.net/htmldoc/quickfix.html#quickfix" rel="noopener noreferrer" target="_blank">quickfix</a> list.</li>
<li>An indicator for trailing whitespace in the current buffer.</li>
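<li>A buffer-specific function (<code>b:stl_fn</code>), called if it exists, to show additional
information. I hardly use this currently.</li>
<li>Cursor position: line, column (plus the virtual column when it differs) and the relative
position in the file.</li>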
</ol>
<p><strong>The second alternative</strong> is the <a href="http://vimdoc.sourceforge.net/htmldoc/options.html#'titlestring'" rel="noopener noreferrer" target="_blank"><code>titlestring</code></a>. This defines what shows up in the
title bar of the window-manager’s window (not Vim <a href="http://vimdoc.sourceforge.net/htmldoc/windows.html#windows" rel="noopener noreferrer" target="_blank">window</a>). </p>
<p>Using this is quite similar to using the ruler. Just turn it on and set a value to be shown. This is
what I use currently:</p>
<div class="hl"><pre class=content><code><span><span class="k">set</span> <span class="nb">title</span>
</span><span><span class="k">let</span> &<span class="nb">titlestring</span> <span class="p">=</span> <span class="s1">'%t%( %m%r%)%( <%{get(g:, "cur_project", "")}>%)'</span> .
</span><span> \ <span class="s1">'%( (%{expand("%:~:.:h")})%)'</span> .
</span><span> \ <span class="s1">'%( (%{getcwd()})%)%( %a%) - %(%{v:servername}%)'</span>
</span></code></pre></div>
<p>This contains the buffer’s file name, indicators for modified and read-only, the value of the global
variable <code>cur_project</code> (if set), the directory of the current buffer relative to the current
directory, the current working directory itself, the argument list status (if any), and finally the
<a href="http://vimdoc.sourceforge.net/htmldoc/eval.html#v:servername" rel="noopener noreferrer" target="_blank"><code>servername</code></a>.</p>
<p class="note">Note that I use <code>titlestring</code> with GVim. If you want it to work in terminal Vim as
well, you might need to consult your terminal emulator’s (or multiplexer’s) documentation regarding
this.</p>
<h2 id="opening-switching-buffers">Opening & Switching Buffers<a class="headerlink" href="#opening-switching-buffers" title="Permanent link">¶</a></h2>
<p>This is a problem that is usually solved with one of the fuzzy finder plugins. The current most
popular one appears to be a plugin based on fzf. I have used <a href="https://github.com/wincent/command-t" rel="noopener noreferrer" target="_blank">Command-T</a>, <a href="https://github.com/ctrlpvim/ctrlp.vim" rel="noopener noreferrer" target="_blank">ctrlp</a>, <a href="https://github.com/Yggdroot/LeaderF" rel="noopener noreferrer" target="_blank">LeaderF</a>
and even one that I made for myself. But then something happened on my system that broke the
fuzzy finder that I was using at the time (I don’t remember exactly which). Pressed for time, I chose
to use the commands that come with Vim, and haven’t bothered to investigate what broke the fuzzy
finder. The following has been enough to keep me happy and productive:</p>
<div class="hl"><pre class=content><code><span><span class="c">" Simple mappings for buffer switching.</span>
</span><span><span class="nb">nnoremap</span> <span class="p"><</span>Leader<span class="p">></span><span class="k">d</span> :<span class="k">b</span> *
</span><span><span class="nb">nnoremap</span> <span class="p"><</span>Leader<span class="p">></span><span class="k">l</span> :<span class="k">ls</span><span class="p"><</span>CR<span class="p">></span>
</span><span>
</span><span><span class="c">" Find/edit files</span>
</span><span><span class="nb">nnoremap</span> <span class="p"><</span>Leader<span class="p">></span><span class="k">f</span> :find *
</span><span><span class="nb">nnoremap</span> <span class="p"><</span>Leader<span class="p">></span><span class="k">e</span> :edit **/*
</span></code></pre></div>
<p>It may not seem as powerful when you put it beside the shiny screen recordings of the fuzzy finder
plugins, but it just works ™ and works perfectly fine. I took inspiration from <a href="https://vimways.org/2018/death-by-a-thousand-files/" rel="noopener noreferrer" target="_blank">this excellent
article</a> on the topic by <a href="https://github.com/romainl" rel="noopener noreferrer" target="_blank">romainl</a>. Thank you!</p>
<h2 id="change-cwd-smartly">Change CWD Smartly<a class="headerlink" href="#change-cwd-smartly" title="Permanent link">¶</a></h2>
<p>This is another very old mapping that still survives. It comes in two flavors, for which I use <code>cm</code>
and <code>cu</code>. Briefly:</p>
<ul>
<li><code>cm</code> – <em>cd</em> to current buffer’s directory.</li>
<li><code>cu</code> – <em>cd</em> to the current <strong>project</strong>’s root directory.</li>
</ul>
<p>The first one is fairly simple to implement:</p>
<div class="hl"><pre class=content><code><span><span class="c">" Mapping to change pwd to the directory of the current buffer.</span>
</span><span><span class="nb">nnoremap</span> cm :<span class="k">call</span> <span class="k">chdir</span><span class="p">(</span>expand<span class="p">(</span><span class="s1">'%:p:h'</span><span class="p">))</span> \<span class="p">|</span> <span class="k">pwd</span><span class="p"><</span>CR<span class="p">></span>
</span></code></pre></div>
<p>For the second one, it is important to understand how a project’s root is identified. To me, it’s a
directory containing the <code>.git</code> folder. That’s not a perfect answer, but it hasn’t failed me much
so far. Nevertheless, my mapping below supports looking for a few other such <em>project markers</em>, like
<code>.hg</code> for Mercurial, <code>.project</code> for Eclipse projects, <code>manage.py</code> for Django projects etc.</p>
<p>There are a few plugins that do this as well, probably better than this, but I like to do this kind
of simple thing myself, to have control and to have it tuned to my habits.</p>
<div class="hl"><pre class=linenos><span>1
</span><span>2
</span><span>3
</span><span>4
</span><span>5
</span><span>6
</span><span>7
</span><span>8
</span><span>9
</span><span>10
</span><span>11
</span><span>12
</span><span>13
</span><span>14
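</span><span>15
</span><span>16
</span><span>17
</span></pre><pre class=content><code><span><span class="c">" Map to change pwd to the repo-root-directory of the current buffer.</span>
</span><span><span class="nb">nnoremap</span> cu :<span class="k">call</span> <span class="p"><</span>SID<span class="p">></span>CdToRepoRoot<span class="p">()<</span>CR<span class="p">></span>
</span><span><span class="k">let</span> <span class="k">g</span>:markers <span class="p">=</span> split<span class="p">(</span><span class="s1">'.git .hg .svn .project .idea manage.py pom.xml'</span><span class="p">)</span>
</span><span><span class="k">fun</span><span class="p">!</span> s:CdToRepoRoot<span class="p">()</span> abort
</span><span> <span class="k">for</span> marker <span class="k">in</span> <span class="k">g</span>:markers
</span><span> <span class="k">let</span> root <span class="p">=</span> finddir<span class="p">(</span>marker<span class="p">,</span> expand<span class="p">(</span><span class="s1">'%:p:h'</span><span class="p">)</span> . <span class="s1">';'</span><span class="p">)</span>
</span><span> <span class="c">" finddir() only matches directories, so try findfile() for file markers.</span>
</span><span> <span class="k">if</span> empty<span class="p">(</span>root<span class="p">)</span> <span class="p">|</span> <span class="k">let</span> root <span class="p">=</span> findfile<span class="p">(</span>marker<span class="p">,</span> expand<span class="p">(</span><span class="s1">'%:p:h'</span><span class="p">)</span> . <span class="s1">';'</span><span class="p">)</span> <span class="p">|</span> <span class="k">endif</span>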
</span><span> <span class="k">if</span> <span class="p">!</span>empty<span class="p">(</span>root<span class="p">)</span>
</span><span> <span class="k">let</span> root <span class="p">=</span> fnamemodify<span class="p">(</span>root<span class="p">,</span> <span class="s1">':h'</span><span class="p">)</span>
</span><span> <span class="k">call</span> <span class="k">chdir</span><span class="p">(</span>root<span class="p">)</span>
</span><span> echo <span class="s1">'cd '</span> . root . <span class="s1">' (found '</span> . marker . <span class="s1">')'</span>
</span><span> <span class="k">return</span>
</span><span> <span class="k">endif</span>
</span><span> <span class="k">endfor</span>
</span><span> <span class="k">echoerr</span> <span class="s1">'No repo root found.'</span>
</span><span><span class="k">endfun</span>
</span></code></pre></div>
<p>What’s happening here is that for each marker in <code>g:markers</code>, we navigate up from the current
buffer’s directory until we find a folder that has the marker. If found, we <code>chdir</code> to it.
Otherwise, we repeat the process for the next marker. If no marker was found, we just show an error
message. Simple & effective.</p>
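<p>For readers who want to see the lookup order spelled out, here is a rough Python sketch of the same search. It is an illustration under my own assumptions, not a port of any plugin: <code>find_repo_root</code> and <code>MARKERS</code> are hypothetical names, and <code>Path.exists()</code> is used so that both directory markers (<code>.git</code>) and file markers (<code>manage.py</code>) are found, whereas Vim’s <code>finddir()</code> only matches directories. As in the mapping, every ancestor is checked for one marker before the next marker is tried at all.</p>

```python
from pathlib import Path

# Same priority order as g:markers in the Vim snippet (hypothetical constant).
MARKERS = [".git", ".hg", ".svn", ".project", ".idea", "manage.py", "pom.xml"]


def find_repo_root(start, markers=MARKERS):
    """Return (root_directory, marker) for the first marker found, else None.

    For each marker, in order, look in `start` and then in each of its
    ancestors, roughly like finddir(marker, start . ';') in Vim.
    """
    start = Path(start).resolve()
    for marker in markers:
        for directory in (start, *start.parents):
            # exists() is true for files as well as directories.
            if (directory / marker).exists():
                return directory, marker
    return None
```

<p>Because markers are tried in order, a <code>.git</code> directory several levels up wins over a <code>manage.py</code> right next to the file, which mirrors the structure of the loop in <code>s:CdToRepoRoot()</code>.</p>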
<h2 id="jumping-over-paragraphs">Jumping over Paragraphs<a class="headerlink" href="#jumping-over-paragraphs" title="Permanent link">¶</a></h2>
<p>This is one of the things I wanted for a long time, but couldn’t figure out a robust solution for. It
was only last year (IIRC) that I finally nailed it, and this version works exactly how I want it.</p>
<p>The idea is that the keys <kbd><C-j></kbd> and <kbd><C-k></kbd> will jump over
paragraphs, and place the cursor at the start of the first line in the paragraph. I needed the
following to be true:</p>
<ol>
<li>After hitting either key, the cursor is positioned on the first line of a paragraph, <strong>never</strong> on
a blank line.</li>
<li>When in the middle of a paragraph, <kbd><C-k></kbd> moves the cursor to the first line of
the <strong>current</strong> paragraph.</li>
<li>Moves are <strong>not</strong> added to the <a href="http://vimdoc.sourceforge.net/htmldoc/motion.html#jumplist" rel="noopener noreferrer" target="_blank">jumplist</a>.</li>
<li>Cursor is placed on the first non-blank character of the paragraph. Like <a href="http://vimdoc.sourceforge.net/htmldoc/motion.html#^" rel="noopener noreferrer" target="_blank"><kbd>^</kbd></a>,
not <a href="http://vimdoc.sourceforge.net/htmldoc/motion.html#0" rel="noopener noreferrer" target="_blank"><kbd>0</kbd></a>.</li>
<li>They should work just fine in both normal & visual modes and the visual mode type should <strong>not</strong>
change when hitting the keys.</li>
</ol>
<p>Here’s how I’m doing this:</p>
<div class="hl"><pre class=content><code><span><span class="nb">noremap</span> <span class="p"><</span><span class="k">silent</span><span class="p">></span> <span class="p"><</span>expr<span class="p">></span> <span class="p"><</span>C<span class="p">-</span><span class="k">k</span><span class="p">></span> <span class="p">(</span>line<span class="p">(</span><span class="s1">'.'</span><span class="p">)</span> <span class="p">-</span> search<span class="p">(</span><span class="s1">'^\n.\+$'</span><span class="p">,</span> <span class="s1">'Wenb'</span><span class="p">))</span> . <span class="s1">'kzv^'</span>
</span><span><span class="nb">noremap</span> <span class="p"><</span><span class="k">silent</span><span class="p">></span> <span class="p"><</span>expr<span class="p">></span> <span class="p"><</span>C<span class="p">-</span><span class="k">j</span><span class="p">></span> <span class="p">(</span>search<span class="p">(</span><span class="s1">'^\n.'</span><span class="p">,</span> <span class="s1">'Wen'</span><span class="p">)</span> <span class="p">-</span> line<span class="p">(</span><span class="s1">'.'</span><span class="p">))</span> . <span class="s1">'jzv^'</span>
</span></code></pre></div>
<p>I needed to use the <a href="http://vimdoc.sourceforge.net/htmldoc/map.html#:map-expression" rel="noopener noreferrer" target="_blank"><code><expr></code></a> way of mapping keys here to satisfy the third and
fifth of my requirements above: the expression just returns a counted <kbd>j</kbd> or <kbd>k</kbd>
motion, which is not a jump, and the returned keys are typed in whatever mode is active, so visual
mode (and its type) is left untouched.</p>
<p>The default motions that come closest to this are <a href="http://vimdoc.sourceforge.net/htmldoc/motion.html#{" rel="noopener noreferrer" target="_blank"><kbd>{</kbd></a> and
<a href="http://vimdoc.sourceforge.net/htmldoc/motion.html#}" rel="noopener noreferrer" target="_blank"><kbd>}</kbd></a>. But they don’t satisfy my first and third requirements, since they stop on the
blank line between paragraphs and they are jump motions, and I’m <em>very</em> picky.
I actually still use them when they seem appropriate, but I hit the above custom mappings a lot
more often.</p>
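<p>To make the <kbd><C-j></kbd> expression a bit more concrete, here’s a rough Python model of what it computes. It is only a sketch: it works on a plain list of lines with 0-based indices instead of Vim’s 1-based line numbers, and the names <code>next_paragraph_start</code> and <code>ctrl_j_keys</code> are made up for illustration. The pattern <code>^\n.</code> matches an empty line followed by a non-empty one, the <code>e</code> flag makes <code>search()</code> report the line where that match ends (the first line of the next paragraph), and <code>W</code> disables wrap-around.</p>

```python
def next_paragraph_start(lines, cur):
    r"""Index of the first line of the next paragraph after line `cur`, or None.

    Models search('^\n.', 'Wen'): an empty line must appear strictly after the
    cursor line (index i - 1) and be followed by a non-empty line (index i).
    With 'e', the reported line is that non-empty line; with 'W' there is no
    wrap-around, so running off the end gives None (Vim returns 0).
    """
    for i in range(cur + 2, len(lines)):
        if lines[i - 1] == "" and lines[i] != "":
            return i
    return None


def ctrl_j_keys(lines, cur):
    """Keys the <expr> mapping would produce, e.g. '3jzv^', or None if no match."""
    target = next_paragraph_start(lines, cur)
    if target is None:
        return None
    # A counted j is not a jump motion, so the jumplist is left alone.
    return f"{target - cur}jzv^"


buffer = ["first", "paragraph", "", "second", "paragraph", "", "", "third"]
print(ctrl_j_keys(buffer, 0))  # 3jzv^
print(ctrl_j_keys(buffer, 4))  # 3jzv^
```

<p>Note that several blank lines in a row are skipped as a unit: only the last blank line directly before a non-empty line can start a match, which is why the cursor never lands on a blank line.</p>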
<h2 id="vertical-line-selection">Vertical Line Selection<a class="headerlink" href="#vertical-line-selection" title="Permanent link">¶</a></h2>
<p>This is one of my recent favorites (< 2 years old). Here’s the use case: usually, when I go
into visual block mode with <a href="http://vimdoc.sourceforge.net/htmldoc/visual.html#CTRL-V" rel="noopener noreferrer" target="_blank"><kbd><C-v></kbd></a>, I extend it upwards to the first line of the
paragraph and downwards to the last line of the paragraph.</p>
<p>The following GIF might make this easier to understand:</p>
<p class="img"><a href="https://sharats.me/static/vim-vertical-selection-manual.gif"><img alt="Vertical-line selection demo" src="https://sharats.me/static/vim-vertical-selection-manual.gif"></a></p>
<p>This seems simple enough to do manually when there are just a few lines to deal with. But when there
are more than 15 lines and you notice yourself doing this a dozen times a day, you need a better way.</p>
<p>The following mapping is my solution to this. When I hit <kbd>vm</kbd>, the following happens:</p>
<ol>
<li>Visual block selection is activated.</li>
<li>Selection extends as a single column downwards until we hit a line that’s shorter than the cursor
column position or we reach the end of the buffer.</li>
<li>Selection extends in a similar fashion upwards.</li>
</ol>
<p>The implementation first computes the number of lines to travel upwards and downwards from the
current position. It then constructs a normal mode command that starts visual-block mode and moves
the cursor so that the vertical line is selected. For example, in the GIF above, our function would
construct the normal mode command <kbd>\<C-v>2jo1k</kbd>: go down two lines, jump to the other end of
the selection with <kbd>o</kbd>, then go up one line. This works quite well and doesn’t affect the jumplist.</p>
<div class="hl"><pre class=linenos><span>1
</span><span>2
</span><span>3
</span><span>4
</span><span>5
</span><span>6
</span><span>7
</span><span>8
</span><span>9
</span><span>10
</span><span>11
</span><span>12
</span><span>13
</span><span>14
</span><span>15
</span><span>16
</span><span>17
</span><span>18
</span><span>19
</span></pre><pre class=content><code><span><span class="nb">nnoremap</span> <span class="p"><</span>expr<span class="p">></span> vm <span class="p"><</span>SID<span class="p">></span>VisualVLine<span class="p">()</span>
</span><span><span class="k">fun</span><span class="p">!</span> s:VisualVLine<span class="p">()</span> abort
</span><span> <span class="k">let</span> [_<span class="p">,</span> lnum<span class="p">,</span> <span class="k">col</span>; _] <span class="p">=</span> getcurpos<span class="p">()</span>
</span><span> <span class="k">let</span> line <span class="p">=</span> getline<span class="p">(</span><span class="s1">'.'</span><span class="p">)</span>
</span><span> <span class="k">let</span> <span class="k">col</span> <span class="p">+=</span> strdisplaywidth<span class="p">(</span>line<span class="p">)</span> <span class="p">-</span> strwidth<span class="p">(</span>line<span class="p">)</span>
</span><span>
</span><span> <span class="k">let</span> [from<span class="p">,</span> <span class="k">to</span>] <span class="p">=</span> [lnum<span class="p">,</span> lnum]
</span><span> <span class="k">while</span> strdisplaywidth<span class="p">(</span>getline<span class="p">(</span>from <span class="p">-</span> <span class="m">1</span><span class="p">))</span> <span class="p">>=</span> <span class="k">col</span>
</span><span> <span class="k">let</span> from <span class="p">-=</span> <span class="m">1</span>
</span><span> <span class="k">endwhile</span>
</span><span>
</span><span> <span class="k">while</span> strdisplaywidth<span class="p">(</span>getline<span class="p">(</span><span class="k">to</span> <span class="p">+</span> <span class="m">1</span><span class="p">))</span> <span class="p">>=</span> <span class="k">col</span>
</span><span> <span class="k">let</span> <span class="k">to</span> <span class="p">+=</span> <span class="m">1</span>
</span><span> <span class="k">endwhile</span>
</span><span>
</span><span> <span class="k">return</span> <span class="s2">"\<C-v>"</span> .
</span><span> \ <span class="p">(</span><span class="k">to</span> <span class="p">==</span> lnum ? <span class="s1">''</span> : <span class="p">(</span><span class="k">to</span> <span class="p">-</span> lnum . <span class="s1">'jo'</span><span class="p">))</span> .
</span><span> \ <span class="p">(</span>from <span class="p">==</span> lnum ? <span class="s1">''</span> : <span class="p">(</span>lnum <span class="p">-</span> from . <span class="s1">'k'</span><span class="p">))</span>
</span><span><span class="k">endfun</span>
</span></code></pre></div>
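<p>The line-counting part of <code>s:VisualVLine()</code> is easy to model outside Vim. Below is a Python sketch of it, assuming plain ASCII text without tabs or wide characters, so that <code>len()</code> can stand in for <code>strdisplaywidth()</code> and the byte column equals the display column. It uses 1-based line numbers and columns like Vim; <code>vertical_line_keys</code> is a hypothetical name, and <code>top</code>/<code>bottom</code> replace <code>from</code>/<code>to</code> because <code>from</code> is a Python keyword.</p>

```python
def vertical_line_keys(lines, lnum, col):
    """Keys that s:VisualVLine() would build for the cursor at (lnum, col).

    `lines` is a list of strings; `lnum` and `col` are 1-based, as in Vim.
    Assumes ASCII text without tabs, so len() matches the display width.
    """
    if col < 1:
        raise ValueError("col is 1-based")

    def width(n):
        # Like getline() in Vim, lines outside the buffer count as empty.
        return len(lines[n - 1]) if 1 <= n <= len(lines) else 0

    top = bottom = lnum
    while width(top - 1) >= col:     # extend upwards
        top -= 1
    while width(bottom + 1) >= col:  # extend downwards
        bottom += 1

    keys = "<C-v>"  # stands for the literal CTRL-V that "\<C-v>" produces in Vim
    if bottom != lnum:
        keys += f"{bottom - lnum}jo"  # move down, then 'o' back to the other end
    if top != lnum:
        keys += f"{lnum - top}k"
    return keys


text = ["ab", "hello world", "hello world", "hello", "hello there", "hi"]
print(vertical_line_keys(text, 3, 5))  # <C-v>2jo1k
```
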
<h2 id="zoom-when-presenting">Zoom When Presenting<a class="headerlink" href="#zoom-when-presenting" title="Permanent link">¶</a></h2>
<p>Occasionally (read: more often than I like to admit), I end up having to present some code to a
small audience that is slightly larger than my immediate team. I also take notes of meeting
proceedings in Vim and present them over screen sharing to get inputs and corrections, essentially
steering the meeting.</p>
<p>On such occasions, I need to increase the font size so it’s visible to everyone in the audience /
meeting. When presenting, I’ve heard complaints from people sitting a bit far back, and when sharing
my screen, I’ve heard complaints from people connecting from their mobile devices (!).</p>
<p>The following two mappings are born out of this need.</p>
<div class="hl"><pre class=linenos><span>1
</span><span>2
</span><span>3
</span><span>4
</span><span>5
</span><span>6
</span><span>7
</span><span>8
</span><span>9
</span><span>10
</span><span>11
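</span><span>12
</span><span>13
</span></pre><pre class=content><code><span><span class="c">" Increase / Decrease font size.</span>
</span><span><span class="k">let</span> s:iswin <span class="p">=</span> has<span class="p">(</span><span class="s1">'win32'</span><span class="p">)</span>
</span><span><span class="k">let</span> <span class="k">g</span>:font_size_pat <span class="p">=</span> s:iswin ? <span class="s1">':h\zs\d\+'</span> : <span class="s1">'\d\+'</span>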
</span><span><span class="nb">nnoremap</span> <span class="p"><</span><span class="k">silent</span><span class="p">></span> z<span class="p">+</span> :<span class="p"><</span>C<span class="p">-</span><span class="k">u</span><span class="p">></span><span class="k">let</span> &<span class="nb">guifont</span> <span class="p">=</span> substitute<span class="p">(</span>
</span><span> \ &<span class="nb">guifont</span><span class="p">,</span> <span class="k">g</span>:font_size_pat<span class="p">,</span>
</span><span> \ <span class="s1">'\=eval(submatch(0) + '</span> . <span class="k">v</span>:count1 . <span class="s1">')'</span><span class="p">,</span> <span class="s1">''</span><span class="p">)</span>
</span><span> \ \<span class="p">|</span><span class="k">simalt</span> <span class="p">~</span><span class="k">x</span><span class="p"><</span>CR<span class="p">></span>
</span><span><span class="nb">nnoremap</span> <span class="p"><</span><span class="k">silent</span><span class="p">></span> z<span class="p">-</span> :<span class="p"><</span>C<span class="p">-</span><span class="k">u</span><span class="p">></span><span class="k">let</span> &<span class="nb">guifont</span> <span class="p">=</span> substitute<span class="p">(</span>
</span><span> \ &<span class="nb">guifont</span><span class="p">,</span> <span class="k">g</span>:font_size_pat<span class="p">,</span>
</span><span> \ <span class="s1">'\=eval(submatch(0) - '</span> . <span class="k">v</span>:count1 . <span class="s1">')'</span><span class="p">,</span> <span class="s1">''</span><span class="p">)</span>
</span><span> \ \<span class="p">|</span><span class="k">simalt</span> <span class="p">~</span><span class="k">x</span><span class="p"><</span>CR<span class="p">></span>
</span><span>nmap z<span class="p"><</span>kPlus<span class="p">></span> z<span class="p">+</span>
</span><span>nmap z<span class="p"><</span>kMinus<span class="p">></span> z<span class="p">-</span>
</span></code></pre></div>
<p>This snippet defines two mappings in normal mode, <kbd>z+</kbd> and <kbd>z-</kbd>, that work with
the keypad as well (which is what the last two lines are for).</p>
<p>This works by calling substitute on the <code>guifont</code> option with a pattern tailored for how the font
size is specified on the current platform. The replacement for this pattern contains a
<a href="http://vimdoc.sourceforge.net/htmldoc/change.html#sub-replace-expression" rel="noopener noreferrer" target="_blank">sub-replace-expression</a> that spits out the new font size number.</p>
<p>However, there is a quirk. Once the font size is changed, the Vim window is restored (not maximized
anymore). This was annoying to me since I almost always keep my Vim maximized (especially when
presenting). So, the trailing <code>simalt ~x</code> maximizes the window again. Note that <code>:simalt</code> is only
available in the Windows GUI, so this part of the mappings is Windows-specific.</p>
<p>Another small feature of these mappings is that they accept a count. For example, hitting
<kbd>z+</kbd> will increase the font size by 1 point, and hitting <kbd>3z+</kbd> will increase it by 3
points.</p>
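<p>The substitution trick itself isn’t Vim-specific. Here’s a small Python sketch of the same idea, using <code>re.sub()</code> with a function as the replacement in place of Vim’s sub-replace-expression; the lookbehind <code>(?<=:h)</code> plays the role of <code>:h\zs</code> in the Windows pattern. The function name and the sample font strings are illustrative assumptions.</p>

```python
import re


def change_font_size(guifont, delta, windows):
    """Add `delta` to the size in a 'guifont'-style string.

    On Windows the size follows ':h' (e.g. 'Consolas:h11'); elsewhere we take
    the first run of digits (e.g. 'Monospace 11'), like the Vim patterns.
    """
    pattern = r"(?<=:h)\d+" if windows else r"\d+"
    # count=1 mirrors substitute() with empty flags: only the first match changes.
    return re.sub(pattern, lambda m: str(int(m.group(0)) + delta), guifont, count=1)


print(change_font_size("Consolas:h11:cANSI", 3, windows=True))  # Consolas:h14:cANSI
print(change_font_size("Monospace 11", -1, windows=False))      # Monospace 10
```

<p>As with the Vim version, a font name that itself contains digits would confuse the non-Windows pattern, since it simply picks the first number it sees.</p>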
<h2 id="copy-lines-as-csv">Copy Lines as CSV<a class="headerlink" href="#copy-lines-as-csv" title="Permanent link">¶</a></h2>
<p>I write my notes, both for work and study, in Vim as plain text, loosely Markdown (I’ll write about that
in a future article). Among these notes, there are occasionally lists of domain-specific stuff for the
applications or projects I’m working with. I usually need these as a reference for objects that I
often look up in databases. For example, I have a note like the following:</p>
<div class="hl"><pre class=content><code><span>| Object  | Database ID |
</span><span>| ------- | ----------- |
</span><span>| Mercury | 4           |
</span><span>| Venus   | 32          |
</span><span>| Earth   | 42          |
</span><span>| Moon    | 44          |
</span></code></pre></div>
<p>From this, I want to do a visual-block selection of all the ID numbers and paste them into an SQL <code>SELECT</code>
query that looks something like:</p>
<div class="hl"><pre class=content><code><span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">celestial_objects</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="k">IN</span><span class="w"> </span><span class="p">(</span><span class="mi">4</span><span class="p">,</span><span class="w"> </span><span class="mi">32</span><span class="p">,</span><span class="w"> </span><span class="mi">42</span><span class="p">,</span><span class="w"> </span><span class="mi">44</span><span class="p">);</span>
</span></code></pre></div>
<p>Essentially, what I needed was to copy the visually selected lines as a comma-separated string.
It might seem like overkill for the example I’m demonstrating here, but when there are ID
numbers in the millions and Markdown tables with over a dozen rows as reference in my notes, it
quickly becomes extremely annoying.</p>
<p>So, I came up with the following:</p>
<div class="hl"><pre class=content><code><span><span class="c">" Copy selected lines as CSV</span>
</span><span>xnoremap <span class="p"><</span><span class="k">silent</span><span class="p">></span> <span class="p"><</span>Leader<span class="p">></span><span class="k">y</span> :<span class="p"><</span>C<span class="p">-</span><span class="k">u</span><span class="p">></span><span class="k">call</span> <span class="p"><</span>SID<span class="p">></span>CopyLinesAsCSV<span class="p">()<</span>CR<span class="p">></span>
</span><span><span class="k">fun</span> s:CopyLinesAsCSV<span class="p">()</span> abort
</span><span> <span class="k">let</span> [_<span class="p">,</span> l1<span class="p">,</span> c1<span class="p">,</span> _] <span class="p">=</span> getpos<span class="p">(</span><span class="s2">"'<"</span><span class="p">)</span>
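</span><span> <span class="k">let</span> [_<span class="p">,</span> l2<span class="p">,</span> c2<span class="p">,</span> _] <span class="p">=</span> getpos<span class="p">(</span><span class="s2">"'>"</span><span class="p">)</span>
</span><span> <span class="c">" In block mode '< may be the top-right corner, so order the columns.</span>
</span><span> <span class="k">let</span> [c1<span class="p">,</span> c2] <span class="p">=</span> [min<span class="p">([</span>c1<span class="p">,</span> c2]<span class="p">),</span> max<span class="p">([</span>c1<span class="p">,</span> c2]<span class="p">)]</span>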
</span><span> <span class="k">let</span> <span class="nb">lines</span> <span class="p">=</span> map<span class="p">(</span>getline<span class="p">(</span>l1<span class="p">,</span> l2<span class="p">),</span> {<span class="k">i</span><span class="p">,</span> <span class="k">l</span> <span class="p">-></span> trim<span class="p">(</span><span class="k">l</span>[c1<span class="m">-1</span>:c2<span class="m">-1</span>]<span class="p">)</span>}<span class="p">)</span>
</span><span> <span class="k">call</span> setreg<span class="p">(</span><span class="k">v</span>:<span class="k">register</span><span class="p">,</span> <span class="k">join</span><span class="p">(</span><span class="nb">lines</span><span class="p">,</span> <span class="s1">', '</span><span class="p">),</span> <span class="s1">'l'</span><span class="p">)</span>
</span><span><span class="k">endfun</span>
</span></code></pre></div>
<p>This defines a mapping in visual mode, <kbd><Leader>y</kbd> (which, just like the default <kbd>y</kbd>,
can take a register), that takes the selected lines (or the selected block), trims each piece, joins
them with <code>', '</code> and puts the result in the register.</p>
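<p>The core of <code>s:CopyLinesAsCSV()</code> (slicing the same column range out of every selected line, trimming, and joining) can be sketched in Python as below. Vim’s <code>l[c1-1:c2-1]</code> includes the end index while Python slices exclude it, hence <code>line[c1 - 1:c2]</code>. The function name is hypothetical, and the positions are assumed to be 1-based, as returned by <code>getpos()</code>.</p>

```python
def lines_as_csv(lines, l1, c1, l2, c2):
    """Join columns c1..c2 (inclusive, 1-based) of lines l1..l2 with ', '."""
    selected = lines[l1 - 1:l2]
    # Vim's str[a:b] includes index b; Python's excludes it, so use c2 as the stop.
    return ", ".join(line[c1 - 1:c2].strip() for line in selected)


table = [
    "| Mercury | 4           |",
    "| Venus   | 32          |",
    "| Earth   | 42          |",
    "| Moon    | 44          |",
]
print(lines_as_csv(table, 1, 13, 4, 14))  # 4, 32, 42, 44
```

<p>For a linewise selection, Vim reports a very large column for <code>'></code>, so the slice simply runs to the end of each line; passing a large <code>c2</code> here behaves the same way.</p>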
<p>Here’s a preview of this in action:</p>
<p class="img"><a href="https://sharats.me/static/vim-copy-as-csv.gif"><img alt="Copy column as CSV demo" src="https://sharats.me/static/vim-copy-as-csv.gif"></a></p>
<p>Combined with the <kbd>vm</kbd> mapping explained in a previous section, this makes it really quick to
copy a column of values as a comma-separated string.</p>
<h2 id="conclusion">Conclusion<a class="headerlink" href="#conclusion" title="Permanent link">¶</a></h2>
<p>This is a continuous process of identifying and honing the habits at work. Considering how
programmable Vim can be when it comes to editing text, it’s both fun and productive to introspect.
I won’t stop you, but I recommend against blindly copying everything here into
your own vimrc. Take only if you need, take only what you need, and do take everything you need.</p>
<p>I plan to write the next chapter in this series next month, so stay tuned and remember to check
back.</p>
<p>Identify, optimize, repeat.</p>
<p class="note">Read the <a href="../automating-the-vim-workplace/">previous article</a>, or the <a href="../automating-the-vim-workplace-3/">next article</a> in this series.</p>Python's `itertools.groupby` callable2020-02-09T00:00:00+05:302020-02-09T00:00:00+05:30Shrikant Sharat Kandulatag:sharats.me,2020-02-09:/posts/python-itertools-groupby-callable/<p>The <code>groupby</code> utility from the <a href="https://docs.python.org/3/library/itertools.html" rel="noopener noreferrer" target="_blank"><code>itertools</code></a> module can be used to group contiguous items in a
sequence based on some property of the items.</p>
<p>Python has several utilities for working with lists and other sequence data types. Besides the many
such utilities available directly as builtins (like <a href="../python-map-function/"><code>map</code></a>, <code>filter</code>,
<code>zip</code> etc.), there’s the <code>itertools</code> module, which is dedicated to this purpose. In this article, I’ll show the
<a href="https://docs.python.org/3/library/itertools.html#itertools.groupby" rel="noopener noreferrer" target="_blank"><code>groupby</code></a> callable from this standard library module. I hope to write more in the future
on the other awesome stuff from this module.</p>
<div class="toc"><span class="toctitle">Table of Contents</span><ul>
<li><a href="#basic-usage">Basic Usage</a></li>
<li><a href="#non-contiguous-groups">Non-contiguous Groups</a></li>
<li><a href="#groups-are-iterables">Groups are Iterables</a></li>
<li><a href="#a-really-bad-diy-implementation">A Really Bad DIY Implementation</a></li>
<li><a href="#usage-tips">Usage Tips</a></li>
<li><a href="#conclusion">Conclusion</a></li>
</ul>
</div>
<h2 id="basic-usage">Basic Usage<a class="headerlink" href="#basic-usage" title="Permanent link">¶</a></h2>
<p>The point of <code>itertools.groupby</code> can be illustrated quite easily by applying it to a list of zeroes and
ones, grouped by their values. Check out the following example:</p>
<div class="hl"><pre class=content><code><span><span class="kn">import</span> <span class="nn">itertools</span>
</span><span>
</span><span><span class="n">numbers</span> <span class="o">=</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">]</span>
</span><span>
</span><span><span class="k">for</span> <span class="n">grouping_value</span><span class="p">,</span> <span class="n">group_items</span> <span class="ow">in</span> <span class="n">itertools</span><span class="o">.</span><span class="n">groupby</span><span class="p">(</span><span class="n">numbers</span><span class="p">):</span>
</span><span> <span class="nb">print</span><span class="p">(</span><span class="s1">'By'</span><span class="p">,</span> <span class="n">grouping_value</span><span class="p">,</span> <span class="s1">'->'</span><span class="p">,</span> <span class="o">*</span><span class="n">group_items</span><span class="p">)</span>
</span></code></pre></div>
<p>This will produce the following output:</p>
<div class="hl"><pre class=content><code><span>By 1 -> 1 1 1
</span><span>By 0 -> 0 0
</span><span>By 1 -> 1
</span><span>By 0 -> 0 0 0
</span><span>By 1 -> 1
</span><span>By 0 -> 0
</span></code></pre></div>
<p>Now let’s look at this, little by little. The <code>groupby</code> call takes one or, probably more often, two
arguments:</p>
<dl>
<dt>iterable</dt>
<dd>An iterable (like a list or any other collection). Items in this collection will be grouped.</dd>
<dt>key (defaults to <code>None</code>)</dt>
<dd>A function that is applied to each element from <code>iterable</code>, the return values of which are used
to do the grouping.</dd>
<dt><em>returns</em></dt>
<dd>An iterator that yields tuples of <code>(grouping_value, iterable_of_group_elements)</code> for each group
that was found.</dd>
</dl>
<p>In the example above, we give the <code>numbers</code> list to the <code>groupby</code> call which yields six groups (as
can be seen from the six lines of output). Since we haven’t provided a value for the <code>key</code> argument,
the grouping occurs on the elements themselves.</p>
<p>So now the output should make sense. The first group, where the <code>grouping_value</code> is <code>1</code>, contains
three elements: the first three <code>1</code>s in our list. The next group, where the <code>grouping_value</code> is <code>0</code>,
contains the next two <code>0</code>s in our list. This goes on until the list passed to <code>groupby</code> is
exhausted.</p>
<p class="note">It is important to note here that inside the tuples yielded by <code>groupby</code>, what we have are iterables
that yield the group’s items. They are not lists. More specifically, the tuple contains an object of
type <code>itertools._grouper</code>, which is just an iterable over the values in the group. This point is
elaborated in a <a href="#groups-are-iterables">section further below</a>.</p>
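<p>Because every group is a single contiguous run, one handy use of this behaviour is run-length encoding. The following is a small sketch, where <code>run_lengths</code> is just an illustrative helper name (it is not part of <code>itertools</code>), that counts each run in the same <code>numbers</code> list. Since the groups are iterables rather than lists, each run is counted by consuming its group instead of calling <code>len()</code> on it:</p>
<div class="hl"><pre class=content><code>import itertools

numbers = [1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0]


def run_lengths(iterable):
    # Illustrative helper: one (value, run_length) pair per contiguous run.
    # The group is an iterable, so its items are counted by consuming it.
    return [(value, sum(1 for _ in group)) for value, group in itertools.groupby(iterable)]


print(run_lengths(numbers))  # [(1, 3), (0, 2), (1, 1), (0, 3), (1, 1), (0, 1)]
</code></pre></div>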
<h2 id="non-contiguous-groups">Non-contiguous Groups<a class="headerlink" href="#non-contiguous-groups" title="Permanent link">¶</a></h2>
<p>This often comes up as a surprise to people new to <code>itertools.groupby</code> (it certainly did for me).
Groups are only ever formed from contiguous runs of items with the same key. For example, if we are trying to
group even and odd numbers from an ordered collection of numbers, a plain call to <code>groupby</code> can
produce surprising results:</p>
<div class="hl"><pre class=content><code><span><span class="kn">import</span> <span class="nn">itertools</span>
</span><span>
</span><span><span class="k">for</span> <span class="n">is_even</span><span class="p">,</span> <span class="n">number_group</span> <span class="ow">in</span> <span class="n">itertools</span><span class="o">.</span><span class="n">groupby</span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="mi">10</span><span class="p">),</span> <span class="n">key</span><span class="o">=</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span> <span class="o">%</span> <span class="mi">2</span> <span class="o">==</span> <span class="mi">0</span><span class="p">):</span>
</span><span>    <span class="nb">print</span><span class="p">(</span><span class="s1">'Evens:'</span> <span class="k">if</span> <span class="n">is_even</span> <span class="k">else</span> <span class="s1">'Odds:'</span><span class="p">,</span> <span class="o">*</span><span class="n">number_group</span><span class="p">)</span>
</span></code></pre></div>
<p>This produces the following (probably unexpected) result:</p>
<div class="hl"><pre class=content><code><span>Evens: 0
</span><span>Odds: 1
</span><span>Evens: 2
</span><span>Odds: 3
</span><span>Evens: 4
</span><span>Odds: 5
</span><span>Evens: 6
</span><span>Odds: 7
</span><span>Evens: 8
</span><span>Odds: 9
</span></code></pre></div>
<p>What we would’ve liked is something like the following:</p>
<div class="hl"><pre class=content><code><span>Evens: 0 2 4 6 8
</span><span>Odds: 1 3 5 7 9
</span></code></pre></div>
<p>If we search the ever helpful internet for a solution to this “problem”, the answer seems to be to
sort the initial list with the same key function and then pass the result to <code>groupby</code>. This is how
that would work:</p>
<div class="hl"><pre class=content><code><span><span class="kn">import</span> <span class="nn">itertools</span>
</span><span>
</span><span><span class="k">def</span> <span class="nf">is_even</span><span class="p">(</span><span class="n">n</span><span class="p">):</span>
</span><span>    <span class="k">return</span> <span class="n">n</span> <span class="o">%</span> <span class="mi">2</span> <span class="o">==</span> <span class="mi">0</span>
</span><span>
</span><span>
</span><span><span class="k">for</span> <span class="n">is_even_val</span><span class="p">,</span> <span class="n">number_group</span> <span class="ow">in</span> <span class="n">itertools</span><span class="o">.</span><span class="n">groupby</span><span class="p">(</span><span class="nb">sorted</span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="mi">10</span><span class="p">),</span> <span class="n">key</span><span class="o">=</span><span class="n">is_even</span><span class="p">),</span> <span class="n">key</span><span class="o">=</span><span class="n">is_even</span><span class="p">):</span>
</span><span>    <span class="nb">print</span><span class="p">(</span><span class="s1">'Evens:'</span> <span class="k">if</span> <span class="n">is_even_val</span> <span class="k">else</span> <span class="s1">'Odds:'</span><span class="p">,</span> <span class="o">*</span><span class="n">number_group</span><span class="p">)</span>
</span></code></pre></div>
<p>This produces an output much closer to what we wanted:</p>
<div class="hl"><pre class=content><code><span>Odds: 1 3 5 7 9
</span><span>Evens: 0 2 4 6 8
</span></code></pre></div>
<p>Now, ignoring the evil of premature optimization, the fact that we are calling the key function
twice per item might cause terminally serious itches to some developers. One (possibly silly) way
around this is to store the results of the key function right next to the values, as tuples, and then
unpack the values once we’re done grouping. This would look like:</p>
<div class="hl"><pre class=content><code><span>import itertools
</span><span>
</span><span>def is_even(n):
</span><span>    return n % 2 == 0
</span><span>
</span><span>
</span><span>numbers = range(10)
</span><span>keyed_numbers = [(is_even(n), n) for n in numbers]
</span><span>sorted_numbers = sorted(keyed_numbers)
</span><span>
</span><span>for is_even_val, pair_group in itertools.groupby(sorted_numbers, key=lambda pair: pair[0]):
</span><span>    print('Evens:' if is_even_val else 'Odds:', *(pair[1] for pair in pair_group))
</span></code></pre></div>
<p>This produces the same output as the previous example, but calls the key function (<code>is_even</code> in this
example’s case) only <em>once</em> per item in our list.</p>
<p class="note">Before you attempt the above apparent <em>solution</em> to performance issues, prove to yourself that
firstly, you <strong>have a performance issue</strong> and that this piece of code <strong>is at least part of the
reason</strong> for it. Otherwise you’re probably just wasting your time.</p>
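<p>Counting calls is a cheap first check of the claim about the key function. The sketch below is only an instrumentation device added for illustration: it wraps <code>is_even</code> with a global call counter and compares sorting-then-grouping against the decorated version on <code>range(10)</code>. Both <code>sorted()</code> and <code>groupby</code> evaluate the key once per item, so the first approach makes twice as many calls:</p>
<div class="hl"><pre class=content><code>import itertools

calls = 0


def is_even(n):
    # Instrumented key function: counts how many times it is called.
    global calls
    calls += 1
    return n % 2 == 0


# Approach 1: sort by the key, then group by the same key.
calls = 0
for _ in itertools.groupby(sorted(range(10), key=is_even), key=is_even):
    pass
sort_then_group_calls = calls

# Approach 2: compute each key once, store it next to the value, then group.
calls = 0
keyed_numbers = sorted((is_even(n), n) for n in range(10))
for _ in itertools.groupby(keyed_numbers, key=lambda pair: pair[0]):
    pass
decorated_calls = calls

print(sort_then_group_calls, decorated_calls)  # 20 10
</code></pre></div>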
<p>Since this is arguably more useful, let’s create an alternative <code>groupby</code> that will sort first and
then call <code>itertools.groupby</code>:</p>
<div class="hl"><pre class=content><code><span><span class="kn">import</span> <span class="nn">itertools</span>
</span><span>
</span><span><span class="k">def</span> <span class="nf">sorted_groupby</span><span class="p">(</span><span class="n">iterable</span><span class="p">,</span> <span class="n">key</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
</span><span>    <span class="k">yield from</span> <span class="n">itertools</span><span class="o">.</span><span class="n">groupby</span><span class="p">(</span><span class="nb">sorted</span><span class="p">(</span><span class="n">iterable</span><span class="p">,</span> <span class="n">key</span><span class="o">=</span><span class="n">key</span><span class="p">),</span> <span class="n">key</span><span class="o">=</span><span class="n">key</span><span class="p">)</span>
</span></code></pre></div>
<p>We can use this function like:</p>
<div class="hl"><pre class=content><code><span><span class="k">for</span> <span class="n">is_even_val</span><span class="p">,</span> <span class="n">number_group</span> <span class="ow">in</span> <span class="n">sorted_groupby</span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="mi">10</span><span class="p">),</span> <span class="n">key</span><span class="o">=</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span> <span class="o">%</span> <span class="mi">2</span> <span class="o">==</span> <span class="mi">0</span><span class="p">):</span>
</span><span>    <span class="nb">print</span><span class="p">(</span><span class="s1">'Evens:'</span> <span class="k">if</span> <span class="n">is_even_val</span> <span class="k">else</span> <span class="s1">'Odds:'</span><span class="p">,</span> <span class="o">*</span><span class="n">number_group</span><span class="p">)</span>
</span></code></pre></div>
<p>This produces the same output as before:</p>
<div class="hl"><pre class=content><code><span>Odds: 1 3 5 7 9
</span><span>Evens: 0 2 4 6 8
</span></code></pre></div>
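<p>Since <code>key</code> defaults to <code>None</code>, the same helper also works on plain values. As a small illustration (the string is arbitrary), counting each distinct character could look like the following, although <code>collections.Counter</code> is the more direct tool when counting is all you need:</p>
<div class="hl"><pre class=content><code>import itertools


def sorted_groupby(iterable, key=None):
    # Same helper as above: sort first so that equal keys are contiguous.
    yield from itertools.groupby(sorted(iterable, key=key), key=key)


letter_counts = [(letter, len(list(group))) for letter, group in sorted_groupby('mississippi')]
print(letter_counts)  # [('i', 4), ('m', 1), ('p', 2), ('s', 4)]
</code></pre></div>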
<h2 id="groups-are-iterables">Groups are Iterables<a class="headerlink" href="#groups-are-iterables" title="Permanent link">¶</a></h2>
<p>I have mentioned this earlier in this article, but it’s important enough to stress again. The group
collections yielded by the <code>groupby</code> call <strong>are not lists</strong>. They are iterables that become
unusable as soon as <code>groupby</code> yields the next group. If you need the values, make sure you collect
them before moving on to the next group.</p>
<p>For example, consider the following snippet:</p>
<div class="hl"><pre class=content><code><span><span class="kn">import</span> <span class="nn">itertools</span>
</span><span><span class="kn">from</span> <span class="nn">pprint</span> <span class="kn">import</span> <span class="n">pprint</span>
</span><span>
</span><span><span class="n">names</span> <span class="o">=</span> <span class="p">[</span><span class="s1">'Arthur'</span><span class="p">,</span> <span class="s1">'Trillian'</span><span class="p">,</span> <span class="s1">'ford'</span><span class="p">,</span> <span class="s1">'zaphod'</span><span class="p">,</span> <span class="s1">'slartibartfast'</span><span class="p">]</span>
</span><span>
</span><span><span class="n">by_casing</span> <span class="o">=</span> <span class="nb">dict</span><span class="p">(</span><span class="n">itertools</span><span class="o">.</span><span class="n">groupby</span><span class="p">(</span><span class="n">names</span><span class="p">,</span> <span class="n">key</span><span class="o">=</span><span class="nb">str</span><span class="o">.</span><span class="n">istitle</span><span class="p">))</span>
</span><span><span class="n">pprint</span><span class="p">(</span><span class="n">by_casing</span><span class="p">)</span>
</span><span><span class="n">pprint</span><span class="p">(</span><span class="nb">list</span><span class="p">(</span><span class="n">by_casing</span><span class="p">[</span><span class="kc">True</span><span class="p">]))</span>
</span><span><span class="n">pprint</span><span class="p">(</span><span class="nb">list</span><span class="p">(</span><span class="n">by_casing</span><span class="p">[</span><span class="kc">False</span><span class="p">]))</span>
</span></code></pre></div>
<p>This produces the following output:</p>
<div class="hl"><pre class=content><code><span>{False: &lt;itertools._grouper object at 0x0000000002B6D278&gt;,
</span><span> True: &lt;itertools._grouper object at 0x0000000002B6BF28&gt;}
</span><span>[]
</span><span>[]
</span></code></pre></div>
<p>The seemingly strange thing to notice here is that although <code>groupby</code> returned two groupings, their
grouped values are empty (as the two empty lists in the output show). But of course, <code>groupby</code> wouldn’t
return a group unless there’s <em>at least</em> one item in the corresponding collection. So, what’s going
on?</p>
<p>This is the point I was getting at in the first paragraph of this section. The grouping collections
(the values in the dictionary above) are <em>de facto</em> destroyed once <code>groupby</code> moves on to the
next group, and <code>dict()</code> walks through all the groups before we ever look at their contents. So, if
we wanted to construct a dictionary like this, we would need to do something like the following:</p>
<div class="hl"><pre class=content><code><span><span class="kn">import</span> <span class="nn">itertools</span>
</span><span><span class="kn">from</span> <span class="nn">collections</span> <span class="kn">import</span> <span class="n">defaultdict</span>
</span><span><span class="kn">from</span> <span class="nn">pprint</span> <span class="kn">import</span> <span class="n">pprint</span>
</span><span>
</span><span><span class="n">names</span> <span class="o">=</span> <span class="p">[</span><span class="s1">'Arthur'</span><span class="p">,</span> <span class="s1">'Trillian'</span><span class="p">,</span> <span class="s1">'ford'</span><span class="p">,</span> <span class="s1">'zaphod'</span><span class="p">,</span> <span class="s1">'slartibartfast'</span><span class="p">]</span>
</span><span>
</span><span><span class="n">by_casing</span> <span class="o">=</span> <span class="n">defaultdict</span><span class="p">(</span><span class="nb">list</span><span class="p">)</span>
</span><span>
</span><span><span class="k">for</span> <span class="n">is_title</span><span class="p">,</span> <span class="n">group_names</span> <span class="ow">in</span> <span class="n">itertools</span><span class="o">.</span><span class="n">groupby</span><span class="p">(</span><span class="n">names</span><span class="p">,</span> <span class="n">key</span><span class="o">=</span><span class="nb">str</span><span class="o">.</span><span class="n">istitle</span><span class="p">):</span>
</span><span>    <span class="n">by_casing</span><span class="p">[</span><span class="n">is_title</span><span class="p">]</span><span class="o">.</span><span class="n">extend</span><span class="p">(</span><span class="n">group_names</span><span class="p">)</span>
</span><span>
</span><span><span class="n">pprint</span><span class="p">(</span><span class="nb">dict</span><span class="p">(</span><span class="n">by_casing</span><span class="p">))</span>
</span><span><span class="n">pprint</span><span class="p">(</span><span class="n">by_casing</span><span class="p">[</span><span class="kc">True</span><span class="p">])</span>
</span><span><span class="n">pprint</span><span class="p">(</span><span class="n">by_casing</span><span class="p">[</span><span class="kc">False</span><span class="p">])</span>
</span></code></pre></div>
<p>This would produce the following output:</p>
<div class="hl"><pre class=content><code><span><span class="p">{</span><span class="kc">False</span><span class="p">:</span> <span class="p">[</span><span class="s1">'ford'</span><span class="p">,</span> <span class="s1">'zaphod'</span><span class="p">,</span> <span class="s1">'slartibartfast'</span><span class="p">],</span> <span class="kc">True</span><span class="p">:</span> <span class="p">[</span><span class="s1">'Arthur'</span><span class="p">,</span> <span class="s1">'Trillian'</span><span class="p">]}</span>
</span><span><span class="p">[</span><span class="s1">'Arthur'</span><span class="p">,</span> <span class="s1">'Trillian'</span><span class="p">]</span>
</span><span><span class="p">[</span><span class="s1">'ford'</span><span class="p">,</span> <span class="s1">'zaphod'</span><span class="p">,</span> <span class="s1">'slartibartfast'</span><span class="p">]</span>
</span></code></pre></div>
<p>Just something to keep in mind.</p>
<p class="note">The above snippet of code uses <a href="https://docs.python.org/3/library/collections.html#collections.defaultdict" rel="noopener noreferrer" target="_blank"><code>collections.defaultdict</code></a>. I haven’t written about
this yet, but I intend to, in the near future (most likely within the 21st century).</p>
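<p>When each key occurs in only one contiguous run, as with the <code>names</code> list above, a dictionary comprehension that calls <code>list()</code> on each group as soon as it is yielded is a shorter alternative. Here is a sketch, including the caveat that makes the <code>defaultdict</code> version the safer choice in general:</p>
<div class="hl"><pre class=content><code>import itertools

names = ['Arthur', 'Trillian', 'ford', 'zaphod', 'slartibartfast']

# list(group) runs before groupby advances, so each group is captured intact.
by_casing = {is_title: list(group) for is_title, group in itertools.groupby(names, key=str.istitle)}
print(by_casing)  # {True: ['Arthur', 'Trillian'], False: ['ford', 'zaphod', 'slartibartfast']}

# Caveat: a key that shows up in two separate runs overwrites its earlier list.
mixed = ['Arthur', 'ford', 'Trillian']
overwritten = {is_title: list(group) for is_title, group in itertools.groupby(mixed, key=str.istitle)}
print(overwritten)  # {True: ['Trillian'], False: ['ford']}
</code></pre></div>
<p>The <code>extend</code> call in the <code>defaultdict</code> version avoids this overwriting, because it appends to the existing list instead of replacing it.</p>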
<h2 id="a-really-bad-diy-implementation">A Really Bad DIY Implementation<a class="headerlink" href="#a-really-bad-diy-implementation" title="Permanent link">¶</a></h2>
<p>Let’s try and create an implementation of our own version of <code>groupby</code>, called <code>insane_grouper</code>. It
should have the following characteristics:</p>
<ol>
<li>Take an iterable and, optionally, a key function, interpreting both the same way as <code>itertools.groupby</code>.</li>
<li>Group non-contiguous items that share a key into a single collection.</li>
<li>Return a dictionary with each group’s key value as a key and that group’s list of items as the
corresponding value.<ul>
<li>This fits well with point 2 above. When grouping non-contiguous items, the groups cannot be
computed lazily (why is left as an exercise for the reader), so we might as well return a
dictionary with all the groups.</li>
</ul>
</li>
</ol>
<p>This might look something like the following:</p>
<div class="hl"><pre class=content><code><span><span class="kn">from</span> <span class="nn">collections</span> <span class="kn">import</span> <span class="n">defaultdict</span>
</span><span><span class="kn">from</span> <span class="nn">pprint</span> <span class="kn">import</span> <span class="n">pprint</span>
</span><span>
</span><span><span class="k">def</span> <span class="nf">insane_grouper</span><span class="p">(</span><span class="n">iterable</span><span class="p">,</span> <span class="n">key</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
</span><span>    <span class="n">groups</span> <span class="o">=</span> <span class="n">defaultdict</span><span class="p">(</span><span class="nb">list</span><span class="p">)</span>
</span><span>
</span><span>    <span class="k">for</span> <span class="n">item</span> <span class="ow">in</span> <span class="n">iterable</span><span class="p">:</span>
</span><span>        <span class="n">groups</span><span class="p">[</span><span class="n">item</span> <span class="k">if</span> <span class="n">key</span> <span class="ow">is</span> <span class="kc">None</span> <span class="k">else</span> <span class="n">key</span><span class="p">(</span><span class="n">item</span><span class="p">)]</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">item</span><span class="p">)</span>
</span><span>
</span><span>    <span class="k">return</span> <span class="nb">dict</span><span class="p">(</span><span class="n">groups</span><span class="p">)</span>
</span><span>
</span><span>
</span><span><span class="n">names</span> <span class="o">=</span> <span class="p">[</span><span class="s1">'Arthur'</span><span class="p">,</span> <span class="s1">'ford'</span><span class="p">,</span> <span class="s1">'zaphod'</span><span class="p">,</span> <span class="s1">'Trillian'</span><span class="p">,</span> <span class="s1">'slartibartfast'</span><span class="p">]</span>
</span><span><span class="n">pprint</span><span class="p">(</span><span class="n">insane_grouper</span><span class="p">(</span><span class="n">names</span><span class="p">,</span> <span class="nb">str</span><span class="o">.</span><span class="n">istitle</span><span class="p">))</span>
</span><span>
</span><span><span class="n">pprint</span><span class="p">(</span><span class="n">insane_grouper</span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="mi">10</span><span class="p">),</span> <span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span> <span class="o">%</span> <span class="mi">2</span> <span class="o">==</span> <span class="mi">0</span><span class="p">))</span>
</span></code></pre></div>
<p>The output of this snippet is the following:</p>
<div class="hl"><pre class=content><code><span><span class="p">{</span><span class="kc">False</span><span class="p">:</span> <span class="p">[</span><span class="s1">'ford'</span><span class="p">,</span> <span class="s1">'zaphod'</span><span class="p">,</span> <span class="s1">'slartibartfast'</span><span class="p">],</span> <span class="kc">True</span><span class="p">:</span> <span class="p">[</span><span class="s1">'Arthur'</span><span class="p">,</span> <span class="s1">'Trillian'</span><span class="p">]}</span>
</span><span><span class="p">{</span><span class="kc">False</span><span class="p">:</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">7</span><span class="p">,</span> <span class="mi">9</span><span class="p">],</span> <span class="kc">True</span><span class="p">:</span> <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">6</span><span class="p">,</span> <span class="mi">8</span><span class="p">]}</span>
</span></code></pre></div>
<h2 id="usage-tips">Usage Tips<a class="headerlink" href="#usage-tips" title="Permanent link">¶</a></h2>
<p>Here are a few tips and cases where this can be used to quickly compute distinct collections of
objects:</p>
<ol>
<li>A list of dictionaries can be grouped by the value of a particular key present in all (or
some?) of the dictionaries in the list.</li>
<li>The key function can return a tuple. This can be useful where we need to group the items by
multiple criteria, instead of just one.</li>
</ol>
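<p>Both tips can be combined. The sketch below uses made-up order records and <code>operator.itemgetter</code>, which returns a tuple when given several keys, to group by city and status together. As in the earlier sections, the list is first sorted with the same key so that equal keys are contiguous:</p>
<div class="hl"><pre class=content><code>import itertools
from operator import itemgetter

# Made-up sample data for this illustration.
orders = [
    {'id': 1, 'city': 'Pune', 'status': 'shipped'},
    {'id': 2, 'city': 'Delhi', 'status': 'pending'},
    {'id': 3, 'city': 'Pune', 'status': 'shipped'},
    {'id': 4, 'city': 'Delhi', 'status': 'pending'},
    {'id': 5, 'city': 'Pune', 'status': 'pending'},
]

# With two arguments, itemgetter returns a (city, status) tuple for each dict.
city_and_status = itemgetter('city', 'status')

order_ids = {
    group_key: [order['id'] for order in group]
    for group_key, group in itertools.groupby(sorted(orders, key=city_and_status), key=city_and_status)
}
print(order_ids)
# {('Delhi', 'pending'): [2, 4], ('Pune', 'pending'): [5], ('Pune', 'shipped'): [1, 3]}
</code></pre></div>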
<h2 id="conclusion">Conclusion<a class="headerlink" href="#conclusion" title="Permanent link">¶</a></h2>
<p>While the default behaviour of <code>itertools.groupby</code> may not always be what one expects, it is still
useful. The important point to note is to understand the problem you’re solving, consider the tools
at your disposal and choose the right tool for the job. On that note, I’ll leave you with another
link to the <a href="https://docs.python.org/3/library/itertools.html" rel="noopener noreferrer" target="_blank"><code>itertools</code></a> module.</p>The `tar` Command Clipboard2020-02-02T00:00:00+05:302020-02-02T00:00:00+05:30Shrikant Sharat Kandulatag:sharats.me,2020-02-02:/posts/the-tar-command-clipboard/<p>Recently, while doing an experiment with my blog’s rendered output with a VPS instance, I needed to
transfer it to the server over SSH. While doing that, I experimented with archiving the folder a
bit, so I’m putting the outcome of that experience here, should I need it …</p><p>Recently, while doing an experiment with my blog’s rendered output with a VPS instance, I needed to
transfer it to the server over SSH. While doing that, I experimented with archiving the folder a
bit, so I’m putting the outcome of that experience here, should I need it again in the future.</p>
<p>All notes below assume GNU <code>tar v1.26</code>. More specifically, the output of <code>tar --version | head -1</code>
gives:</p>
<div class="hl"><pre class=content><code><span>tar (GNU tar) 1.26
</span></code></pre></div>
<p>I’m only listing the arguments and use-cases that I think are most frequently used (at least by me)
and the ones I’m most likely to need in the future. Please complement this with a healthy serving of
<code>man tar</code> to keep your sanity.</p>
<p class="note">Check out this neat little tool to help generate often-used <code>tar</code> commands:
<a href="https://cligen.sharats.me" rel="noopener noreferrer" target="_blank">cligen.sharats.me</a>. Thanks!</p>
<div class="toc"><span class="toctitle">Table of Contents</span><ul>
<li><a href="#creating-archives">Creating Archives</a><ul>
<li><a href="#create-a-tarbz2-archive">Create a .tar.bz2 Archive</a></li>
<li><a href="#exclude-git-directory">Exclude .git Directory</a></li>
<li><a href="#set-initial-directory">Set Initial Directory</a></li>
</ul>
</li>
<li><a href="#inspecting-archives">Inspecting Archives</a><ul>
<li><a href="#single-vs-multiple-top-levels">Single vs Multiple Top Levels</a></li>
</ul>
</li>
<li><a href="#extracting-archives">Extracting Archives</a><ul>
<li><a href="#extracting-to-different-directory">Extracting to Different Directory</a></li>
</ul>
</li>
<li><a href="#transferring-archives-directories">Transferring Archives / Directories</a><ul>
<li><a href="#local-to-remote">Local to Remote</a></li>
<li><a href="#remote-to-local">Remote to Local</a></li>
</ul>
</li>
<li><a href="#conclusion">Conclusion</a></li>
</ul>
</div>
<h2 id="creating-archives">Creating Archives<a class="headerlink" href="#creating-archives" title="Permanent link">¶</a></h2>
<p>The <code>-c</code> (or <code>--create</code>) command is used to <strong>create</strong> archives.</p>
<p class="note">The <code>-</code> in front of the <code>c</code> can be omitted, but I find that ugly and prefer to include it. That way
it’s consistent with most other such GNU commands.</p>
<p>Additional options after <code>-c</code>:</p>
<ol>
<li>
<p><code>v</code> – Enable verbose output. Adding this will print each file as it is being added to the
archive.</p>
</li>
<li>
<p><code>z</code> or <code>j</code> – Specify the compression format, if needed. Use <code>z</code> for <code>gz</code> archive or <code>j</code> for
<code>bz2</code> archive. This can also be <code>a</code> to infer the compression format from the file name, but only
if the <code>f</code> (explained in the next point) is also given. Other compression formats like <code>--xz</code>,
<code>--lzip</code> etc. can also be used.</p>
</li>
<li>
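<p><code>f</code> – Use the next argument as the file name of the archive. If this argument is not provided,
<code>tar</code> uses the <code>TAPE</code> environment variable if it is set, and otherwise a compile-time
default, which is usually the standard output.</p>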
</li>
<li>
<p><code>--remove-files</code> – Remove files after adding them to the archive. Be careful with this.</p>
</li>
</ol>
<p>To illustrate the examples, I’ll clone one of my public repositories and play around with creating
archives of it.</p>
<div class="hl"><pre class=content><code><span><span class="gp">$ </span>git<span class="w"> </span>clone<span class="w"> </span>git@github.com:sharat87/just-a-calendar.git
</span><span><span class="gp">$ </span>du<span class="w"> </span>-sh<span class="w"> </span>just-a-calendar
</span><span><span class="go">248K just-a-calendar/</span>
</span></code></pre></div>
<h3 id="create-a-tarbz2-archive">Create a <code>.tar.bz2</code> Archive<a class="headerlink" href="#create-a-tarbz2-archive" title="Permanent link">¶</a></h3>
<p>To create a <code>bz2</code> archive of a folder:</p>
<div class="hl"><pre class=content><code><span><span class="gp">$ </span>tar<span class="w"> </span>-cjf<span class="w"> </span>package.tar.bz2<span class="w"> </span>just-a-calendar
</span><span><span class="gp">$ </span>file<span class="w"> </span>package.tar.bz2
</span><span><span class="go">package.tar.bz2: bzip2 compressed data, block size = 900k</span>
</span><span><span class="gp">$ </span>du<span class="w"> </span>-sh<span class="w"> </span>package.tar.bz2
</span><span><span class="go">76K package.tar.bz2</span>
</span></code></pre></div>
<p>Since we are specifying the file name here, which includes the <code>.bz2</code> part at the end, we can tell
<code>tar</code> to just figure out the compression we want to use. Instead of the <code>j</code> argument specifying the
compression, we’d put in <code>a</code> to indicate this.</p>
<div class="hl"><pre class=content><code><span><span class="gp">$ </span>tar<span class="w"> </span>-caf<span class="w"> </span>package.tar.bz2<span class="w"> </span>just-a-calendar
</span><span><span class="gp">$ </span>file<span class="w"> </span>package.tar.bz2
</span><span><span class="go">package.tar.bz2: bzip2 compressed data, block size = 900k</span>
</span><span><span class="gp">$ </span>du<span class="w"> </span>-sh<span class="w"> </span>package.tar.bz2
</span><span><span class="go">76K package.tar.bz2</span>
</span></code></pre></div>
<h3 id="exclude-git-directory">Exclude <code>.git</code> Directory<a class="headerlink" href="#exclude-git-directory" title="Permanent link">¶</a></h3>
<p>Now, the archive also contains the <code>.git</code> directory that was present in our clone. We probably don’t
want that. The <code>tar</code> command provides the <code>--exclude*</code> family of arguments to deal with this. For
example, as in our case, to ignore the folder <code>.git</code>, we could do:</p>
<div class="hl"><pre class=content><code><span><span class="gp">$ </span>tar<span class="w"> </span>-caf<span class="w"> </span>package.tar.bz2<span class="w"> </span>--exclude<span class="o">=</span>.git<span class="w"> </span>just-a-calendar
</span><span><span class="gp">$ </span>du<span class="w"> </span>-sh<span class="w"> </span>package.tar.bz2
</span><span><span class="go">12K package.tar.bz2</span>
</span></code></pre></div>
<p>This package doesn’t contain the <code>.git</code> folder (and consequently is <em>much</em> smaller). However, for
this particular problem, there’s perhaps an even better solution: the <code>--exclude-vcs</code> argument. This
argument automatically excludes the directories and files of several version control systems, including
<code>.git</code>. So our command becomes:</p>
<div class="hl"><pre class=content><code><span><span class="gp">$ </span>tar<span class="w"> </span>-caf<span class="w"> </span>package.tar.bz2<span class="w"> </span>--exclude-vcs<span class="w"> </span>just-a-calendar
</span></code></pre></div>
<p>Another useful argument along the same lines is <code>--exclude-backups</code>, which will <strong>exclude backup and lock
files</strong>, which is also usually what we want.</p>
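<p>For comparison, when an archive has to be built from a Python script rather than the shell, the standard library’s <code>tarfile</code> module can do something similar. This is only a rough sketch, not an equivalent of <code>tar</code>’s own flags: the function names are made up for illustration, the filter only skips entries named <code>.git</code> (much narrower than <code>--exclude-vcs</code>), and the <code>'w:bz2'</code> mode plays the role of <code>-j</code>:</p>
<div class="hl"><pre class=content><code>import tarfile


def skip_git(tarinfo):
    # Returning None drops the entry. For a directory, tarfile.add() then
    # does not descend into it, so everything under .git is skipped too.
    if tarinfo.name.split('/')[-1] == '.git':
        return None
    return tarinfo


def create_bz2_archive(archive_path, source_dir):
    # 'w:bz2' writes a bzip2-compressed tar archive.
    with tarfile.open(archive_path, 'w:bz2') as archive:
        archive.add(source_dir, filter=skip_git)


# Example call, assuming the just-a-calendar clone from above:
# create_bz2_archive('package.tar.bz2', 'just-a-calendar')
</code></pre></div>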
<h3 id="set-initial-directory">Set Initial Directory<a class="headerlink" href="#set-initial-directory" title="Permanent link">¶</a></h3>
<p>The <code>-C</code> (or <code>--directory</code>) argument changes to the given directory before creating the archive.
This influences the paths under which the files are stored <em>inside</em> the archive. This is
normally only useful if for some reason you can’t <code>cd</code> or <code>pushd</code> to that directory yourself, which
is not very often.</p>
<h2 id="inspecting-archives">Inspecting Archives<a class="headerlink" href="#inspecting-archives" title="Permanent link">¶</a></h2>
<p>The <code>-t</code> (or <code>--list</code>) can be used to list the contents of an archive without extracting it.</p>
<p>Additional options after <code>-t</code>:</p>
<ol>
<li>
<p><code>v</code> – Verbose listing. The effect of adding this option is like adding <code>-l</code> to the <code>ls</code> command.
That is, it shows details such as each file’s permissions, owner, size and modification time.</p>
</li>
<li>
<p><code>f</code> – Treat the next argument as the archive file name. This argument is almost always needed
with the <code>-t</code> command (unless the archive is being piped in to the <code>tar -t</code> command).</p>
</li>
</ol>
<p>Let’s run this on our package archive created in the previous section.</p>
<div class="hl"><pre class=content><code><span><span class="gp">$ </span>tar<span class="w"> </span>-tf<span class="w"> </span>package.tar.bz2<span class="w"> </span><span class="p">|</span><span class="w"> </span>wc<span class="w"> </span>-l
</span><span><span class="go">6</span>
</span></code></pre></div>
<h3 id="single-vs-multiple-top-levels">Single vs Multiple Top Levels<a class="headerlink" href="#single-vs-multiple-top-levels" title="Permanent link">¶</a></h3>
<p>There’s one thing about extracting archives that’s extremely annoying. If an archive contains multiple files
at the top level, it’ll pollute the current directory with several objects. To combat this, we might make
it a habit to create a new folder and extract inside it, but if the archive itself turns out to
contain a top-level directory, we end up with one useless directory in the tree.</p>
<p>This situation is actually handled very well by the <code>aunpack</code> command from the <a href="https://www.nongnu.org/atool/" rel="noopener noreferrer" target="_blank">atool</a> script.
This command takes an archive (of any of several different formats) and extracts it. If it contains
a single top level entry, it is extracted to your working directory. If it contains several top
level entries, a new directory is created and the extraction happens inside that new directory. This
command is extremely convenient, for this and several other reasons.</p>
<p>To find out if an archive has a single top-level entry or multiple, the following snippet can be
used:</p>
<div class="hl"><pre class=content><code><span>tar<span class="w"> </span>-tf<span class="w"> </span>package.tar.bz2<span class="w"> </span><span class="p">|</span><span class="w"> </span>cut<span class="w"> </span>-d/<span class="w"> </span>-f1<span class="w"> </span><span class="p">|</span><span class="w"> </span>sort<span class="w"> </span>-u
</span></code></pre></div>
<p>This prints one top-level entry per line. If there’s only one line in the output, then
there’s only one top-level entry. This works as follows: first, the <code>cut</code> command splits each line of the
listing on the <code>/</code> character (the path separator) and prints only the first field, which is the top-level
entry. Then, the <code>sort</code> command sorts the top-level entries and prints only the <strong>unique</strong> ones
(that’s what the <code>-u</code> is for). We could further pipe this to <code>wc -l</code> and check if it results in <code>1</code>.</p>
<h2 id="extracting-archives">Extracting Archives<a class="headerlink" href="#extracting-archives" title="Permanent link">¶</a></h2>
<p>The <code>-x</code> (or <code>--extract</code>) command is used to <strong>extract</strong> the contents of archives.</p>
<p>This command takes the following arguments:</p>
<ol>
<li>
<p><code>v</code> – Verbose logging. Prints each file path as it is being extracted.</p>
</li>
<li>
<p><code>z</code> or <code>j</code> – Specify the compression format, if needed. These work the same way as with the <code>-c</code>
command.</p>
</li>
<li>
<p><code>f</code> – Reads the next argument as the archive file name. This is almost always used with this
command to specify the archive to extract. If this is not provided, the archive content is
expected to be available from standard input.</p>
</li>
<li>
<p><code>k</code> (or <code>--keep-old-files</code>) – Report an error instead of overwriting any existing file when
extracting. This is useful if you don’t want any of your existing files to be overwritten.</p>
</li>
</ol>
<p>So, to extract our archive (in a separate location, of course):</p>
<div class="hl"><pre class=content><code><span><span class="gp">$ </span>mkdir<span class="w"> </span>spike<span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nb">cd</span><span class="w"> </span>spike
</span><span><span class="gp">$ </span>tar<span class="w"> </span>-xaf<span class="w"> </span>../package.tar.bz2
</span></code></pre></div>
<h3 id="extracting-to-different-directory">Extracting to Different Directory<a class="headerlink" href="#extracting-to-different-directory" title="Permanent link">¶</a></h3>
<p>The extract command also supports the <code>-C</code> (or <code>--directory</code>) argument that sets the initial working
directory before extracting. This can be used to change the location where the extracted
files/folder will be saved.</p>
<h2 id="transferring-archives-directories">Transferring Archives / Directories<a class="headerlink" href="#transferring-archives-directories" title="Permanent link">¶</a></h2>
<p>In this section, I’ll show a couple of quick examples where we need to transfer a folder tree
between the local system and a remote system reachable over SSH.</p>
<h3 id="local-to-remote">Local to Remote<a class="headerlink" href="#local-to-remote" title="Permanent link">¶</a></h3>
<p>We could create a <code>tar</code> file of the folder (and any other files as well), transfer the file to the
remote system, log in to the remote system and unpack it there.</p>
<p>There are a couple of problems with this approach:</p>
<ol>
<li>Since we are creating an archive of the folder on our local disk, we need to have the necessary
free space for that archive. Thanks to compression, this may be less than the size of the folder, but it can
still be significant if the folder is large. The same problem also appears on the remote system.</li>
<li>We need write permissions on the local disk. If we want to just take a folder to a remote system,
we should only need write permission on the remote disk, not on the local disk.</li>
</ol>
<p>To avoid the above two problems, we can transfer the archive directly as a stream, without saving it
to the local disk. Notice that if we don’t provide a filename for the create (<code>-c</code>) command, the
archive is written to standard output. Similarly, if we don’t provide a filename for the extract
(<code>-x</code>) command, it reads the archive from standard input. Our solution below leverages these
two facts.</p>
<div class="hl"><pre class=content><code><span>tar<span class="w"> </span>-cj<span class="w"> </span>just-a-calendar<span class="w"> </span><span class="p">|</span><span class="w"> </span>ssh<span class="w"> </span>remote<span class="w"> </span>tar<span class="w"> </span>-xj
</span></code></pre></div>
<p>The first command (<code>tar -cj just-a-calendar</code>) creates a <code>bzip2</code>-compressed archive (we could’ve used
<code>z</code> here to use <code>gzip</code> compression instead) and writes it to standard output. This becomes the
standard input for the <code>ssh</code> command, which connects to the remote host, invokes the <code>tar -xj</code>
command, and forwards its own standard input to that <code>tar -xj</code> command. The <code>tar -xj</code> command
extracts the archive from its standard input, using <code>bzip2</code> for decompression, and writes the
extracted contents to the remote user’s home directory.</p>
<p>Additionally, we could use the <code>-C</code> (or <code>--directory</code>) argument to <code>tar -xj</code> to set the
directory where the extracted files would be saved.</p>
<p>This method is extremely handy since the archive is not written to the disk anywhere, not on local,
not on remote. It’s only processed as a stream of bytes.</p>
<p>The <code>-j</code> argument to the <code>tar</code> commands is not strictly necessary. The whole thing works even
without it. But since the archive is being transferred over the network, it pays to spend a little
processor time on compressing it to minimize network usage (and, consequently, speed up the
operation).</p>
<p>We could’ve added the <code>-v</code> argument to one (or both!?) <code>tar</code> commands to show the files as they are
being archived/extracted.</p>
<h3 id="remote-to-local">Remote to Local<a class="headerlink" href="#remote-to-local" title="Permanent link">¶</a></h3>
<p>This follows a similar method as in the previous section, but the other way around. We run the
archiver <code>tar</code> command on the remote host, and the extractor <code>tar</code> command on the local machine.</p>
<div class="hl"><pre class=content><code><span>ssh<span class="w"> </span>remote<span class="w"> </span>tar<span class="w"> </span>-cj<span class="w"> </span>just-a-calendar<span class="w"> </span><span class="p">|</span><span class="w"> </span>tar<span class="w"> </span>-xj
</span></code></pre></div>
<p>This recreates the remote host’s <code>just-a-calendar</code> folder on the local disk. We could
use the <code>-C</code> argument with either <code>tar</code> command to set its initial working directory.</p>
<p>Of course, if we wanted to just save the archive on the local disk, not extract it, we could simply
redirect the stream to a file.</p>
<div class="hl"><pre class=content><code><span>ssh remote tar -cj just-a-calendar > package.tar.bz2
</span></code></pre></div>
<h2 id="conclusion">Conclusion<a class="headerlink" href="#conclusion" title="Permanent link">¶</a></h2>
<p>The <code>tar</code> command, in all its variations, is irreplaceable for these kinds of
purposes. The handiest resource for getting help while working with it is, of course, the man page.
But when we’re in the mood to just copy-pasta (yes, pasta) a command to serve the purpose, I hope
this article will be helpful.</p>Working with Strings in Python2020-01-26T00:00:00+05:302020-01-26T00:00:00+05:30Shrikant Sharat Kandulatag:sharats.me,2020-01-26:/posts/working-with-strings-in-python/<p>This article will be a practical rundown of working with strings in Python, made up of things I
constantly forget and have to look up. I hope it will serve as a super-quick reference
for me as well as for anybody else who stumbles here.</p>
<p>This …</p><p>This article will be a practical rundown of working with strings in Python, made up of things I
constantly forget and have to look up. I hope it will serve as a super-quick reference
for me as well as for anybody else who stumbles here.</p>
<p>This document is not intended for beginners to Python. Although you can still get something out of
it, it’s best suited for intermediate Python programmers. I tried to illustrate the concepts in a
crisp manner with minimum carry-over context from one section to the next.</p>
<div class="toc"><span class="toctitle">Table of Contents</span><ul>
<li><a href="#defining-strings">Defining Strings</a><ul>
<li><a href="#single-and-double-quoted-strings">Single and Double Quoted Strings</a></li>
<li><a href="#tripled-quoted-strings">Triple Quoted Strings</a></li>
<li><a href="#escape-characters">Escape Characters</a></li>
</ul>
</li>
<li><a href="#auto-concatenated-strings">Auto-concatenated Strings</a></li>
<li><a href="#raw-strings">Raw Strings</a></li>
<li><a href="#concatenation">Concatenation</a></li>
<li><a href="#splitting">Splitting</a><ul>
<li><a href="#the-splitlines-method">The .splitlines Method</a></li>
</ul>
</li>
<li><a href="#substring-check">Substring Check</a><ul>
<li><a href="#prefix-and-suffix-check">Prefix and Suffix Check</a></li>
<li><a href="#regular-expressions-check">Regular Expressions Check</a></li>
</ul>
</li>
<li><a href="#learning-about-the-contents">Learning About the Contents</a><ul>
<li><a href="#numeric-checks">Numeric Checks</a></li>
</ul>
</li>
<li><a href="#transformations">Transformations</a></li>
<li><a href="#string-formatting">String Formatting</a></li>
<li><a href="#docstrings">Docstrings</a></li>
<li><a href="#conclusion">Conclusion</a></li>
</ul>
</div>
<h2 id="defining-strings">Defining Strings<a class="headerlink" href="#defining-strings" title="Permanent link">¶</a></h2>
<h3 id="single-and-double-quoted-strings">Single and Double Quoted Strings<a class="headerlink" href="#single-and-double-quoted-strings" title="Permanent link">¶</a></h3>
<p>We’ll refer to strings delimited by the <code>'</code> character as single quoted strings and those delimited
by <code>"</code> as double quoted strings.</p>
<p>They are identical in all respects, except that single quote needs to be escaped in single quoted
strings and double quote needs to be escaped in double quoted strings.</p>
<p>They cannot span multiple lines: a string’s closing quote character must appear on the same line
where it begins. This can be worked around by putting a <code>\</code> character at the end of the line. For example:</p>
<div class="hl"><pre class=content><code><span><span class="n">text</span> <span class="o">=</span> <span class="s1">'abc</span><span class="se">\</span>
</span><span><span class="s1">def'</span>
</span><span><span class="nb">print</span><span class="p">(</span><span class="n">text</span><span class="p">)</span>
</span></code></pre></div>
<p>Since the backslash and the line break are both dropped from the string, this will print:</p>
<div class="hl"><pre class=content><code><span>abcdef
</span></code></pre></div>
<p>But it’s best to avoid using <code>\</code> to break strings into multiple lines. It’s not pretty and
there are better ways to do it, especially <a href="#auto-concatenated-strings">auto-concatenated strings</a>
(discussed below).</p>
<h3 id="tripled-quoted-strings">Triple Quoted Strings<a class="headerlink" href="#tripled-quoted-strings" title="Permanent link">¶</a></h3>
<p>Triple quoted strings are a syntax for defining multi-line strings. There’s no practical difference
between defining strings with <code>'''</code> and <code>"""</code>.</p>
<p>In practice, this syntax is commonly used for one of the following:</p>
<ol>
<li><a href="#docstrings">Docstrings</a> (discussed below), for writing documentation for classes/functions.</li>
<li>Module level constant strings that contain long multi-line content. Can be used for small HTML
templates that are stored inline or complex SQL queries, long regular expression patterns etc.</li>
<li>An <em>approximation</em> for multi-line comments. Python doesn’t have multi-line comments (like <code>/*</code>
and <code>*/</code> in C-like languages). Wrapping whole code blocks with triple quotes turns them into a
pseudo-comment. I personally discourage this, but it’s nonetheless used in real-world code.</li>
</ol>
<p>The string created with triple quotes will contain <em>everything</em> between the opening and closing
quotes. This includes any indentation present due to Python block-style formatting. For example:</p>
<div class="hl"><pre class=linenos><span>1
</span><span>2
</span><span>3
</span><span>4
</span><span>5
</span><span>6
</span><span>7
</span><span>8
</span><span>9
</span><span>10
</span><span>11
</span></pre><pre class=content><code><span><span class="k">def</span> <span class="nf">make_story</span><span class="p">():</span>
</span><span>    <span class="n">text</span> <span class="o">=</span> <span class="s1">'''</span>
</span><span><span class="s1">    Once upon a time, there was a planet.</span>
</span><span><span class="s1">    Suddenly, it named itself Earth.</span>
</span><span><span class="s1">    And it hoped to live happily ever after.</span>
</span><span><span class="s1">    '''</span>
</span><span>
</span><span>    <span class="k">return</span> <span class="n">text</span>
</span><span>
</span><span>
</span><span><span class="nb">print</span><span class="p">(</span><span class="nb">repr</span><span class="p">(</span><span class="n">make_story</span><span class="p">()))</span>
</span></code></pre></div>
<p>This will produce the following output:</p>
<div class="hl"><pre class=content><code><span>'\n    Once upon a time, there was a planet.\n    Suddenly, it named itself Earth.\n    And it hoped to live happily ever after.\n    '
</span></code></pre></div>
<p>There are three things to note in the string defined in this function:</p>
<ol>
<li>It starts with a newline character, the one that comes right after the opening <code>'''</code> on line 2.<ul>
<li>This particular point can be easily addressed by adding a <code>\</code> right after the opening <code>'''</code>.</li>
</ul>
</li>
<li>Each line, except for the first, starts with four spaces, because of the indentation of the
<code>make_story</code> function.<ul>
<li>The <a href="https://docs.python.org/3/library/textwrap.html#textwrap.dedent" rel="noopener noreferrer" target="_blank"><code>textwrap.dedent</code></a> function from the standard library can help deal with this.
Details in the next paragraph.</li>
</ul>
</li>
<li>It ends with a newline character and the four spaces from line 6.<ul>
<li>Calling <code>.strip</code> (or <code>.rstrip</code>) on the string can remove these.</li>
</ul>
</li>
</ol>
<p>Considering the above three points, we rewrite the previous code fragment as:</p>
<div class="hl"><pre class=content><code><span><span class="kn">import</span> <span class="nn">textwrap</span>
</span><span>
</span><span><span class="k">def</span> <span class="nf">make_story</span><span class="p">():</span>
</span><span>    <span class="n">text</span> <span class="o">=</span> <span class="n">textwrap</span><span class="o">.</span><span class="n">dedent</span><span class="p">(</span><span class="s1">'''</span><span class="se">\</span>
</span><span><span class="s1">    Once upon a time, there was a planet.</span>
</span><span><span class="s1">    Suddenly, it named itself Earth.</span>
</span><span><span class="s1">    And it hoped to live happily ever after.</span>
</span><span><span class="s1">    '''</span><span class="o">.</span><span class="n">rstrip</span><span class="p">())</span>
</span><span>
</span><span>    <span class="k">return</span> <span class="n">text</span>
</span><span>
</span><span>
</span><span><span class="nb">print</span><span class="p">(</span><span class="nb">repr</span><span class="p">(</span><span class="n">make_story</span><span class="p">()))</span>
</span></code></pre></div>
<p>Note that it is important to use <code>.rstrip</code> here, and not <code>.strip</code>. The reason is that <code>.strip</code> would also
remove the whitespace before the <code>Once...</code> line, so the first line in the string wouldn’t have any
indentation. The documentation of <code>textwrap.dedent</code> says:</p>
<blockquote>
<p>Remove any common leading whitespace from every line in text.</p>
</blockquote>
<p>But since our first line doesn’t have the indentation anymore, there’s no <em>common</em> leading
whitespace in <code>text</code>. So, this function won’t remove the indentation. Another option would be to do
<code>dedent</code> first, and then call <code>.strip</code> on the result of <code>dedent</code>.</p>
<p>The output of this program would be:</p>
<div class="hl"><pre class=content><code><span>'Once upon a time, there was a planet.\nSuddenly, it named itself Earth.\nAnd it hoped to live happily ever after.'
</span></code></pre></div>
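<p>The alternative mentioned above, calling <code>dedent</code> first and <code>.strip</code> afterwards, can be checked with a small sketch. The <code>RAW</code> constant here is just an illustrative string, not part of the original example:</p>

```python
import textwrap

# A module-level string whose content lines are indented by four spaces.
RAW = '''
    Once upon a time, there was a planet.
    Suddenly, it named itself Earth.
    '''

# Stripping first removes the indentation of the first line only, so dedent
# finds no common leading whitespace and leaves the second line indented.
broken = textwrap.dedent(RAW.strip())

# Dedenting first removes the common indentation; strip then drops the
# leading newline and the trailing whitespace-only line.
fixed = textwrap.dedent(RAW).strip()

print(repr(broken))
print(repr(fixed))
```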
<h3 id="escape-characters">Escape Characters<a class="headerlink" href="#escape-characters" title="Permanent link">¶</a></h3>
<p>Backslash-based escape sequences behave exactly the same way in strings defined with any quote
type.</p>
<p>Following is a list of <em>commonly used</em> escape sequences. This list is <strong>not exhaustive</strong>.</p>
<div class="table-wrapper">
<table>
<thead>
<tr>
<th>Escape sequence</th>
<th>Result</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>'\'</code> (at end of line)</td>
<td>String definition is continued to next line</td>
</tr>
<tr>
<td><code>'\n'</code></td>
<td>Newline character</td>
</tr>
<tr>
<td><code>'\\'</code></td>
<td>Literal backslash character</td>
</tr>
<tr>
<td><code>'\''</code></td>
<td>Single quote character, useful in single quoted strings, but works everywhere</td>
</tr>
<tr>
<td><code>"\""</code></td>
<td>Double quote character, useful in double quoted strings, but works everywhere</td>
</tr>
<tr>
<td><code>'\xhh'</code></td>
<td>Character with the hexadecimal value given by the two digits <code>hh</code></td>
</tr>
</tbody>
</table>
</div>
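<p>A few of the escape sequences from the table, checked with assertions. Each sequence stands for a single character in the resulting string:</p>

```python
newline = '\n'
backslash = '\\'
single_quote = '\''
double_quote = "\""
letter_a = '\x41'  # 0x41 is the code point of 'A'

# Each escape sequence produces exactly one character.
assert all(len(s) == 1 for s in (newline, backslash, single_quote, double_quote, letter_a))

# Escaping a quote is allowed even where it isn't needed.
assert single_quote == "'" == "\'"
assert double_quote == '"' == '\"'
assert letter_a == 'A'

# A backslash at the end of a line continues the literal; no newline is added.
continued = 'abc\
def'
assert continued == 'abcdef'
```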
<p>Regarding escaping quote characters:</p>
<ol>
<li>Single quotes don’t <em>have to</em> be escaped in double quote strings, but it’s not an error to do so.</li>
<li>Double quotes don’t <em>have to</em> be escaped in single quote strings, but it’s not an error to do so.</li>
<li>Neither kind of quote <em>has to</em> be escaped in triple quoted strings, but it’s not an error to do so.</li>
</ol>
<p>In triple quoted strings, three consecutive unescaped delimiter quotes always end the string. So, for
example, a <code>'''</code> sequence can only be part of a string defined with <code>'''</code> if at least one of its
quotes is escaped (as in <code>'''a\'''b'''</code>). It can appear freely when the string is defined with <code>"</code> or <code>"""</code>.</p>
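<p>To check how a <code>'''</code> sequence can end up inside a string, here are two spellings side by side:</p>

```python
# With """ as the delimiter, ''' inside the string needs no escaping.
with_other_delimiter = """contains ''' inside"""

# With ''' as the delimiter, escaping one quote stops the three from
# being read as the end of the string.
with_escape = '''contains \''' inside'''

assert with_other_delimiter == with_escape == "contains ''' inside"
```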
<h2 id="auto-concatenated-strings">Auto-concatenated Strings<a class="headerlink" href="#auto-concatenated-strings" title="Permanent link">¶</a></h2>
<p>Python has a nice compiler-level feature that auto-concatenates <em>literal</em> strings that are next to each
other (or, more precisely, that form a single expression). Here is an example to illustrate the
point:</p>
<div class="hl"><pre class=linenos><span>1
</span><span>2
</span><span>3
</span><span>4
</span><span>5
</span><span>6
</span></pre><pre class=content><code><span><span class="n">query</span> <span class="o">=</span> <span class="p">(</span>
</span><span>    <span class="s1">'SELECT * FROM employees'</span>
</span><span>    <span class="s1">' WHERE name = ?'</span>
</span><span><span class="p">)</span>
</span><span>
</span><span><span class="nb">print</span><span class="p">(</span><span class="n">query</span><span class="p">)</span>
</span></code></pre></div>
<p>The string <code>query</code> is defined in two parts, on lines 2 and 3. These two strings are
concatenated automatically at compile time. The output of the above program would be:</p>
<div class="hl"><pre class=content><code><span>SELECT * FROM employees WHERE name = ?
</span></code></pre></div>
<p>Things to note regarding this behaviour:</p>
<ol>
<li>The strings don’t have any operator between them, like <code>+</code> or <code>,</code> or something else.</li>
<li>This works only with <em>string literals</em>; it won’t work when applied to variables.</li>
<li>This is a compile-time feature, so it adds no runtime cost, whereas the <code>+</code> operator is, in
general, evaluated at runtime.</li>
<li>The multiple string literals should be part of the same expression. So, if we are writing them on
multiple lines, they have to be wrapped in parentheses, or we should use the <code>\</code> character to tell
Python to treat multiple lines as a single expression.</li>
<li>Works with ordinary strings, raw strings, format strings, and any mix of them.</li>
</ol>
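<p>A small sketch of points 2 and 5 above; the names <code>name</code> and <code>pattern</code> are only for illustration. A raw string, an ordinary string and an f-string are joined into one value, while a variable placed next to a literal is rejected by the compiler:</p>

```python
name = 'Ada'

# Raw, ordinary and format string literals, auto-concatenated into one string.
pattern = (
    r'\d+'
    ' items for '
    f'{name}'
)
print(pattern)  # \d+ items for Ada

# Auto-concatenation does not apply to variables: `prefix 'b'` is invalid syntax.
try:
    compile("prefix = 'a'\nresult = prefix 'b'", 'snippet', 'exec')
    literals_only = False
except SyntaxError:
    literals_only = True
print(literals_only)  # True
```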
<p>Thanks to this feature, there’s almost never a reason to define long string constants by
concatenating several strings with the <code>+</code> operator.</p>
<h2 id="raw-strings">Raw Strings<a class="headerlink" href="#raw-strings" title="Permanent link">¶</a></h2>
<p>Python’s raw strings’ syntax is a small variation that disables the escaping behaviour of the <code>\</code>
character. A string is treated as a raw string if the opening quote is prefixed with an
<code>r</code> (or <code>R</code>) character.</p>
<p>The following expressions create <em>equal</em> (as defined by the <code>==</code> operator) strings:</p>
<div class="table-wrapper">
<table>
<thead>
<tr>
<th>Unadorned string</th>
<th>Raw string</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>'abc'</code></td>
<td><code>r'abc'</code></td>
</tr>
<tr>
<td><code>'abc\ndef'</code></td>
<td><em>not possible</em></td>
</tr>
<tr>
<td><code>'abc\\ndef'</code></td>
<td><code>r'abc\ndef'</code></td>
</tr>
</tbody>
</table>
</div>
<p>In other words, the special escaping behaviour of the <code>\</code> character doesn’t apply in raw strings. This
is useful when an unadorned string would contain a lot of <code>\\</code>. Such a string’s definition can be
much simpler as a raw string.</p>
<p>Points to note regarding raw strings:</p>
<ol>
<li>Can be used with single, double or tripled quotes.</li>
<li>The actual string object created is no different from the one when using unadorned string syntax.
It is just a syntax-level convenience.</li>
<li>A delimiter quote can appear in a raw string only when preceded by a backslash, and that backslash
stays in the string. For example, <code>r'abc\'def'</code> gives the string <code>"abc\\'def"</code>. That
is, the string contains one backslash followed by one single quote, exactly as it looks in the
definition.</li>
<li>Cannot be defined to end with a single <code>\</code>. The expression <code>r'abc\'</code> will raise a <code>SyntaxError</code>.
The expression <code>r'abc\\'</code> will end with two backslash characters.</li>
</ol>
<p>The limitations above can be worked around by using raw and ordinary strings together.</p>
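<p>For example, both limitations can be sidestepped by auto-concatenating a raw string with an ordinary one. The path below is a made-up example:</p>

```python
# A raw string cannot end in a single backslash, so append an ordinary '\\' literal.
folder = r'C:\Users\example' '\\'
assert folder == 'C:\\Users\\example\\'

# A bare single quote can come from an ordinary string between two raw strings.
text = r'it' "'" r's \n raw'
assert text == "it's \\n raw"

print(folder)  # C:\Users\example\
print(text)    # it's \n raw
```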
<p>Most commonly useful scenarios for raw strings:</p>
<ol>
<li>Regular expression patterns, to be used with the <a href="https://docs.python.org/3/library/re.html" rel="noopener noreferrer" target="_blank"><code>re</code></a> module.</li>
<li>Windows style file paths, where the separator is the backslash character. Note that the <code>open</code>
function works fine even with forward slashes on Windows, so this is <em>generally</em> not needed.</li>
<li>SQL queries, especially when defined with triple quotes as module level constants.</li>
</ol>
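<p>The first use case in practice: with <code>re</code>, a raw string lets the pattern be written exactly as the regular expression engine should see it. The sample text is made up:</p>

```python
import re

# r'\d+' is the same string as '\\d+', just easier to read.
assert r'\d+' == '\\d+'
numbers = re.findall(r'\d+', 'order 66, aisle 9')
print(numbers)  # ['66', '9']

# Without the r prefix, '\b' is a backspace character, not a regex word boundary.
assert '\b' == '\x08'
words = re.findall(r'\bcat\b', 'cat concat cat.')
print(words)  # ['cat', 'cat']
```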
<h2 id="concatenation">Concatenation<a class="headerlink" href="#concatenation" title="Permanent link">¶</a></h2>
<p>The <code>+</code> operator can be used to concatenate two strings. This will create a new string object which
is the result of the concatenation (<code>str</code> objects are immutable in Python).</p>
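<p>A quick illustration of that immutability: <code>+</code> leaves both operands untouched, and item assignment on a <code>str</code> fails.</p>

```python
original = 'abc'
combined = original + 'def'  # builds a brand new str object

assert original == 'abc'  # the operand is unchanged
assert combined == 'abcdef'

immutable = False
try:
    original[0] = 'x'  # str objects don't support item assignment
except TypeError:
    immutable = True
print(immutable)  # True
```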
<p>If several strings are being concatenated, using the <code>+</code> operator may not be the best way to do
this. For example, consider the following snippet of code:</p>
<div class="hl"><pre class=content><code><span><span class="n">text</span> <span class="o">=</span> <span class="s1">''</span>
</span><span>
</span><span><span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">4</span><span class="p">):</span>
</span><span>    <span class="n">text</span> <span class="o">+=</span> <span class="s1">'we have </span><span class="si">%r</span><span class="se">\n</span><span class="s1">'</span> <span class="o">%</span> <span class="n">i</span>
</span><span>
</span><span><span class="nb">print</span><span class="p">(</span><span class="n">text</span><span class="p">)</span>
</span></code></pre></div>
<p>When run, it produces the following output:</p>
<div class="hl"><pre class=content><code><span>we have 0
</span><span>we have 1
</span><span>we have 2
</span><span>we have 3
</span></code></pre></div>
<p>However, using the <code>+</code> operator here means that intermediate string objects are created at every
concatenation operation. This is needless memory allocation since these intermediate string objects
are never used and become garbage almost immediately. For situations like this,
there are better options than concatenating strings with the <code>+</code> operator.</p>
<p><strong>One option</strong> is to collect the fragments in a list and then pass it to the <code>''.join</code> method to
concatenate them all in one go. Using this option in the above code snippet, we get:</p>
<div class="hl"><pre class=content><code><span><span class="n">fragments</span> <span class="o">=</span> <span class="p">[]</span>
</span><span>
</span><span><span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">4</span><span class="p">):</span>
</span><span>    <span class="n">fragments</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="s1">'we have </span><span class="si">%r</span><span class="se">\n</span><span class="s1">'</span> <span class="o">%</span> <span class="n">i</span><span class="p">)</span>
</span><span>
</span><span><span class="n">text</span> <span class="o">=</span> <span class="s1">''</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">fragments</span><span class="p">)</span>
</span><span><span class="nb">print</span><span class="p">(</span><span class="n">text</span><span class="p">)</span>
</span></code></pre></div>
<p>Additionally, in this case, we could’ve used <code>'\n'.join</code> instead and avoided the trailing newline in
<code>text</code> (<em>if</em> that’s what is desired; don’t do it just because we can).</p>
<div class="hl"><pre class=content><code><span><span class="n">lines</span> <span class="o">=</span> <span class="p">[]</span>
</span><span>
</span><span><span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">4</span><span class="p">):</span>
</span><span>    <span class="n">lines</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="s1">'we have </span><span class="si">%r</span><span class="s1">'</span> <span class="o">%</span> <span class="n">i</span><span class="p">)</span>
</span><span>
</span><span><span class="n">text</span> <span class="o">=</span> <span class="s1">'</span><span class="se">\n</span><span class="s1">'</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">lines</span><span class="p">)</span>
</span><span><span class="nb">print</span><span class="p">(</span><span class="n">text</span><span class="p">)</span>
</span></code></pre></div>
<p><strong>Another option</strong> is to use <a href="https://docs.python.org/3/library/io.html#io.StringIO" rel="noopener noreferrer" target="_blank"><code>io.StringIO</code></a> which is a file-like, in-memory, string
buffer that you can <code>.write</code> string content to and then turn it into a single string object when
done. Rewriting the above code snippet to use this option:</p>
<div class="hl"><pre class=content><code><span><span class="kn">import</span> <span class="nn">io</span>
</span><span>
</span><span><span class="n">buffer</span> <span class="o">=</span> <span class="n">io</span><span class="o">.</span><span class="n">StringIO</span><span class="p">()</span>
</span><span><span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">4</span><span class="p">):</span>
</span><span>    <span class="n">buffer</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="s1">'we have </span><span class="si">%r</span><span class="se">\n</span><span class="s1">'</span> <span class="o">%</span> <span class="n">i</span><span class="p">)</span>
</span><span><span class="n">text</span> <span class="o">=</span> <span class="n">buffer</span><span class="o">.</span><span class="n">getvalue</span><span class="p">()</span>
</span><span><span class="nb">print</span><span class="p">(</span><span class="n">text</span><span class="p">)</span>
</span></code></pre></div>
<p>Both solutions are better than concatenating strings with <code>+</code> operator, but if you’re just
concatenating two or three strings, it’s probably simpler to just use <code>+</code> and move on. Premature
optimisation is the root of all evil.</p>
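<p>When the fragments come from a simple loop, the <code>'\n'.join</code> variant above can also be written with a generator expression passed straight to <code>join</code>. This is just a more compact spelling of the same approach, not a different technique:</p>

```python
# Same result as the '\n'.join version, without building the list explicitly.
text = '\n'.join('we have %r' % i for i in range(4))
print(text)
```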
<h2 id="splitting">Splitting<a class="headerlink" href="#splitting" title="Permanent link">¶</a></h2>
<p>Python strings have the <a href="https://docs.python.org/3.8/library/stdtypes.html#str.split" rel="noopener noreferrer" target="_blank"><code>.split</code></a> method that can be used to split strings into a list of
tokens or parts. There are three things to understand about this method:</p>
<p><strong>First</strong>, it takes a separator argument, which can be a non-empty string of any length.</p>
<div class="hl"><pre class=content><code><span><span class="nb">print</span><span class="p">(</span><span class="s1">'a,b,c,d'</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">','</span><span class="p">))</span>
</span><span><span class="nb">print</span><span class="p">(</span><span class="s1">'a,b;c,d'</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">';'</span><span class="p">))</span>
</span><span><span class="nb">print</span><span class="p">(</span><span class="s1">'a b c d'</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">' '</span><span class="p">))</span>
</span><span><span class="nb">print</span><span class="p">(</span><span class="s1">'a,,b,,,'</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">','</span><span class="p">))</span>
</span></code></pre></div>
<p>This will produce the following output:</p>
<div class="hl"><pre class=content><code><span><span class="p">[</span><span class="s1">'a'</span><span class="p">,</span> <span class="s1">'b'</span><span class="p">,</span> <span class="s1">'c'</span><span class="p">,</span> <span class="s1">'d'</span><span class="p">]</span>
</span><span><span class="p">[</span><span class="s1">'a,b'</span><span class="p">,</span> <span class="s1">'c,d'</span><span class="p">]</span>
</span><span><span class="p">[</span><span class="s1">'a'</span><span class="p">,</span> <span class="s1">'b'</span><span class="p">,</span> <span class="s1">'c'</span><span class="p">,</span> <span class="s1">'d'</span><span class="p">]</span>
</span><span><span class="p">[</span><span class="s1">'a'</span><span class="p">,</span> <span class="s1">''</span><span class="p">,</span> <span class="s1">'b'</span><span class="p">,</span> <span class="s1">''</span><span class="p">,</span> <span class="s1">''</span><span class="p">,</span> <span class="s1">''</span><span class="p">]</span>
</span></code></pre></div>
<p>Note that adjacent separators (and a separator at the very end) produce empty strings in the returned list.</p>
<p><strong>Second</strong>, not passing a value for the separator (or passing <code>None</code>) will split the string over
<em>whitespace</em>. Note that this is not the same as splitting with the space character (<code>' '</code>). Consider
the following examples:</p>
<div class="table-wrapper">
<table>
<thead>
<tr>
<th>Expression</th>
<th>Result</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>'a b c'.split()</code></td>
<td><code>['a', 'b', 'c']</code></td>
</tr>
<tr>
<td><code>' a b c '.split()</code></td>
<td><code>['a', 'b', 'c']</code></td>
</tr>
<tr>
<td><code>'a\tb\nc'.split()</code></td>
<td><code>['a', 'b', 'c']</code></td>
</tr>
<tr>
<td><code>'a b c '.split(' ')</code></td>
<td><code>['a', 'b', 'c', '']</code></td>
</tr>
<tr>
<td><code>'a b c '.strip().split(' ')</code></td>
<td><code>['a', 'b', 'c']</code></td>
</tr>
</tbody>
</table>
</div>
<p>If you’re familiar with regular expressions, then this splitting over whitespace is similar to
splitting over non-overlapping matches of the pattern <code>\s+</code>, except that leading and trailing whitespace does not produce empty strings.</p>
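<p>A small sketch comparing the three behaviours side by side on one string with mixed whitespace. The sample string is made up for illustration; <code>re.split</code> is the standard-library regex splitting function.</p>

```python
import re

text = '  a \t b\n\nc  '

# No separator: runs of whitespace act as one separator, and
# leading/trailing whitespace produces no empty strings.
print(text.split())             # ['a', 'b', 'c']

# Splitting on the regex \s+ cuts at the same runs, but the leading and
# trailing whitespace still leave empty strings at both ends.
print(re.split(r'\s+', text))   # ['', 'a', 'b', 'c', '']

# Splitting on a single space treats every space separately and
# ignores tabs and newlines entirely.
print(text.split(' '))          # ['', '', 'a', '\t', 'b\n\nc', '', '']
```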
<p><strong>Third</strong>, there is a second argument, <code>maxsplit</code>, which is the maximum number of times the string will be cut
with the given separator (or whitespace). Thus, if we give <code>1</code> as the second argument, the resulting
list will contain <em>at most</em> two elements. Of course, not providing a second argument means
the string will be split at all occurrences of the separator.</p>
<div class="table-wrapper">
<table>
<thead>
<tr>
<th>Expression</th>
<th>Result</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>'a,b,c,d'.split(',', 2)</code></td>
<td><code>['a', 'b', 'c,d']</code></td>
</tr>
<tr>
<td><code>'a,b,c,d'.split(',', 10)</code></td>
<td><code>['a', 'b', 'c', 'd']</code></td>
</tr>
<tr>
<td><code>'hello'.split(',', 10)</code></td>
<td><code>['hello']</code></td>
</tr>
<tr>
<td><code>'a b c'.split(maxsplit=1)</code></td>
<td><code>['a', 'b c']</code></td>
</tr>
</tbody>
</table>
</div>
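<p>One place where <code>maxsplit</code> tends to be handy is splitting a <code>name=value</code> line whose value may itself contain the separator. The line below is a hypothetical example; note that the tuple unpacking assumes the line contains at least one <code>=</code>.</p>

```python
# A hypothetical "name=value" line whose value itself contains '='.
line = 'url=https://example.com/?page=2'

# Without a limit, every '=' is a cut point, so the value is broken apart.
print(line.split('='))      # ['url', 'https://example.com/?page', '2']

# With maxsplit=1, only the first '=' is used, giving at most two parts.
name, value = line.split('=', 1)
print(name)                 # url
print(value)                # https://example.com/?page=2
```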
<h3 id="the-splitlines-method">The <code>.splitlines</code> Method<a class="headerlink" href="#the-splitlines-method" title="Permanent link">¶</a></h3>
<p>The <a href="https://docs.python.org/3.8/library/stdtypes.html#str.splitlines" rel="noopener noreferrer" target="_blank"><code>.splitlines</code></a> method splits a string into a list of lines. This method is a
better version of just doing <code>.split('\n')</code> since it handles many of the nasty end-of-line
differences. For example, if your string contains <code>'\r\n'</code> at the end of each line, then doing a
<code>.split('\n')</code> will leave dangling <code>'\r'</code> characters at the end of each line. This is handled well by
the <code>.splitlines</code> method. The <a href="https://docs.python.org/3.8/library/stdtypes.html#str.splitlines" rel="noopener noreferrer" target="_blank">official documentation</a> has a list of separators
this method splits by, which I won’t repeat here.</p>
<div class="table-wrapper">
<table>
<thead>
<tr>
<th>Expression</th>
<th>Result</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>'a\nb\rc\r\nd'.splitlines()</code></td>
<td><code>['a', 'b', 'c', 'd']</code></td>
</tr>
<tr>
<td><code>'a b\rc\r\nd'.splitlines()</code></td>
<td><code>['a b', 'c', 'd']</code></td>
</tr>
</tbody>
</table>
</div>
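<p>To see the difference on Windows-style line endings, here is a short sketch. The sample text is invented; <code>keepends</code> is the documented optional argument of <code>str.splitlines</code>.</p>

```python
text = 'first\r\nsecond\r\nthird\r\n'

# split('\n') leaves a dangling '\r' on each line, plus an empty last item.
print(text.split('\n'))       # ['first\r', 'second\r', 'third\r', '']

# splitlines understands '\r\n' and adds no empty item for the final newline.
print(text.splitlines())      # ['first', 'second', 'third']

# keepends=True keeps the line terminators, in case they are needed.
print(text.splitlines(keepends=True))  # ['first\r\n', 'second\r\n', 'third\r\n']
```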
<h2 id="substring-check">Substring Check<a class="headerlink" href="#substring-check" title="Permanent link">¶</a></h2>
<p>To check if a string is wholly contained in another string, the <code>in</code> operator should be used. Note
that this operator is case-sensitive. If case-insensitivity is needed, the easiest option is to just
call <code>.casefold</code> (which is designed especially for this purpose) on both strings.</p>
<div class="hl"><pre class=content><code><span><span class="n">needle</span> <span class="o">=</span> <span class="s1">'back'</span>
</span><span><span class="n">haystack</span> <span class="o">=</span> <span class="s1">'Going back and forth all the time.'</span>
</span><span><span class="nb">print</span><span class="p">(</span><span class="n">needle</span> <span class="ow">in</span> <span class="n">haystack</span><span class="p">)</span>
</span></code></pre></div>
<p>This would print <code>True</code>, since the string <code>'back'</code> occurs in <code>haystack</code>. Be careful about the
intent here, though. Consider the following example:</p>
<div class="hl"><pre class=content><code><span><span class="n">needle</span> <span class="o">=</span> <span class="s1">'back'</span>
</span><span><span class="n">haystack</span> <span class="o">=</span> <span class="s1">'Forwards is easier than backwards.'</span>
</span><span><span class="nb">print</span><span class="p">(</span><span class="n">needle</span> <span class="ow">in</span> <span class="n">haystack</span><span class="p">)</span>
</span></code></pre></div>
<p>This would again print <code>True</code>, but the intent seems to be to look for the <em>word “back”</em>. In that
case, we’d expect <code>True</code> in the previous example and <code>False</code> here (since <em>back</em> is not a separate
word in the second example). A simple solution is to call <code>.split</code> on the <code>haystack</code>
string before the <code>in</code> operator check. The idea is that we get a list of words out of <code>haystack</code>
and check whether <code>needle</code> occurs in that list.</p>
<div class="hl"><pre class=content><code><span><span class="n">needle</span> <span class="o">=</span> <span class="s1">'back'</span>
</span><span><span class="n">haystack</span> <span class="o">=</span> <span class="s1">'Forwards is easier than backwards.'</span>
</span><span><span class="nb">print</span><span class="p">(</span><span class="n">needle</span> <span class="ow">in</span> <span class="n">haystack</span><span class="o">.</span><span class="n">split</span><span class="p">())</span>
</span></code></pre></div>
<p>This prints out <code>False</code>. This isn’t anywhere near a foolproof word searching system, but does get
you a step ahead.</p>
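<p>One further step, sketched below, combines the <code>.casefold</code> advice with the word split and also trims punctuation, so a word at the end of a sentence (like <code>'time.'</code>) still counts. The helper name <code>contains_word</code> is hypothetical, and it is still far from a real word-matching system (hyphenated words and apostrophes, for instance, are not handled specially).</p>

```python
import string


def contains_word(needle, haystack):
    """Return True if needle appears as a whole word in haystack, ignoring case."""
    needle = needle.casefold()
    # Split on whitespace, drop surrounding punctuation from each word,
    # and casefold it so the comparison is case-insensitive.
    words = (w.strip(string.punctuation).casefold() for w in haystack.split())
    return needle in words


print(contains_word('back', 'Going back and forth all the time.'))  # True
print(contains_word('back', 'Forwards is easier than backwards.'))  # False
print(contains_word('TIME', 'Going back and forth all the time.'))  # True
```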
<h3 id="prefix-and-suffix-check">Prefix and Suffix Check<a class="headerlink" href="#prefix-and-suffix-check" title="Permanent link">¶</a></h3>
<p>We have the <code>.startswith</code> and <code>.endswith</code> methods on strings if we want to check not just whether a string is
<em>in</em> another string, but more specifically, whether it starts/ends with it.</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="s1">'the'</span> <span class="ow">in</span> <span class="s1">'Hello there'</span>
</span><span><span class="go">True</span>
</span><span><span class="gp">>>> </span><span class="s1">'Hello there'</span><span class="o">.</span><span class="n">startswith</span><span class="p">(</span><span class="s1">'he'</span><span class="p">)</span>
</span><span><span class="go">False</span>
</span><span><span class="gp">>>> </span><span class="s1">'Hello there'</span><span class="o">.</span><span class="n">endswith</span><span class="p">(</span><span class="s1">'ere'</span><span class="p">)</span>
</span><span><span class="go">True</span>
</span><span><span class="gp">>>> </span><span class="s1">'Hello there'</span><span class="o">.</span><span class="n">lower</span><span class="p">()</span><span class="o">.</span><span class="n">startswith</span><span class="p">(</span><span class="s1">'he'</span><span class="p">)</span>
</span><span><span class="go">True</span>
</span></code></pre></div>
<p>Additionally, there’s a useful twist to these two methods. Instead of a single string as argument,
they can accept a <code>tuple</code> of strings, in which case they check whether the original string starts/ends with <strong>any</strong>
of the strings in the tuple. Check out the following examples:</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="s1">'Hello there'</span><span class="o">.</span><span class="n">startswith</span><span class="p">((</span><span class="s1">'He'</span><span class="p">,</span> <span class="s1">'he'</span><span class="p">))</span>
</span><span><span class="go">True</span>
</span><span><span class="gp">>>> </span><span class="s1">'hello there'</span><span class="o">.</span><span class="n">startswith</span><span class="p">((</span><span class="s1">'garbage from outer space'</span><span class="p">,</span> <span class="s1">'He'</span><span class="p">,</span> <span class="s1">'he'</span><span class="p">))</span>
</span><span><span class="go">True</span>
</span></code></pre></div>
<p>A less obvious fact here is that the original string <em>may</em> be shorter than the string being passed
to <code>.startswith</code>/<code>.endswith</code>. This sounds like a no-brainer, but there’s one scenario where it’s
particularly nice.</p>
<p>Consider a situation where we want to check if the first character of a string is, say, <code>'A'</code>. One
option to do this is <code>haystack[0] == 'A'</code>. But this runs the risk that if <code>haystack</code> is empty (<code>''</code>), then
<code>haystack[0]</code> will raise an <code>IndexError</code>, when we just wanted <code>False</code>. If we did
<code>haystack.startswith('A')</code> instead, we’d get <code>False</code> if <code>haystack</code> is empty.</p>
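<p>A quick sketch of both approaches on an empty string. The function name <code>starts_with_a</code> is hypothetical; the last line shows that slicing (<code>haystack[:1]</code>) is another way to avoid the <code>IndexError</code>, since slices never raise.</p>

```python
def starts_with_a(haystack):
    # startswith returns False for an empty string instead of raising.
    return haystack.startswith('A')


print(starts_with_a('Apple'))   # True
print(starts_with_a(''))        # False

try:
    ''[0] == 'A'
except IndexError as exc:
    print('IndexError:', exc)   # IndexError: string index out of range

# Slicing past the end just gives '', so this is safe too.
print(''[:1] == 'A')            # False
```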
<h3 id="regular-expressions-check">Regular Expressions Check<a class="headerlink" href="#regular-expressions-check" title="Permanent link">¶</a></h3>
<p>Regular expressions are a much larger topic than can fit under a third-level header (maybe a
future article). So we’ll just cover the substring checking part using regular expressions (in
obviously limited scope).</p>
<p>All regex (regular expression) operations in Python start from the <code>re</code> module. There’s no special
syntax for defining regex patterns like there is in JavaScript. Patterns are instead written as
strings and the <code>re</code> module knows to interpret them as regex patterns.</p>
<p>For our purpose of substring checking, the <code>re</code> module provides the <a href="https://docs.python.org/3/library/re.html#re.search" rel="noopener noreferrer" target="_blank"><code>.search</code></a> function
that takes a regex pattern, the haystack string and optionally, any flags for the pattern.</p>
<div class="hl"><pre class=content><code><span><span class="kn">import</span> <span class="nn">re</span>
</span><span><span class="nb">print</span><span class="p">(</span><span class="n">re</span><span class="o">.</span><span class="n">search</span><span class="p">(</span><span class="s1">'the'</span><span class="p">,</span> <span class="s1">'Hello there'</span><span class="p">))</span>
</span><span><span class="nb">print</span><span class="p">(</span><span class="n">re</span><span class="o">.</span><span class="n">search</span><span class="p">(</span><span class="s1">'he'</span><span class="p">,</span> <span class="s1">'Hello there'</span><span class="p">))</span>
</span><span><span class="nb">print</span><span class="p">(</span><span class="n">re</span><span class="o">.</span><span class="n">search</span><span class="p">(</span><span class="s1">'he'</span><span class="p">,</span> <span class="s1">'Hello there'</span><span class="p">,</span> <span class="n">flags</span><span class="o">=</span><span class="n">re</span><span class="o">.</span><span class="n">IGNORECASE</span><span class="p">))</span>
</span><span><span class="nb">print</span><span class="p">(</span><span class="n">re</span><span class="o">.</span><span class="n">search</span><span class="p">(</span><span class="s1">'hola'</span><span class="p">,</span> <span class="s1">'Hello there'</span><span class="p">))</span>
</span></code></pre></div>
<p>This would produce the following output:</p>
<div class="hl"><pre class=content><code><span><re.Match object; span=(6, 9), match='the'>
</span><span><re.Match object; span=(7, 9), match='he'>
</span><span><re.Match object; span=(0, 2), match='He'>
</span><span>None
</span></code></pre></div>
<p>A minor point to note here is that the return value is not of <em>boolean</em> type. We get an
<a href="https://docs.python.org/3/library/re.html#match-objects" rel="noopener noreferrer" target="_blank"><code>re.Match</code></a> object if there is a <em>successful match</em>, else we get <code>None</code>. This is usually
a minor concern, because match objects are <em>truthy</em> and <code>None</code> is <em>falsy</em>. So, we can treat the result as if
it were a <em>boolean</em> value if we need to.</p>
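<p>In practice, that truthiness means the result of <code>re.search</code> can go straight into an <code>if</code>, as in this small sketch; <code>bool()</code> is only needed when an actual <code>True</code>/<code>False</code> value has to be stored or returned.</p>

```python
import re

haystack = 'Hello there'

# A Match object is truthy, so it can be used directly as a condition.
if re.search('the', haystack):
    print('found the')

# None is falsy, so a failed search reads naturally with `not`.
if not re.search('hola', haystack):
    print('no hola')

# bool() turns the result into a real boolean when one is required.
found = bool(re.search('the', haystack))
print(found)  # True
```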
<p>When using the <code>re.search</code> function this way, the <a href="https://docs.python.org/3/library/re.html#re.escape" rel="noopener noreferrer" target="_blank"><code>re.escape</code></a> function might also come
in handy. This function will escape any special characters in the given string. Here, <em>special</em> means
characters that have a special meaning in a regex pattern.</p>
<p>For example, if the needle is user input and we want to search our haystack such that the needle is
at the end of an English sentence, we’d do something like:</p>
<div class="hl"><pre class=content><code><span><span class="n">re</span><span class="o">.</span><span class="n">search</span><span class="p">(</span><span class="n">needle</span> <span class="o">+</span> <span class="s1">'[.!?:]'</span><span class="p">,</span> <span class="n">haystack</span><span class="p">)</span>
</span></code></pre></div>
<p>But this runs the risk of <code>needle</code> containing regex special characters like <code>.*</code>, which would match
almost anything, and that is <em>probably</em> not what we want. In this case, it’s best to wrap the <code>needle</code> in
<code>re.escape</code> and <em>then</em> concatenate the pattern with end-of-sentence markers.</p>
<div class="hl"><pre class=content><code><span><span class="n">re</span><span class="o">.</span><span class="n">search</span><span class="p">(</span><span class="n">re</span><span class="o">.</span><span class="n">escape</span><span class="p">(</span><span class="n">needle</span><span class="p">)</span> <span class="o">+</span> <span class="s1">'[.!?:]'</span><span class="p">,</span> <span class="n">haystack</span><span class="p">)</span>
</span></code></pre></div>
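<p>Here is a small sketch of what goes wrong without escaping, using <code>'.*'</code> as the (pretend) user input. The haystack is made up for illustration.</p>

```python
import re

haystack = 'Hello there. How are you?'
needle = '.*'  # pretend this came from user input

# Unescaped, '.*' acts as a wildcard, so the whole haystack matches.
unsafe = re.search(needle + '[.!?:]', haystack)
print(unsafe.group())   # Hello there. How are you?

# Escaped, the needle is matched literally; '.*' never occurs, so no match.
safe = re.search(re.escape(needle) + '[.!?:]', haystack)
print(safe)             # None
```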
<p>As always, please think twice before using regular expressions to solve a problem. If you do use one and
the pattern is longer than five or six characters, please make use of <code>re.VERBOSE</code> and add comments
to your pattern. You’ll thank yourself later.</p>
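<p>As an illustration of that advice, here is a sketch of a commented pattern. The date-matching pattern itself is a hypothetical example; with <code>re.VERBOSE</code>, whitespace in the pattern is ignored and <code>#</code> starts a comment.</p>

```python
import re

# A hypothetical pattern for ISO-style dates such as 2020-01-12,
# written with re.VERBOSE so it can be spread out and commented.
DATE = re.compile(r"""
    (?P<year>\d{4})   # four-digit year
    -
    (?P<month>\d{2})  # two-digit month
    -
    (?P<day>\d{2})    # two-digit day
""", re.VERBOSE)

match = DATE.search('Published on 2020-01-12.')
print(match.group('year'), match.group('month'), match.group('day'))  # 2020 01 12
```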
<h2 id="learning-about-the-contents">Learning About the Contents<a class="headerlink" href="#learning-about-the-contents" title="Permanent link">¶</a></h2>
<p>Python’s strings have some nice methods to quickly check some facts about their contents. Here’s a
rundown of such methods:</p>
<div class="table-wrapper">
<table>
<thead>
<tr>
<th>Method</th>
<th>Returns <code>True</code> if</th>
<th>On empty string</th>
</tr>
</thead>
<tbody>
<tr>
<td><a href="https://docs.python.org/3/library/stdtypes.html#str.isalnum" rel="noopener noreferrer" target="_blank"><code>isalnum</code></a></td>
<td>all characters are alphanumeric</td>
<td><code>False</code></td>
</tr>
<tr>
<td><a href="https://docs.python.org/3/library/stdtypes.html#str.isalpha" rel="noopener noreferrer" target="_blank"><code>isalpha</code></a></td>
<td>all characters are alphabetic</td>
<td><code>False</code></td>
</tr>
<tr>
<td><a href="https://docs.python.org/3/library/stdtypes.html#str.isascii" rel="noopener noreferrer" target="_blank"><code>isascii</code></a></td>
<td>all characters are within ASCII range</td>
<td><code>True</code></td>
</tr>
<tr>
<td><a href="https://docs.python.org/3/library/stdtypes.html#str.isdecimal" rel="noopener noreferrer" target="_blank"><code>isdecimal</code></a></td>
<td>all characters are decimal characters</td>
<td><code>False</code></td>
</tr>
<tr>
<td><a href="https://docs.python.org/3/library/stdtypes.html#str.isdigit" rel="noopener noreferrer" target="_blank"><code>isdigit</code></a></td>
<td>all characters are digits</td>
<td><code>False</code></td>
</tr>
<tr>
<td><a href="https://docs.python.org/3/library/stdtypes.html#str.isidentifier" rel="noopener noreferrer" target="_blank"><code>isidentifier</code></a></td>
<td>string can be a valid Python identifier</td>
<td><code>False</code></td>
</tr>
<tr>
<td><a href="https://docs.python.org/3/library/stdtypes.html#str.islower" rel="noopener noreferrer" target="_blank"><code>islower</code></a></td>
<td>has at least one <em>cased</em> character and they are all in lower case</td>
<td><code>False</code></td>
</tr>
<tr>
<td><a href="https://docs.python.org/3/library/stdtypes.html#str.isnumeric" rel="noopener noreferrer" target="_blank"><code>isnumeric</code></a></td>
<td>all characters are numeric characters</td>
<td><code>False</code></td>
</tr>
<tr>
<td><a href="https://docs.python.org/3/library/stdtypes.html#str.isprintable" rel="noopener noreferrer" target="_blank"><code>isprintable</code></a></td>
<td>all characters are printable</td>
<td><code>True</code></td>
</tr>
<tr>
<td><a href="https://docs.python.org/3/library/stdtypes.html#str.isspace" rel="noopener noreferrer" target="_blank"><code>isspace</code></a></td>
<td>all characters are whitespace</td>
<td><code>False</code></td>
</tr>
<tr>
<td><a href="https://docs.python.org/3/library/stdtypes.html#str.istitle" rel="noopener noreferrer" target="_blank"><code>istitle</code></a></td>
<td>string is title-cased, <em>i.e.,</em> each word starts with an upper case character and its remaining cased characters are lower case</td>
<td><code>False</code></td>
</tr>
<tr>
<td><a href="https://docs.python.org/3/library/stdtypes.html#str.isupper" rel="noopener noreferrer" target="_blank"><code>isupper</code></a></td>
<td>has at least one <em>cased</em> character and they are all in upper case</td>
<td><code>False</code></td>
</tr>
</tbody>
</table>
</div>
<p>Please use the links to official documentation in the above table to learn more about them. I won’t
be repeating those details here.</p>
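<p>For a quick sanity check of the table, the sketch below calls every method on the empty string (the last column), followed by a few non-empty cases. The sample strings are arbitrary; <code>isascii</code> needs Python 3.7 or newer.</p>

```python
names = ['isalnum', 'isalpha', 'isascii', 'isdecimal', 'isdigit',
         'isidentifier', 'islower', 'isnumeric', 'isprintable',
         'isspace', 'istitle', 'isupper']

# Call each method on the empty string, as in the last column of the table.
true_on_empty = [name for name in names if getattr('', name)()]
print(true_on_empty)               # ['isascii', 'isprintable']

# A few non-empty cases.
print('abc123'.isalnum())          # True
print('user_name'.isidentifier())  # True
print('2nd_place'.isidentifier())  # False: identifiers cannot start with a digit
print('Hello World'.istitle())     # True
print('Hello WORLD'.istitle())     # False: 'ORLD' is upper case after a cased 'W'
```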
<h3 id="numeric-checks">Numeric Checks<a class="headerlink" href="#numeric-checks" title="Permanent link">¶</a></h3>
<p>You might’ve noticed that we have three different methods that all sound awfully similar to each
other: <code>isdecimal</code>, <code>isdigit</code> and <code>isnumeric</code>. The official documentation regarding the difference
between these three wasn’t very helpful for me, so I’ll try to explain it here.</p>
<p><strong>Firstly</strong>, <code>isdecimal</code> returns <code>True</code> for any character that can be used to build a number in the
base-10 (decimal) number system. That means it gives <code>True</code> for the digits <code>0</code> through <code>9</code>.
Additionally, it also gives <code>True</code> for characters that serve the same purpose in <em>other
languages</em>. For example, the Unicode code points 3174 to 3183 (U+0C66 to U+0C6F) are the digits of a South Indian language
called Telugu (my mother tongue). The <code>isdecimal</code> method returns <code>True</code> for these characters as
well. However, note that it returns <code>False</code> for Roman numerals, since they can’t <em>technically</em> be used to
construct base-10 numbers.</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="c1"># Arabic Numbers</span>
</span><span><span class="gp">>>> </span><span class="s1">''</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="nb">chr</span><span class="p">(</span><span class="n">i</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">48</span><span class="p">,</span> <span class="mi">58</span><span class="p">))</span>
</span><span><span class="go">'0123456789'</span>
</span><span><span class="gp">>>> </span><span class="n">_</span><span class="o">.</span><span class="n">isdecimal</span><span class="p">()</span>
</span><span><span class="go">True</span>
</span><span><span class="gp">>>></span>
</span><span><span class="gp">>>> </span><span class="c1"># Telugu Numbers</span>
</span><span><span class="gp">>>> </span><span class="s1">''</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="nb">chr</span><span class="p">(</span><span class="n">i</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">3174</span><span class="p">,</span> <span class="mi">3184</span><span class="p">))</span>
</span><span><span class="go">'౦౧౨౩౪౫౬౭౮౯'</span>
</span><span><span class="gp">>>> </span><span class="n">_</span><span class="o">.</span><span class="n">isdecimal</span><span class="p">()</span>
</span><span><span class="go">True</span>
</span></code></pre></div>
<p><strong>Secondly</strong>, <code>isdigit</code> gives <code>True</code> for any character that <em>looks like</em> a <strong>digit</strong>, of any
language. So, this includes any character that is <code>True</code>-ed by <code>isdecimal</code>. Additionally, this
includes characters like <code>¹</code>, <code>²</code>, <code>³</code>, <em>etc.,</em> as well as <code>①</code>, <code>②</code>, <code>③</code>. Notice that fraction
characters are not considered <em>digits</em>.</p>
<p><strong>Thirdly</strong>, <code>isnumeric</code> gives <code>True</code> for any character that is <em>numeric</em> in nature. So, this
includes any character that is <code>True</code>-ed by <code>isdigit</code>. Additionally, this will give <code>True</code> for
fraction characters such as <code>¼</code>, <code>½</code>, <code>¾</code> <em>etc.</em>, as well as Roman numerals such as <code>Ⅰ</code>, <code>Ⅱ</code>, <code>Ⅲ</code>,
<code>Ⅳ</code>, even <code>Ⅹ</code>, <code>Ⅼ</code>, <code>Ⅽ</code>, <code>Ⅾ</code>, <code>Ⅿ</code> (these are not ordinary letters; they are Unicode Roman numeral
characters) <em>etc.</em></p>
<p>This leads to a neat fact about the sets of characters <code>True</code>-ed by the three methods: <code>isdecimal</code>
⊂ <code>isdigit</code> ⊂ <code>isnumeric</code>.</p>
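<p>The sketch below prints all three checks for one character from each group discussed above, using <code>unicodedata.name</code> to label the characters. The particular characters are my choice of examples; each row shows the subset relationship, since a <code>True</code> in an earlier column is always followed by <code>True</code> in the later ones.</p>

```python
import unicodedata

samples = [
    '7',        # ASCII digit
    '\u0c6d',   # Telugu digit seven
    '\u00b2',   # superscript two
    '\u2460',   # circled digit one
    '\u00bd',   # vulgar fraction one half
    '\u216b',   # Roman numeral twelve
    'a',        # not numeric at all
]

# Columns: name, isdecimal, isdigit, isnumeric
for ch in samples:
    print(unicodedata.name(ch), ch.isdecimal(), ch.isdigit(), ch.isnumeric())
```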
<h2 id="transformations">Transformations<a class="headerlink" href="#transformations" title="Permanent link">¶</a></h2>
<p>This section is about methods that return a new string, which is the result of some transformation
applied to the original string. Since strings in Python are immutable, transformations always return
a new string object. The original string is always left untouched.</p>
<p>Here are a few commonly used transformation methods (this list is intentionally non-exhaustive):</p>
<div class="table-wrapper">
<table>
<thead>
<tr>
<th>Method</th>
<th>Transformation</th>
</tr>
</thead>
<tbody>
<tr>
<td><a href="https://docs.python.org/3/library/stdtypes.html#str.strip" rel="noopener noreferrer" target="_blank"><code>.strip</code></a></td>
<td>Strips <em>whitespace</em> (or any of the characters in the string passed as the first argument) from the start and end of the string.</td>
</tr>
<tr>
<td><a href="https://docs.python.org/3/library/stdtypes.html#str.lstrip" rel="noopener noreferrer" target="_blank"><code>.lstrip</code></a></td>
<td>Strips <em>whitespace</em> (or any of the characters in the string passed as the first argument) only from the start of the string.</td>
</tr>
<tr>
<td><a href="https://docs.python.org/3/library/stdtypes.html#str.rstrip" rel="noopener noreferrer" target="_blank"><code>.rstrip</code></a></td>
<td>Strips <em>whitespace</em> (or any of the characters in the string passed as the first argument) only from the end of the string.</td>
</tr>
<tr>
<td><a href="https://docs.python.org/3/library/stdtypes.html#str.lower" rel="noopener noreferrer" target="_blank"><code>.lower</code></a></td>
<td>All cased characters are converted to lower case, unless they are already in lower case.</td>
</tr>
<tr>
<td><a href="https://docs.python.org/3/library/stdtypes.html#str.upper" rel="noopener noreferrer" target="_blank"><code>.upper</code></a></td>
<td>All cased characters are converted to upper case, unless they are already in upper case.</td>
</tr>
<tr>
<td><a href="https://docs.python.org/3/library/stdtypes.html#str.capitalize" rel="noopener noreferrer" target="_blank"><code>.capitalize</code></a></td>
<td>The first character is upper-cased and the rest are lower-cased.</td>
</tr>
<tr>
<td><a href="https://docs.python.org/3/library/stdtypes.html#str.title" rel="noopener noreferrer" target="_blank"><code>.title</code></a></td>
<td>The first letter of each word in the string is upper-cased, <em>and</em> all other letters are lower-cased.</td>
</tr>
</tbody>
</table>
</div>
<p>Please use the links to official documentation in the above table to learn more about them. I won’t
be repeating those details here. The official documentation lists many more string methods, and I
suggest skimming over them. I have reinvented the wheel with string transformations before, simply because I didn’t
know Python already provided a method for what I needed.</p>
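<p>A few quick examples of these methods, including two details that are easy to trip over: the argument to the strip methods is a <em>set of characters</em>, not a substring, and <code>.title</code> treats an apostrophe as a word boundary. The sample strings are arbitrary.</p>

```python
print('  padded  '.strip())               # padded
# strip('xy') removes any run of 'x' and 'y' characters from both ends,
# not the substring 'xy'.
print('xxhixyx'.strip('xy'))              # hi
print('www.example.com'.lstrip('w.'))     # example.com
print('hello WORLD'.capitalize())         # Hello world
# title() starts a new "word" after the apostrophe.
print("they're bill's friends".title())   # They'Re Bill'S Friends
```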
<h2 id="string-formatting">String Formatting<a class="headerlink" href="#string-formatting" title="Permanent link">¶</a></h2>
<p>String formatting in Python comes mainly in two flavors. <strong>First</strong> is the (now old) <code>printf</code>-style
formatting that uses type-specific conversion specifiers prefixed with <code>%</code>, similar to the <code>printf</code> (more like
<code>sprintf</code>) function in C. <strong>Second</strong> is the newer <code>format</code> builtin function and the accompanying
<code>str.format</code> method, which are better suited to Python’s dynamic typing and, arguably, much easier to
use.</p>
<p>Python’s formatting capabilities are quite vast and powerful, warranting a whole separate article. I
intend to do that some time in the coming weeks. Until then, the official documentation on
<a href="https://docs.python.org/3/library/stdtypes.html#printf-style-string-formatting" rel="noopener noreferrer" target="_blank"><code>printf</code>-style
formatting</a> and the
<a href="https://docs.python.org/3/library/functions.html#format" rel="noopener noreferrer" target="_blank">format function</a> should serve you well.</p>
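<p>Until then, here is a minimal side-by-side sketch of the two flavors producing the same text, plus the <code>format</code> builtin applied to a single value. The values are arbitrary.</p>

```python
name, count = 'apples', 3

# printf-style: %s and %d are conversion specifiers.
print('%s: %d' % (name, count))        # apples: 3

# str.format: {} placeholders are filled in order.
print('{}: {}'.format(name, count))    # apples: 3

# The format builtin applies a format spec to one value.
print(format(3.14159, '.2f'))          # 3.14
```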
<h2 id="docstrings">Docstrings<a class="headerlink" href="#docstrings" title="Permanent link">¶</a></h2>
<p>Docstrings are strings that serve as documentation for Python’s modules, functions and classes.
There’s nothing special in the syntax of these strings per se, but their uniqueness is more due to
where they are positioned in a Python program.</p>
<p>Consider the following function, with a docstring on line 2:</p>
<div class="hl"><pre class=linenos><span>1
</span><span>2
</span><span>3
</span><span>4
</span><span>5
</span><span>6
</span></pre><pre class=content><code><span><span class="k">def</span> <span class="nf">triple</span><span class="p">(</span><span class="n">n</span><span class="p">):</span>
</span><span><span class="w"> </span><span class="sd">"""Triples the given number and returns the result."""</span>
</span><span> <span class="k">return</span> <span class="n">n</span> <span class="o">*</span> <span class="mi">3</span>
</span><span>
</span><span>
</span><span><span class="nb">print</span><span class="p">(</span><span class="n">triple</span><span class="p">(</span><span class="mi">4</span><span class="p">))</span>
</span></code></pre></div>
<p>The string defined on line 2 in this program is not assigned to any variable. On the face of it, it
appears pointless to create a string and just discard it. However, in this case, the fact that this
string literal is the first statement in the function body makes it a docstring. This
means that the contents of this string are understood to be human-readable help text about
the usage of this function.</p>
<p>It also doesn’t have to be a string defined with <code>"""</code>. It may use single quotes, double quotes
or any other crazy variation we saw above. But don’t do that. It’s standard practice to write
docstrings with <code>"""</code>, and I strongly suggest (and even beg) that you stick to using <code>"""</code> for
docstrings. Please.</p>
<p>It’s also not <em>entirely</em> true that this string is not assigned to a variable. Docstrings are saved
to the <code>.__doc__</code> attribute of the function (or whatever object) they are documenting. In our
example above, we can get the docstring from <code>triple.__doc__</code>. But it’s usually more practical to
call the <code>help</code> function to read the docstring.</p>
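<p>Both ways of reading the docstring, sketched on the <code>triple</code> function from above. The exact layout of the <code>help</code> output depends on the Python version, so it is not reproduced here.</p>

```python
def triple(n):
    """Triples the given number and returns the result."""
    return n * 3


# The raw docstring, as stored on the function object.
print(triple.__doc__)  # Triples the given number and returns the result.

# help() shows the signature along with the docstring.
help(triple)
```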
<p>For classes, the docstring should be the first expression inside the class body, positioned
similarly to that of a function. For modules, the docstring should be the first expression in the
module (even before any imports).</p>
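<p>A sketch of the class case, using a hypothetical <code>Counter</code> class. Methods get their own docstrings exactly like functions. A module docstring would look the same as these, placed as the very first statement of the file.</p>

```python
class Counter:
    """Counts how many times increment() has been called."""

    def __init__(self):
        self.value = 0

    def increment(self):
        """Increase the count by one."""
        self.value += 1


print(Counter.__doc__)            # Counts how many times increment() has been called.
print(Counter.increment.__doc__)  # Increase the count by one.
```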
<p>A minor note on the formatting of docstring content: use ReST
(also called <strong>reStructuredText</strong>). It is not strictly required, but I suggest you do so. In the
event that you choose to generate HTML help pages from your docstrings, you’ll be glad you wrote
them in ReST.</p>
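<p>As an example of what that can look like, here is the <code>triple</code> docstring rewritten with ReST field lists (<code>:param:</code>, <code>:returns:</code> and friends), a convention that Sphinx understands. This is one common style among several, not the only valid one.</p>

```python
def triple(n):
    """Triple the given number.

    :param n: The number to triple.
    :type n: int
    :returns: ``n`` multiplied by three.
    :rtype: int
    """
    return n * 3


print(triple(4))  # 12
```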
<h2 id="conclusion">Conclusion<a class="headerlink" href="#conclusion" title="Permanent link">¶</a></h2>
<p>It’s hard to imagine a Python program that doesn’t have something to do with strings. As such, we
have been provided with a lot of utilities within the standard distribution for working with
strings. Even in an article of this size, I couldn’t be exhaustive. As always, Python’s official
documentation is remarkably good. It pays to occasionally open a random page and skim over it.</p>Automating the Vim workplace2020-01-12T00:00:00+05:302020-01-12T00:00:00+05:30Shrikant Sharat Kandulatag:sharats.me,2020-01-12:/posts/automating-the-vim-workplace/<p>I mainly use two tools for my coding workflow and one of them is GVim (on Windows). It has been my primary
choice for editing text for ten years now, and in that time, I’ve picked up several habits and tricks
that made me very productive.</p>
<p>This article is part …</p><p>I mainly use two tools for my coding workflow and one of them is GVim (on Windows). It has been my primary
choice for editing text for ten years now, and in that time, I’ve picked up several habits and tricks
that made me very productive.</p>
<p>This article is part of a series:</p>
<ol>
<li>Chapter Ⅰ (this article).</li>
<li><a href="../automating-the-vim-workplace-2/">Chapter Ⅱ</a>.</li>
<li><a href="../automating-the-vim-workplace-3/">Chapter Ⅲ</a>.</li>
</ol>
<div class="toc"><span class="toctitle">Table of Contents</span><ul>
<li><a href="#motivation">Motivation</a></li>
<li><a href="#switching-to-normal-mode">Switching to Normal Mode</a></li>
<li><a href="#start-gvim-maximized-in-windows">Start GVim Maximized, in Windows</a></li>
<li><a href="#save-all-buffers">Save All Buffers</a></li>
<li><a href="#copy-to-system-clipboard">Copy to System Clipboard</a></li>
<li><a href="#ensure-directory-exists-before-saving">Ensure Directory Exists, Before Saving</a></li>
<li><a href="#switching-to-alternate-buffer">Switching to Alternate Buffer</a></li>
<li><a href="#run-git-commands-in-terminal">Run Git Commands in :terminal</a></li>
<li><a href="#non-undo-able-insert-mode-commands">Non-undo-able Insert Mode Commands</a></li>
<li><a href="#quickly-open-ftplugin">Quickly Open ftplugin</a></li>
<li><a href="#sorting-over-motion">Sorting over Motion</a></li>
<li><a href="#reversing-over-motion">Reversing over Motion</a></li>
<li><a href="#conclusion">Conclusion</a></li>
</ul>
</div>
<h2 id="motivation">Motivation<a class="headerlink" href="#motivation" title="Permanent link">¶</a></h2>
<p>Most of my text editing involves working with Python, Markdown, and JavaScript source files. When
I’m spending as much time as I am with Vim, it ceases to be just a tool in my mind. It becomes a
state of mind where I’m able to translate my thoughts into actions much faster than I could with
anything else (besides being an excuse to be fancy with words). It becomes my workplace.</p>
<p>Just like organizing one’s desk or toolbox for maximum efficiency, we can mold Vim to work the same
way for us. I try to notice things that I do often, that take more than 3-4 seconds of thought and
then a few more seconds of hitting hotkeys or typing commands. These are the ones I try to turn into
a command or a mapping. In my world, this is borderline automation.</p>
<p>What I’m sharing here is stuff I created/scavenged through years of identifying patterns <em>very
specific</em> to my work style. My goal is not to share nice tidbits of Vim configuration. It is to
encourage you to identify your work style and work towards optimising it, before you go find a
plugin and <em>learn</em> the plugin’s work style. As such, I don’t expect the tips I share here to
resonate with you. Your own style of working deserves the first chance; let Vim learn it.</p>
<p class="note">Please note that everything I share below is what I’m using with Vim (more specifically,
GVim on Windows). I don’t use Neovim (yet), so I can’t vouch for any of it working in Neovim.</p>
<h2 id="switching-to-normal-mode">Switching to Normal Mode<a class="headerlink" href="#switching-to-normal-mode" title="Permanent link">¶</a></h2>
<p>Probably the action done most often is switching between normal & insert modes. Switching to
insert mode is usually done with several different keys (<kbd>i</kbd>, <kbd>a</kbd>, <kbd>o</kbd> etc.),
but for switching to normal mode, we usually use one single key. My preference for this is
<kbd><C-l></kbd>, since <kbd>l</kbd> is on the home row and the help pages already sort of
indicate that hitting it goes to normal mode (when <a href="http://vimdoc.sourceforge.net/htmldoc/options.html#'insertmode'" rel="noopener noreferrer" target="_blank"><code>'insertmode'</code></a> is set, but
well, it’s unused otherwise; see <a href="http://vimdoc.sourceforge.net/htmldoc/insert.html#i_CTRL-L" rel="noopener noreferrer" target="_blank"><code>:h i_CTRL-L</code></a>).</p>
<div class="hl"><pre class=content><code><span><span class="nb">inoremap</span> <span class="p"><</span>C<span class="p">-</span><span class="k">l</span><span class="p">></span> <span class="p"><</span>Esc<span class="p">></span>
</span></code></pre></div>
<p class="note">This is a topic that often brings up an uncontrollable urge to be vocal about one’s own choice of
keys to go to normal mode. I’ve used several of them over the years, <kbd>jj</kbd>,
<kbd><CapsLock></kbd> as <kbd><ESC></kbd>, <kbd><C-[></kbd>,
<kbd><C-c></kbd>, mapping <kbd><C-k></kbd>, <em>xcape</em> in the background, etc. All of them
felt haphazard, and <kbd><C-l></kbd> worked the best for me. As I said, this article is about
what worked best for <em>my</em> workflow. Go discover your own.</p>
<p>Of course, now we need a quick way to open our <code>vimrc</code> file so we can add this mapping and then get
back to whatever we are doing. Well,</p>
<div class="hl"><pre class=content><code><span><span class="nb">nnoremap</span> cv :<span class="k">e</span> $MYVIMRC<span class="p"><</span>CR<span class="p">></span>
</span></code></pre></div>
<p>The <code>cv</code> is a mnemonic for <em>change vimrc</em>.</p>
<blockquote>
<p>This mapping was originally defined as <code>:e $USERPROFILE/vimfiles/vimrc<CR></code>. Thanks to the helpful
community at <a href="https://www.reddit.com/r/vim/comments/enlz8x/automating_the_vim_workplace/fe396x0" rel="noopener noreferrer" target="_blank">r/vim</a> and a comment here, I realized <code>$MYVIMRC</code> is a better fit. Thank you
folks!</p>
</blockquote>
<p>This is what I’m talking about when I say identify things that you often do. Even if you don’t sit
down to automate it right away, put it on a sticky note near your desk. Spend a few minutes thinking
about it. A few seconds during a time of intense focus are far more precious than a few minutes of
slacking.</p>
<p>Note that this mapping is not without its quirks. It interferes with the line completion mapping,
<kbd><C-x><C-l></kbd>. That still works, but right after triggering
<kbd><C-x><C-l></kbd>, if you hit <kbd><C-l></kbd>, you won’t go to normal mode.
You’ll merely move to the next item in the completion popup. Other than this,
<kbd><C-l></kbd> for going to normal mode works quite well.</p>
<p>Now that the mapping is set up, I can hit <kbd><C-l></kbd> in insert mode to go to normal mode.
Then I noticed something else in the way I <em>tried</em> to use it, subconsciously. I started hitting
<kbd><C-l></kbd> in visual mode, operator-pending mode etc. to go to normal mode. I realized
I was using <kbd><C-l></kbd> essentially as a replacement for <kbd><ESC></kbd>. But of
course it failed, because I had only created a mapping for insert mode.</p>
<p>After a few iterations and shower thoughts, this is what I currently use:</p>
<div class="hl"><pre class=content><code><span><span class="c">" Easier way to go to normal mode. Also, alternative to <ESC>.</span>
</span><span><span class="nb">noremap</span><span class="p">!</span> <span class="p"><</span><span class="k">silent</span><span class="p">></span> <span class="p"><</span>C<span class="p">-</span><span class="k">l</span><span class="p">></span> <span class="p"><</span>ESC<span class="p">></span>
</span><span><span class="nb">vnoremap</span> <span class="p"><</span><span class="k">silent</span><span class="p">></span> <span class="p"><</span>C<span class="p">-</span><span class="k">l</span><span class="p">></span> <span class="p"><</span>ESC<span class="p">></span>
</span><span>onoremap <span class="p"><</span><span class="k">silent</span><span class="p">></span> <span class="p"><</span>C<span class="p">-</span><span class="k">l</span><span class="p">></span> <span class="p"><</span>ESC<span class="p">></span>
</span></code></pre></div>
<p>I also wanted this in the command line, but I’m still trying to get it to work. I currently have
the following, but it’s not very robust: every time I hit <kbd><C-l></kbd> to leave the command
line, the cursor moves ahead by two characters.</p>
<div class="hl"><pre class=content><code><span><span class="c">" <ESC> doesn't work and even this moves the cursor by two characters.</span>
</span><span>cnoremap <span class="p"><</span><span class="k">silent</span><span class="p">></span> <span class="p"><</span>C<span class="p">-</span><span class="k">l</span><span class="p">></span> <span class="p"><</span>C<span class="p">-</span><span class="k">c</span><span class="p">></span>
</span></code></pre></div>
<p>It’s a never-ending process of learning and experimenting.</p>
<h2 id="start-gvim-maximized-in-windows">Start GVim Maximized, in Windows<a class="headerlink" href="#start-gvim-maximized-in-windows" title="Permanent link">¶</a></h2>
<p>As another example, I wanted GVim to start maximized when I open it. One way to do this is to check
the Maximized checkbox in the GVim shortcut’s properties, but that won’t work when I start GVim from
a command line. This solution works in both cases:</p>
<div class="hl"><pre class=content><code><span><span class="c">" Maximize gVim window.</span>
</span><span><span class="k">let</span> s:iswin <span class="p">=</span> has<span class="p">(</span><span class="s1">'win32'</span><span class="p">)</span> <span class="p">||</span> has<span class="p">(</span><span class="s1">'win64'</span><span class="p">)</span>
</span><span><span class="k">if</span> exists<span class="p">(</span><span class="s1">':simalt'</span><span class="p">)</span> <span class="p">></span> <span class="m">0</span> && s:iswin
</span><span> autocmd <span class="nb">GUIEnter</span> * <span class="k">simalt</span> <span class="p">~</span><span class="k">x</span>
</span><span><span class="k">endif</span>
</span></code></pre></div>
<h2 id="save-all-buffers">Save All Buffers<a class="headerlink" href="#save-all-buffers" title="Permanent link">¶</a></h2>
<p>I often use the <code>:wa</code> command to save all my open buffers. But it has the nasty habit of throwing an
error (<code>E141</code>, “No file name for buffer”) when some buffer can’t be saved. This is annoying because I often
have scratch buffers in vertical splits where I dump random pieces of copied text and thoughts. So, I
prepared the following hotkey that executes the <code>:wa</code> command and, if that error comes up, shows a
message instead.</p>
<div class="hl"><pre class=content><code><span><span class="nb">nnoremap</span> <span class="p"><</span><span class="k">silent</span><span class="p">></span> <span class="p"><</span>C<span class="p">-</span><span class="k">m</span><span class="p">></span> :<span class="k">try</span>\<span class="p">|</span><span class="k">wa</span>\<span class="p">|</span><span class="k">catch</span> <span class="sr">/\<E141\>/</span>\<span class="p">|</span><span class="k">echomsg</span> <span class="s1">'Not all files saved!'</span>\<span class="p">|</span><span class="k">endtry</span><span class="p"><</span>CR<span class="p">></span>
</span></code></pre></div>
<p>This doesn’t look like an ideal solution, but it hasn’t failed me yet. The idea is not to create a
perfect solution, but just one that works well for you.</p>
<p class="note">If you’re using the above mapping, note that mapping to <kbd><C-m></kbd> is almost the same as
mapping to the <kbd><Return></kbd> key on your keyboard. So hitting the return key in normal
mode will also trigger the above mapping. Just something to keep in mind.</p>
<h2 id="copy-to-system-clipboard">Copy to System Clipboard<a class="headerlink" href="#copy-to-system-clipboard" title="Permanent link">¶</a></h2>
<p>I often have to copy stuff to the system clipboard to paste into chat channels and emails. The standard
way to do this would be something like <kbd>"+yap</kbd> in normal mode, or <kbd>"+y</kbd> in visual
mode. This is annoying, not because of the number of keys, but more because they are hard to type in
order and they are (almost) all typed with the same hand. So I solved it with the following mappings:</p>
<div class="hl"><pre class=content><code><span>xnoremap <span class="p"><</span>C<span class="p">-</span><span class="k">c</span><span class="p">></span> <span class="c">"+y</span>
</span><span><span class="nb">nnoremap</span> <span class="p"><</span><span class="k">silent</span><span class="p">></span> <span class="k">cp</span> <span class="c">"+y</span>
</span><span><span class="nb">nnoremap</span> <span class="p"><</span><span class="k">silent</span><span class="p">></span> cpp <span class="c">"+yy</span>
</span></code></pre></div>
<p>With this, <kbd><C-c></kbd> in visual mode copies the selection to the clipboard, <kbd>cp</kbd>
works as an operator with any motion or text object (e.g. <kbd>cpap</kbd>), and <kbd>cpp</kbd> copies
the current line. Much easier to hit.</p>
<h2 id="ensure-directory-exists-before-saving">Ensure Directory Exists, Before Saving<a class="headerlink" href="#ensure-directory-exists-before-saving" title="Permanent link">¶</a></h2>
<p>I often edit new files like <code>:e css/styles.css</code>, without realizing that I have to create the <code>css</code>
folder before saving. Creating it by hand is not productive; my tool should do that automatically.</p>
<div class="hl"><pre class=content><code><span><span class="c">" Create file's directory before saving, if it doesn't exist.</span>
</span><span><span class="c">" Original: https://stackoverflow.com/a/4294176/151048</span>
</span><span>augroup BWCCreateDir
</span><span> autocmd<span class="p">!</span>
</span><span> autocmd <span class="nb">BufWritePre</span> * :<span class="k">call</span> s:MkNonExDir<span class="p">(</span>expand<span class="p">(</span><span class="s1">'<afile>'</span><span class="p">),</span> <span class="p">+</span>expand<span class="p">(</span><span class="s1">'<abuf>'</span><span class="p">))</span>
</span><span>augroup END
</span><span><span class="k">fun</span><span class="p">!</span> s:MkNonExDir<span class="p">(</span><span class="k">file</span><span class="p">,</span> <span class="k">buf</span><span class="p">)</span>
</span><span> <span class="k">if</span> empty<span class="p">(</span>getbufvar<span class="p">(</span><span class="k">a</span>:<span class="k">buf</span><span class="p">,</span> <span class="s1">'&buftype'</span><span class="p">))</span> && <span class="k">a</span>:<span class="k">file</span> <span class="p">!~</span># <span class="s1">'\v^\w+\:\/'</span>
</span><span> <span class="k">call</span> mkdir<span class="p">(</span>fnamemodify<span class="p">(</span><span class="k">a</span>:<span class="k">file</span><span class="p">,</span> <span class="s1">':h'</span><span class="p">),</span> <span class="s1">'p'</span><span class="p">)</span>
</span><span> <span class="k">endif</span>
</span><span><span class="k">endfun</span>
</span></code></pre></div>
<p>Let’s see what’s going on here. Firstly, we define an <a href="http://vimdoc.sourceforge.net/htmldoc/autocmd.html#autocmd.txt" rel="noopener noreferrer" target="_blank"><code>autocmd</code></a> for the
<a href="http://vimdoc.sourceforge.net/htmldoc/autocmd.html#BufWritePre" rel="noopener noreferrer" target="_blank"><code>BufWritePre</code></a> event, which is fired just before a file is saved, to call the function
<code>s:MkNonExDir</code>. In this function, we check that the buffer is a normal buffer (see <a href="http://vimdoc.sourceforge.net/htmldoc/options.html#'buftype'" rel="noopener noreferrer" target="_blank"><code>:h
buftype</code></a>) and that the file name doesn’t look like a URL (such as <code>scp://host/file</code>). If both hold, we
create the file’s parent directory, along with any missing intermediate directories (the <code>'p'</code> flag).</p>
<p>Simple, non-intrusive, and effective.</p>
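<p>For readers who think more easily in Python than in Vimscript, here is a rough Python translation of the same check. It is only a sketch: <code>ensure_parent_dir</code> is a hypothetical name, and the <code>buftype</code> argument stands in for the buffer’s <code>&buftype</code> option, which is empty for a normal buffer.</p>

```python
import os
import re

# Same idea as the Vim pattern '\v^\w+\:\/': a leading "scheme:/" marks a
# URL-like name (e.g. scp://host/file) whose directory must not be created.
URL_LIKE = re.compile(r"^\w+:/")


def ensure_parent_dir(path: str, buftype: str = "") -> bool:
    """Create the parent directory of *path* before it is written.

    Returns False (doing nothing) for special buffers (non-empty buftype)
    and URL-like names, True otherwise.
    """
    if buftype or URL_LIKE.match(path):
        return False
    parent = os.path.dirname(path)
    if parent:  # a bare file name lives in the current directory
        # makedirs creates intermediate directories, like mkdir()'s 'p' flag;
        # exist_ok=True avoids an error when the directory already exists.
        os.makedirs(parent, exist_ok=True)
    return True
```

<p>Without the URL check, a remote name such as <code>scp://host/path/file.txt</code> would leave a stray local directory named <code>scp:</code> behind.</p>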
<h2 id="switching-to-alternate-buffer">Switching to Alternate Buffer<a class="headerlink" href="#switching-to-alternate-buffer" title="Permanent link">¶</a></h2>
<p>The default key binding <a href="http://vimdoc.sourceforge.net/htmldoc/editing.html#CTRL-6" rel="noopener noreferrer" target="_blank"><kbd><C-^></kbd></a> (or <a href="http://vimdoc.sourceforge.net/htmldoc/editing.html#CTRL-6" rel="noopener noreferrer" target="_blank"><kbd><C-6></kbd></a>)
lets us quickly switch back and forth between two buffers. This is extremely handy and is likely
the functionality I use most for switching buffers within Vim.</p>
<p>There are some annoying quirks to this mapping, though. For example, if there are files in your buffer
list, but no <em>alternate</em> buffer, we get an error saying “No alternate file” (<code>E23</code>), which is not
helpful. So a few years ago I came across a mapping that goes to the next buffer if there’s no
alternate buffer. It worked to an extent, but there’s more.</p>
<p>When I delete a buffer with <code>:bd</code>, I get taken to a different buffer. Now if I hit
<kbd><C-6></kbd>, the buffer I just deleted is loaded again and I’m back in it. This may
be what one usually wants, but I want to be taken to the next buffer that’s still <em>loaded</em>,
not to a deleted one.</p>
<div class="hl"><pre class=content><code><span><span class="c">" My remapping of <C-^>. If there is no alternate file, and there's no count given, then switch</span>
</span><span><span class="c">" to next file. We use `bufloaded` to check for alternate buffer presence. This will ignore</span>
</span><span><span class="c">" deleted buffers, as intended. To get default behaviour, use `bufexists` in its place.</span>
</span><span><span class="nb">nnoremap</span> <span class="p"><</span><span class="k">silent</span><span class="p">></span> <span class="p"><</span>C<span class="p">-</span><span class="k">n</span><span class="p">></span> :<span class="p"><</span>C<span class="p">-</span><span class="k">u</span><span class="p">></span>exe <span class="k">v</span>:count ? <span class="k">v</span>:count . <span class="s1">'b'</span> : <span class="s1">'b'</span> . <span class="p">(</span>bufloaded<span class="p">(</span><span class="m">0</span><span class="p">)</span> ? <span class="s1">'#'</span> : <span class="s1">'n'</span><span class="p">)<</span>CR<span class="p">></span>
</span></code></pre></div>
<p>This is the mapping I use for switching between alternate buffers. I use <kbd><C-n></kbd> as
it’s easier to hit, and its default functionality is available on a simpler key anyway (<kbd>j</kbd>).</p>
<p>Additionally, if you’re using the <a href="https://github.com/tpope/vim-eunuch/" rel="noopener noreferrer" target="_blank">eunuch plugin</a>, this
mapping will not navigate to a buffer that’s been <code>Delete</code>-ed.</p>
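<p>The one-liner above packs a two-level decision into a ternary expression. Here it is spelled out as a small Python function. The function is a hypothetical helper, purely for illustration, and <code>alternate_loaded</code> plays the role of <code>bufloaded(0)</code>:</p>

```python
def alternate_buffer_command(count: int, alternate_loaded: bool) -> str:
    """Return the Ex command that the <C-n> mapping ends up executing.

    Mirrors: v:count ? v:count . 'b' : 'b' . (bufloaded(0) ? '#' : 'n')
    """
    if count:
        # With a count, jump straight to that buffer number: 3<C-n> runs :3b.
        return f"{count}b"
    # No count: the alternate buffer if it is still loaded, otherwise the next one.
    return "b#" if alternate_loaded else "bn"
```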
<h2 id="run-git-commands-in-terminal">Run Git Commands in <code>:terminal</code><a class="headerlink" href="#run-git-commands-in-terminal" title="Permanent link">¶</a></h2>
<p>Running git commands is another thing I often do while working in Vim. Most of the time, it’s just a
<code>status</code> or <code>diff</code>, so I needed something quicker than switching to a terminal and running the
command.</p>
<p>I initially used <a href="https://github.com/tpope/vim-fugitive" rel="noopener noreferrer" target="_blank">fugitive</a>, but it felt slow on Windows (very likely because of the required
anti-virus). It works fine when I’m on Linux, but on Windows, it’s not productive for me. Besides,
it does a lot of things I don’t usually need. The following is the mapping that serves <em>most</em> of
what I need from within Vim.</p>
<div class="hl"><pre class=content><code><span><span class="nb">nnoremap</span> <span class="p"><</span>Leader<span class="p">></span><span class="k">g</span> :ter git <span class="p">--</span>no<span class="p">-</span>pager<span class="p"><</span>Space<span class="p">></span>
</span></code></pre></div>
<p>So, what does this do? Well, I hit <kbd>,g</kbd> (because <kbd>,</kbd> is my
<a href="http://vimdoc.sourceforge.net/htmldoc/map.html#mapleader" rel="noopener noreferrer" target="_blank"><code>mapleader</code></a>) and the cursor is placed in the command line with the following
pre-filled:</p>
<div class="hl"><pre class=content><code><span><span class="p">:</span>ter git <span class="p">--</span>no<span class="p">-</span>pager
</span></code></pre></div>
<p>Then I just type <kbd>st<Enter></kbd>, which opens a new <a href="http://vimdoc.sourceforge.net/htmldoc/term.html" rel="noopener noreferrer" target="_blank">terminal</a> window within Vim that runs
the <code>git st</code> command asynchronously (<code>st</code> is an alias for <code>status</code>).</p>
<p>After seeing the output, I noticed that I immediately issued another <kbd>,gdiff<Enter></kbd>,
which opens up another terminal split to run the <code>git diff</code> command. Multiple such splits quickly
got annoying. Yeah, I’m easily annoyed. I wanted this mapping to <em>not</em> open a new split if I’m
already in a <code>git</code> output terminal window. Here’s what I’m using currently:</p>
<div class="hl"><pre class=content><code><span><span class="nb">nnoremap</span> <span class="p"><</span>Leader<span class="p">></span><span class="k">g</span> :ter <span class="p"><</span>C<span class="p">-</span><span class="k">r</span><span class="p">>=</span>&<span class="nb">buftype</span> <span class="p">==</span> <span class="s1">'terminal'</span>
</span><span> \ && job_info<span class="p">(</span>term_getjob<span class="p">(</span><span class="s1">'%'</span><span class="p">))</span>.cmd[<span class="m">0</span>] <span class="p">==</span>? <span class="s1">'git'</span> ? <span class="s1">'++curwin '</span> : <span class="s1">''</span>
</span><span> \ <span class="p"><</span>CR<span class="p">></span>git <span class="p">--</span>no<span class="p">-</span>pager<span class="p"><</span>Space<span class="p">></span>
</span></code></pre></div>
<p>We check whether the current buffer is a terminal whose command is <code>git</code>. If so, we tell <code>:ter</code>
(via <code>++curwin</code>) to open the terminal in the current window instead of opening a new split.</p>
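<p>The same condition, written out as a Python sketch for clarity. The function and parameter names are made up, and <code>job_cmd</code> stands in for <code>job_info(term_getjob('%')).cmd</code>:</p>

```python
from typing import Optional, Sequence


def git_terminal_command(buftype: str, job_cmd: Optional[Sequence[str]]) -> str:
    """Return the command line that the ,g mapping pre-fills.

    Mirrors: &buftype == 'terminal' && job_info(term_getjob('%')).cmd[0] ==? 'git'
    """
    reuse_window = (
        buftype == "terminal"
        and bool(job_cmd)
        and job_cmd[0].lower() == "git"  # Vim's ==? compares ignoring case
    )
    return ":ter " + ("++curwin " if reuse_window else "") + "git --no-pager "
```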
<h2 id="non-undo-able-insert-mode-commands">Non-undo-able Insert Mode Commands<a class="headerlink" href="#non-undo-able-insert-mode-commands" title="Permanent link">¶</a></h2>
<p>In insert mode, <a href="http://vimdoc.sourceforge.net/htmldoc/insert.html#i_CTRL-U" rel="noopener noreferrer" target="_blank"><kbd><C-u></kbd></a> deletes everything from the start of the current line to the
cursor position (this is not <em>exactly</em> true; read <code>:h i_CTRL-U</code> for the exact behaviour, I won’t
repeat it here). This is quite convenient and I use it a lot more than I like to admit. Often, when
I start a statement on a new line, I have second thoughts midway through the line, so I quickly hit
<kbd><C-u></kbd> and start typing the idea from my second thought. But then, of course, I
realize that what I was doing originally was the right way. Now if I try to undo just what was done by
<kbd><C-u></kbd>, I can’t: the whole insert is treated as one operation, so it’s all one undo
step.</p>
<p>This is why I got this:</p>
<div class="hl"><pre class=content><code><span><span class="c">" CTRL-U in insert mode deletes a lot. Put an undo-point before it.</span>
</span><span><span class="nb">inoremap</span> <span class="p"><</span>C<span class="p">-</span><span class="k">u</span><span class="p">></span> <span class="p"><</span>C<span class="p">-</span><span class="k">g</span><span class="p">></span><span class="k">u</span><span class="p"><</span>C<span class="p">-</span><span class="k">u</span><span class="p">></span>
</span></code></pre></div>
<p>I don’t recall the source of this, but I found it after a bit of searching online for a solution
and it works! Whoever came up with this, thank you!</p>
<blockquote>
<p>Thanks to this <a href="https://www.reddit.com/r/vim/comments/enlz8x/automating_the_vim_workplace/fe3973i" rel="noopener noreferrer" target="_blank">kind person’s
hint</a>, I was
able to find the source of this. It’s actually in the <code>defaults.vim</code> file that is shipped with
Vim.</p>
</blockquote>
<h2 id="quickly-open-ftplugin">Quickly Open <code>ftplugin</code><a class="headerlink" href="#quickly-open-ftplugin" title="Permanent link">¶</a></h2>
<p>This is one that I don’t use <em>as often</em> as some of the above, but when I do need it, it’s extremely
handy. I use the <code>$VIMFILES/after/ftplugin</code> directory to put in my custom settings for specific file
types. These files usually don’t just contain changes to settings like indentation, but also
<code>commentstring</code> and often some command(s) that make editing that specific <code>filetype</code> a bit easier.</p>
<p>These commands let me open the plugin file in that directory for the <code>filetype</code> I’m currently
working with.</p>
<div class="hl"><pre class=content><code><span><span class="c">" Edit my filetype/syntax plugin files for current filetype.</span>
</span><span>command <span class="p">-</span>nargs<span class="p">=</span>? <span class="p">-</span><span class="nb">complete</span><span class="p">=</span><span class="k">filetype</span> EditFileTypePlugin
</span><span> \ exe <span class="s1">'keepj vsplit $VIMFILES/after/ftplugin/'</span> . <span class="p">(</span>empty<span class="p">(<</span><span class="k">q</span><span class="p">-</span>args<span class="p">>)</span> ? &<span class="k">filetype</span> : <span class="p"><</span><span class="k">q</span><span class="p">-</span>args<span class="p">>)</span> . <span class="s1">'.vim'</span>
</span><span>command <span class="p">-</span>nargs<span class="p">=</span>? <span class="p">-</span><span class="nb">complete</span><span class="p">=</span><span class="k">filetype</span> Eft EditFileTypePlugin <span class="p"><</span>args<span class="p">></span>
</span></code></pre></div>
<p>The same thing for syntax plugin:</p>
<div class="hl"><pre class=content><code><span>command <span class="p">-</span>nargs<span class="p">=</span>? <span class="p">-</span><span class="nb">complete</span><span class="p">=</span><span class="k">filetype</span> EditSyntaxPlugin
</span><span> \ exe <span class="s1">'keepj vsplit $VIMFILES/after/syntax/'</span> . <span class="p">(</span>empty<span class="p">(<</span><span class="k">q</span><span class="p">-</span>args<span class="p">>)</span> ? &<span class="k">filetype</span> : <span class="p"><</span><span class="k">q</span><span class="p">-</span>args<span class="p">>)</span> . <span class="s1">'.vim'</span>
</span><span>command <span class="p">-</span>nargs<span class="p">=</span>? <span class="p">-</span><span class="nb">complete</span><span class="p">=</span><span class="k">filetype</span> Esy EditSyntaxPlugin <span class="p"><</span>args<span class="p">></span>
</span></code></pre></div>
<p class="note">Note that the <code>:Eft</code> and <code>:Esy</code> commands act like short aliases for these commands.</p>
<p>These commands are obviously heavily inspired by the <code>:EditUltiSnipsFile</code> command from the
<a href="https://github.com/sirver/UltiSnips" rel="noopener noreferrer" target="_blank">UltiSnips</a> plugin (which is great at automation by the way).</p>
<h2 id="sorting-over-motion">Sorting over Motion<a class="headerlink" href="#sorting-over-motion" title="Permanent link">¶</a></h2>
<p>Vim comes with the <a href="http://vimdoc.sourceforge.net/htmldoc/change.html#:sort" rel="noopener noreferrer" target="_blank"><code>:sort</code></a> command that sorts the range of lines provided. So, for
example, to sort the whole file, we’d do <code>:%sort</code>. To sort the first ten lines, something like
<code>:1,10sort</code> should do. The range of lines given will be replaced with the sorted lines.</p>
<p>This is useful, but typing out line ranges by hand is not very handy. I’d always wanted a way to sort
over a motion, like <em>sort this paragraph</em> or <em>sort inside braces</em> etc. So, after some searching
online and digging through the Vim documentation, I have the following in my <code>vimrc</code>:</p>
<div class="hl"><pre class=content><code><span><span class="c">" Sort lines, selected or over motion.</span>
</span><span>xnoremap <span class="p"><</span><span class="k">silent</span><span class="p">></span> gs :<span class="k">sort</span> <span class="k">i</span><span class="p"><</span>CR<span class="p">></span>
</span><span><span class="nb">nnoremap</span> <span class="p"><</span><span class="k">silent</span><span class="p">></span> gs :<span class="k">set</span> <span class="nb">opfunc</span><span class="p">=</span>SortLines<span class="p"><</span>CR<span class="p">></span><span class="k">g</span>@
</span><span><span class="k">fun</span><span class="p">!</span> SortLines<span class="p">(</span>type<span class="p">)</span> abort
</span><span> <span class="s1">'[,'</span>]<span class="k">sort</span> <span class="k">i</span>
</span><span><span class="k">endfun</span>
</span></code></pre></div>
<p>With this, hitting <kbd>gsip</kbd> sorts the lines <em>inside</em> the current paragraph. Similarly,
<kbd>gsiB</kbd> sorts the lines inside the braces enclosing the cursor (try this one in CSS!). If
you have the <a href="https://github.com/michaeljsmith/vim-indent-object/" rel="noopener noreferrer" target="_blank">vim-indent-object</a> plugin, you
could also do <kbd>gsii</kbd> to sort the lines in the current indent block. In all cases, the <code>i</code>
flag makes the sort ignore case.</p>
<p>Additionally, we also have an <code>xnoremap</code> mapping definition which lets us use <kbd>gs</kbd> in
visual mode to sort the highlighted lines. I don’t use this as often as the operator version above,
but it’s nice to have nonetheless.</p>
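<p>In plain Python terms, the operator boils down to replacing a 1-based, inclusive range of lines with a sorted copy of itself. The sketch below is only an approximation. <code>sort_lines</code> is a hypothetical name, and <code>key=str.lower</code> stands in for the case-insensitive <code>i</code> flag; Vim’s own comparison may differ in edge cases such as non-ASCII text:</p>

```python
from typing import List


def sort_lines(lines: List[str], first: int, last: int) -> None:
    """Sort lines first..last in place, ignoring case.

    Line numbers are 1-based and inclusive, like the '[,'] range in SortLines().
    Python's sort is stable, so lines that compare equal keep their order.
    """
    lines[first - 1:last] = sorted(lines[first - 1:last], key=str.lower)
```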
<h2 id="reversing-over-motion">Reversing over Motion<a class="headerlink" href="#reversing-over-motion" title="Permanent link">¶</a></h2>
<p>This is very similar to the above. Instead of sorting, I’m reversing the lines. Unfortunately, we
don’t have a <code>:reverse</code> command like <code>:sort</code>, so this one is more DIY.</p>
<div class="hl"><pre class=content><code><span><span class="c">" Reverse lines, selected or over motion.</span>
</span><span><span class="nb">nnoremap</span> <span class="p"><</span><span class="k">silent</span><span class="p">></span> <span class="k">gr</span> :<span class="k">set</span> <span class="nb">opfunc</span><span class="p">=</span>ReverseLines<span class="p"><</span>CR<span class="p">></span><span class="k">g</span>@
</span><span><span class="nb">vnoremap</span> <span class="p"><</span><span class="k">silent</span><span class="p">></span> <span class="k">gr</span> :<span class="p"><</span>C<span class="p">-</span><span class="k">u</span><span class="p">></span><span class="k">call</span> ReverseLines<span class="p">(</span><span class="s1">'vis'</span><span class="p">)<</span>CR<span class="p">></span>
</span><span><span class="k">fun</span><span class="p">!</span> ReverseLines<span class="p">(</span>type<span class="p">)</span> abort
</span><span> <span class="k">let</span> <span class="k">marks</span> <span class="p">=</span> <span class="k">a</span>:type <span class="p">==</span>? <span class="s1">'vis'</span> ? <span class="s1">'<>'</span> : <span class="s1">'[]'</span>
</span><span> <span class="k">let</span> [_<span class="p">,</span> l1<span class="p">,</span> c1<span class="p">,</span> _] <span class="p">=</span> getpos<span class="p">(</span><span class="s2">"'"</span> . <span class="k">marks</span>[<span class="m">0</span>]<span class="p">)</span>
</span><span> <span class="k">let</span> [_<span class="p">,</span> l2<span class="p">,</span> c2<span class="p">,</span> _] <span class="p">=</span> getpos<span class="p">(</span><span class="s2">"'"</span> . <span class="k">marks</span>[<span class="m">1</span>]<span class="p">)</span>
</span><span> <span class="k">if</span> l1 <span class="p">==</span> l2
</span><span> <span class="k">return</span>
</span><span> <span class="k">endif</span>
</span><span> <span class="k">for</span> line <span class="k">in</span> getline<span class="p">(</span>l1<span class="p">,</span> l2<span class="p">)</span>
</span><span> <span class="k">call</span> setline<span class="p">(</span>l2<span class="p">,</span> line<span class="p">)</span>
</span><span> <span class="k">let</span> l2 <span class="p">-=</span> <span class="m">1</span>
</span><span> <span class="k">endfor</span>
</span><span><span class="k">endfun</span>
</span></code></pre></div>
<p>I mapped reversing to <kbd>gr</kbd>, which works similarly to <kbd>gs</kbd> from the previous section,
but instead of sorting, it reverses the lines. Everything in the above snippet can be looked up
with the <code>:h</code> command within Vim. I’ll leave working out how it works as an exercise for the
reader, if inclined.</p>
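<p>If it helps to check your reading of <code>ReverseLines</code> against something more familiar, here is a rough Python sketch of the same loop. It assumes the buffer is modelled as a plain list of strings and keeps Vim’s 1-based, inclusive line numbers; <code>reverse_lines</code> is a hypothetical name used only for this illustration.</p>

```python
def reverse_lines(buffer, l1, l2):
    """Reverse buffer lines l1..l2 in place (1-based, inclusive, like Vim)."""
    if l1 == l2:
        return
    # Like Vim's getline(l1, l2), the slice is a copy, so writing into
    # `buffer` while looping does not disturb the lines we are reading.
    for line in buffer[l1 - 1:l2]:
        buffer[l2 - 1] = line  # setline(l2, line), shifted to 0-based indexing
        l2 -= 1


lines = ['one', 'two', 'three', 'four', 'five']
reverse_lines(lines, 2, 4)
print(lines)  # ['one', 'four', 'three', 'two', 'five']
```

<p>The key point is that the lines are read up front, so overwriting them from the bottom up is safe.</p>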
<h2 id="conclusion">Conclusion<a class="headerlink" href="#conclusion" title="Permanent link">¶</a></h2>
<p>This article looks an awful lot like a list of Vim tips, but I implore you to look further. I picked
these specific things from my Vim setup (which is a lot bigger than this) to illustrate the idea of
identifying and then automating. Of course, the snippets I shared above are, in my opinion, too
small for a full-blown plugin, yet significant enough to be worth sharing. I intend to follow up
with more ideas from my configuration, so stay tuned.</p>
<p>I also encourage you to go over the Vim help pages often. They contain some awesome tips and ideas
that serve as great starting points to improve your workflow. So, just, you know, while that really
long build is running, grab a coffee and open the Vim docs!</p>
<p>Identify, optimize, repeat.</p>
<p class="note">Read the <a href="../automating-the-vim-workplace-2/">next article</a> in this series.</p>Python's `map` builtin function2020-01-04T00:00:00+05:302020-01-04T00:00:00+05:30Shrikant Sharat Kandulatag:sharats.me,2020-01-04:/posts/python-map-function/<p>In this article, we’ll take a look at Python’s stream processing utility function, <code>map</code>. This
function can enable us to write powerful list/stream-processing routines that can be easy to read
and understand.</p>
<p>Let’s go over the basics first so we have context when talking about them …</p><p>In this article, we’ll take a look at Python’s stream processing utility function, <code>map</code>. This
function can enable us to write powerful list/stream-processing routines that can be easy to read
and understand.</p>
<p>Let’s go over the basics first so we have context when talking about them.</p>
<h2 id="syntax">Syntax<a class="headerlink" href="#syntax" title="Permanent link">¶</a></h2>
<p>Calling <code>map</code>:</p>
<div class="hl"><pre class=content><code><span><span class="c1"># From official docs</span>
</span><span><span class="nb">map</span><span class="p">(</span><span class="n">function</span><span class="p">,</span> <span class="n">iterable</span><span class="p">,</span> <span class="o">...</span><span class="p">)</span>
</span></code></pre></div>
<p>Where</p>
<dl>
<dt>function</dt>
<dd>Called with each item from <code>iterable</code>.</dd>
<dt>iterable</dt>
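<dd>Provides the inputs for calling <code>function</code>.</dd>
<dt>...</dt>
<dd>Optional additional iterables, whose items are passed to <code>function</code> as extra arguments.</dd>
<dt><em>returns</em></dt>
<dd>An iterator that yields the return values from calling <code>function</code>.</dd>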
</dl>
<h2 id="working-of-map-function">Working of <code>map</code> Function<a class="headerlink" href="#working-of-map-function" title="Permanent link">¶</a></h2>
<p>Here’s a run-book for the <code>map</code> builtin function:</p>
<ol>
<li>Accepts a function (or any callable) and an iterable (a list, or any other iterable) of objects.</li>
<li>Returns an iterator right away, without calling the function yet.</li>
<li>Each time the next value is requested from that iterator, takes the next object from the
iterable, passes it to the function, and hands back the return value. This repeats until the
iterable is exhausted.</li>
</ol>
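<p>To make the run-book concrete, here is a rough pure-Python approximation of <code>map</code>. It is only a sketch: <code>my_map</code> is a hypothetical name, the real builtin is implemented in C and returns a <code>map</code> object rather than a generator, and this version assumes at least one iterable is passed (the builtin raises <code>TypeError</code> otherwise).</p>

```python
def my_map(function, *iterables):
    """Rough pure-Python approximation of the builtin map (assumes 1+ iterables)."""
    iterators = [iter(iterable) for iterable in iterables]
    while True:
        try:
            # Pull one item from every iterable; stop as soon as any runs out.
            args = [next(iterator) for iterator in iterators]
        except StopIteration:
            return
        # The function only runs when the caller asks for the next value.
        yield function(*args)


print(list(my_map(str, range(5))))  # ['0', '1', '2', '3', '4']
```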
<p class="note">Note that in Python 2, <code>map</code> used to return a <code>list</code> object. However, in Python 3, it returns a
<code>map</code> object, which is a lazy iterator that processes each item in the list only as it is needed. If
you don’t want to bother with this difference for now, remember to always wrap the result of a <code>map</code>
call with <code>list</code>. The official <code>2to3</code> tool handles this automatically.</p>
<p>Let’s look at some examples:</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="nb">map</span><span class="p">(</span><span class="nb">str</span><span class="p">,</span> <span class="nb">range</span><span class="p">(</span><span class="mi">5</span><span class="p">))</span>
</span><span><span class="go"><map object at 0x0000000002DCD3C8></span>
</span><span><span class="gp">>>> </span><span class="nb">list</span><span class="p">(</span><span class="nb">map</span><span class="p">(</span><span class="nb">str</span><span class="p">,</span> <span class="nb">range</span><span class="p">(</span><span class="mi">5</span><span class="p">)))</span>
</span><span><span class="go">['0', '1', '2', '3', '4']</span>
</span></code></pre></div>
<p>Notice how the first call to <code>map</code> gives us just a <code>map</code> object. At this point,
none of the items in <code>range(5)</code> have been processed by <code>str</code>. But when we wrap it in <code>list</code> the next
time, <code>list</code> consumes the <code>map</code> object and we get the list of all processed items.</p>
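<p>One way to watch this laziness happen is to map a function that records each input it receives. <code>record</code> and <code>seen</code> are hypothetical names made up for this sketch.</p>

```python
seen = []


def record(x):
    """Remember every input we are called with, then transform it."""
    seen.append(x)
    return x * 10


results = map(record, range(5))
print(seen)           # []  (record has not been called yet)
print(next(results))  # 0
print(seen)           # [0]  (only the first item has been processed)
print(list(results))  # [10, 20, 30, 40]  (the rest, on demand)
print(list(results))  # []  (the map object is already exhausted)
```

<p>It also shows that a <code>map</code> object is single-use: once consumed, iterating over it again yields nothing.</p>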
<p>We can also pass in lambda functions just fine.</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="nb">list</span><span class="p">(</span><span class="nb">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span><span class="o">**</span><span class="mi">2</span><span class="p">,</span> <span class="nb">range</span><span class="p">(</span><span class="mi">5</span><span class="p">)))</span>
</span><span><span class="go">[0, 1, 4, 9, 16]</span>
</span></code></pre></div>
<p>But don’t do that; it’s silly. We’ll see why later in this article, but put simply,
<em>comprehensions are almost always better than a map+lambda combination</em>.</p>
<p>Additionally, <code>map</code> can take more than one iterable as arguments. In that case, the items
produced by each additional iterable are passed as additional arguments to the given function.</p>
<p>Consider the following call to <code>map</code>:</p>
<div class="hl"><pre class=content><code><span><span class="nb">list</span><span class="p">(</span><span class="nb">map</span><span class="p">(</span><span class="nb">max</span><span class="p">,</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">],</span> <span class="p">[</span><span class="mi">7</span><span class="p">,</span> <span class="mi">8</span><span class="p">,</span> <span class="mi">9</span><span class="p">],</span> <span class="p">[</span><span class="mi">100</span><span class="p">,</span> <span class="mi">200</span><span class="p">,</span> <span class="mi">300</span><span class="p">]))</span>
</span></code></pre></div>
<p>This will call the builtin <code>max</code> function three times, once with one item from each list:</p>
<div class="hl"><pre class=content><code><span>max(1, 7, 100)
</span><span>max(2, 8, 200)
</span><span>max(3, 9, 300)
</span></code></pre></div>
<p>It produces a result list of three items, the return values of the above three calls to <code>max</code>: <code>[100, 200, 300]</code>.</p>
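<p>When the iterables have different lengths, <code>map</code> stops as soon as the shortest one is exhausted (much like <code>zip</code>), rather than raising an error. A quick check using <code>operator.add</code>, which takes exactly two arguments:</p>

```python
from operator import add

# Two iterables: add(1, 10), add(2, 20), add(3, 30)
print(list(map(add, [1, 2, 3], [10, 20, 30])))  # [11, 22, 33]

# Unequal lengths: map stops when the shorter list runs out,
# so the trailing 4 is silently ignored.
print(list(map(add, [1, 2, 3, 4], [10, 20, 30])))  # [11, 22, 33]
```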
<p>Let’s look at some useful ways we can use the <code>map</code> function in real world code.</p>
<h2 id="using-unbound-methods">Using Unbound Methods<a class="headerlink" href="#using-unbound-methods" title="Permanent link">¶</a></h2>
<p>If the function we want to call is a <em>method</em> on each object in the given list, we could use
a comprehension or do it with map+lambda like this:</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="n">protocols</span> <span class="o">=</span> <span class="p">[</span><span class="s1">'http'</span><span class="p">,</span> <span class="s1">'tcp'</span><span class="p">,</span> <span class="s1">'xmpp'</span><span class="p">,</span> <span class="s1">'irc'</span><span class="p">]</span>
</span><span><span class="gp">>>> </span><span class="p">[</span><span class="n">protocol</span><span class="o">.</span><span class="n">upper</span><span class="p">()</span> <span class="k">for</span> <span class="n">protocol</span> <span class="ow">in</span> <span class="n">protocols</span><span class="p">]</span>
</span><span><span class="go">['HTTP', 'TCP', 'XMPP', 'IRC']</span>
</span><span><span class="gp">>>> </span><span class="nb">list</span><span class="p">(</span><span class="nb">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">protocol</span><span class="p">:</span> <span class="n">protocol</span><span class="o">.</span><span class="n">upper</span><span class="p">(),</span> <span class="n">protocols</span><span class="p">))</span>
</span><span><span class="go">['HTTP', 'TCP', 'XMPP', 'IRC']</span>
</span></code></pre></div>
<p>But a much simpler way is to provide the unbound method as the first argument to <code>map</code>.</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="nb">list</span><span class="p">(</span><span class="nb">map</span><span class="p">(</span><span class="nb">str</span><span class="o">.</span><span class="n">upper</span><span class="p">,</span> <span class="n">protocols</span><span class="p">))</span>
</span><span><span class="go">['HTTP', 'TCP', 'XMPP', 'IRC']</span>
</span></code></pre></div>
<p>This works because calling an unbound method with an instance as the first argument is
almost the same thing as calling the bound method of that instance. In other words,
<code>str.upper('http')</code> is more or less the same as <code>'http'.upper()</code>. This holds for any
method on any class, as long as every item in the list is an instance of that class. Keep in mind that
the method is looked up once, on that class, so an override defined in a subclass won’t be used.</p>
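<p>The same trick works with methods on our own classes. Here, <code>Server</code> and its <code>url</code> method are hypothetical, made up only for this sketch:</p>

```python
class Server:
    """A hypothetical class, used only to illustrate the idea."""

    def __init__(self, host):
        self.host = host

    def url(self):
        return 'https://' + self.host + '/'


servers = [Server('example.com'), Server('example.org')]

# Server.url(server) is the same call as server.url()
print(list(map(Server.url, servers)))
# ['https://example.com/', 'https://example.org/']
```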
<h2 id="more-types-of-sequences">More Types of Sequences<a class="headerlink" href="#more-types-of-sequences" title="Permanent link">¶</a></h2>
<p>The second argument to <code>map</code> can be any iterable; it doesn’t have to be a <code>list</code>. Here are some
types that are quite useful with <code>map</code>:</p>
<ol>
<li>Sets (function called with each <em>item</em> in set)</li>
<li>Dictionaries (function called with each <em>key</em> in the dictionary)</li>
<li>Files (function called with each <em>line</em> in the open file object)</li>
<li>Strings (function called with each <em>character</em> in the string)</li>
</ol>
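<p>Of the types listed above, sets are the only one without an example further down, so here is a small one. The tag names are made up for illustration; since sets have no guaranteed order, the results are passed through <code>sorted</code> before printing.</p>

```python
tags = {'Python', 'JAVA', 'rust'}
print(sorted(map(str.lower, tags)))  # ['java', 'python', 'rust']

# map yields one result per item, so results can repeat even though
# the set's items are unique.
print(sorted(map(str.lower, {'Go', 'GO'})))  # ['go', 'go']
```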
<p>We can use dictionaries as the iterable to run a function over each <em>key</em> in the dictionary.
Additionally, we can use the <code>.items()</code> or <code>.values()</code> methods to have <code>map</code> run the function over each <code>(key,
value)</code> tuple or just the values, respectively.</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="n">numbers</span> <span class="o">=</span> <span class="p">{</span><span class="s1">'one'</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span> <span class="s1">'two'</span><span class="p">:</span> <span class="mi">2</span><span class="p">,</span> <span class="s1">'three'</span><span class="p">:</span> <span class="mi">3</span><span class="p">,</span> <span class="s1">'four'</span><span class="p">:</span> <span class="mi">4</span><span class="p">}</span>
</span><span><span class="gp">>>> </span><span class="nb">list</span><span class="p">(</span><span class="nb">map</span><span class="p">(</span><span class="nb">len</span><span class="p">,</span> <span class="n">numbers</span><span class="p">))</span>
</span><span><span class="go">[3, 3, 5, 4]</span>
</span><span><span class="gp">>>> </span><span class="nb">list</span><span class="p">(</span><span class="nb">map</span><span class="p">(</span><span class="nb">str</span><span class="p">,</span> <span class="n">numbers</span><span class="o">.</span><span class="n">values</span><span class="p">()))</span>
</span><span><span class="go">['1', '2', '3', '4']</span>
</span><span><span class="gp">>>> </span><span class="nb">list</span><span class="p">(</span><span class="nb">map</span><span class="p">(</span><span class="nb">repr</span><span class="p">,</span> <span class="n">numbers</span><span class="o">.</span><span class="n">items</span><span class="p">()))</span>
</span><span><span class="go">["('one', 1)", "('two', 2)", "('three', 3)", "('four', 4)"]</span>
</span></code></pre></div>
<p>We can use <code>map</code> to transform the lines of a file as we read it. This is handy for small
preprocessing steps on each line, like removing trailing whitespace.</p>
<div class="hl"><pre class=content><code><span><span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s1">'contents.txt'</span><span class="p">)</span> <span class="k">as</span> <span class="n">open_file</span><span class="p">:</span>
</span><span>    <span class="k">for</span> <span class="n">line</span> <span class="ow">in</span> <span class="nb">map</span><span class="p">(</span><span class="nb">str</span><span class="o">.</span><span class="n">rstrip</span><span class="p">,</span> <span class="n">open_file</span><span class="p">):</span>
</span><span>        <span class="k">pass</span>  <span class="c1"># `line` arrives here without trailing whitespace or newline</span>
</span></code></pre></div>
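<p>To try this without a <code>contents.txt</code> on disk, we can use <code>io.StringIO</code>, an in-memory text stream that yields lines when iterated, just like an open text file. The sample text is made up for this sketch.</p>

```python
import io

# io.StringIO stands in for an open text file here.
fake_file = io.StringIO('first line   \nsecond line\t\nthird line\n')

cleaned = list(map(str.rstrip, fake_file))
print(cleaned)  # ['first line', 'second line', 'third line']
```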
<p>We can map a function like <code>ord</code> (returns the Unicode code point for a single character) over a
string, to get the code points for each character in the string.</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="nb">list</span><span class="p">(</span><span class="nb">map</span><span class="p">(</span><span class="nb">ord</span><span class="p">,</span> <span class="s1">'aluminium'</span><span class="p">))</span>
</span><span><span class="go">[97, 108, 117, 109, 105, 110, 105, 117, 109]</span>
</span></code></pre></div>
<h2 id="dictionaries-as-transformers">Dictionaries as Transformers<a class="headerlink" href="#dictionaries-as-transformers" title="Permanent link">¶</a></h2>
<p>This is another neat trick where we have a dictionary and a list of some <em>keys</em>. We use <code>map</code> to
transform the list of keys to a list of values, referring to the dictionary.</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="n">numbers</span> <span class="o">=</span> <span class="p">{</span><span class="s1">'one'</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span> <span class="s1">'two'</span><span class="p">:</span> <span class="mi">2</span><span class="p">,</span> <span class="s1">'three'</span><span class="p">:</span> <span class="mi">3</span><span class="p">,</span> <span class="s1">'four'</span><span class="p">:</span> <span class="mi">4</span><span class="p">}</span>
</span><span><span class="gp">>>> </span><span class="n">keys</span> <span class="o">=</span> <span class="p">[</span><span class="s1">'three'</span><span class="p">,</span> <span class="s1">'four'</span><span class="p">,</span> <span class="s1">'two'</span><span class="p">,</span> <span class="s1">'five'</span><span class="p">,</span> <span class="s1">'four'</span><span class="p">,</span> <span class="s1">'two'</span><span class="p">]</span>
</span><span><span class="gp">>>> </span><span class="nb">list</span><span class="p">(</span><span class="nb">map</span><span class="p">(</span><span class="n">numbers</span><span class="o">.</span><span class="n">get</span><span class="p">,</span> <span class="n">keys</span><span class="p">))</span>
</span><span><span class="go">[3, 4, 2, None, 4, 2]</span>
</span></code></pre></div>
<p>Notice that when faced with a key like <code>'five'</code> that doesn’t exist in the dictionary, we get <code>None</code>,
which is how the <code>dict.get</code> behaves.</p>
<p class="note">Note that in this call to <code>map</code>, we are passing a <em>bound</em> method, <code>numbers.get</code>. This is essentially
the <code>dict.get</code> unbound method, bound to the <code>dict</code> instance we named <code>numbers</code>.</p>
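<p>If <code>None</code> isn’t a useful fallback, we can combine this with the multiple-iterables feature of <code>map</code> to pass a default value to <code>dict.get</code>. In this sketch, <code>itertools.repeat(0)</code> supplies an endless stream of zeros as the second argument; since <code>map</code> stops at the shortest iterable, it ends when <code>keys</code> runs out.</p>

```python
from itertools import repeat

numbers = {'one': 1, 'two': 2, 'three': 3, 'four': 4}
keys = ['three', 'four', 'two', 'five', 'four', 'two']

# Each call is numbers.get(key, 0).
print(list(map(numbers.get, keys, repeat(0))))  # [3, 4, 2, 0, 4, 2]
```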
<h2 id="infinite-sequences">Infinite Sequences<a class="headerlink" href="#infinite-sequences" title="Permanent link">¶</a></h2>
<p>Since <code>map</code> is lazy in Python 3, it can work with infinite sequences just fine. For our purposes,
let’s create a generator that will generate the non-negative even numbers, from zero to infinity:</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="k">def</span> <span class="nf">positive_evens</span><span class="p">():</span>
</span><span><span class="gp">... </span>    <span class="n">n</span> <span class="o">=</span> <span class="mi">0</span>
</span><span><span class="gp">... </span>    <span class="k">while</span> <span class="kc">True</span><span class="p">:</span>
</span><span><span class="gp">... </span>        <span class="k">yield</span> <span class="n">n</span>
</span><span><span class="gp">... </span>        <span class="n">n</span> <span class="o">+=</span> <span class="mi">2</span>
</span><span><span class="gp">...</span>
</span></code></pre></div>
<p>Since this generator never stops by itself, calling <code>list(positive_evens())</code> will never return. So,
we have to put a cap on the amount of data we generate ourselves. Of course, <code>map</code> doesn’t care.</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="n">positive_evens</span><span class="p">():</span>
</span><span><span class="gp">... </span>    <span class="k">if</span> <span class="n">e</span> <span class="o">></span> <span class="mi">3</span><span class="p">:</span>
</span><span><span class="gp">... </span>        <span class="k">break</span>
</span><span><span class="gp">... </span>    <span class="nb">print</span><span class="p">(</span><span class="n">e</span><span class="p">)</span>
</span><span><span class="gp">...</span>
</span><span><span class="go">0</span>
</span><span><span class="go">2</span>
</span><span><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">math</span>
</span><span><span class="gp">>>> </span><span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="nb">map</span><span class="p">(</span><span class="n">math</span><span class="o">.</span><span class="n">sqrt</span><span class="p">,</span> <span class="n">positive_evens</span><span class="p">()):</span>
</span><span><span class="gp">... </span>    <span class="k">if</span> <span class="n">e</span> <span class="o">></span> <span class="mi">3</span><span class="p">:</span>
</span><span><span class="gp">... </span>        <span class="k">break</span>
</span><span><span class="gp">... </span>    <span class="nb">print</span><span class="p">(</span><span class="n">e</span><span class="p">)</span>
</span><span><span class="gp">...</span>
</span><span><span class="go">0.0</span>
</span><span><span class="go">1.4142135623730951</span>
</span><span><span class="go">2.0</span>
</span><span><span class="go">2.449489742783178</span>
</span><span><span class="go">2.8284271247461903</span>
</span></code></pre></div>
<p>The <code>map</code> function doesn’t care that the generator we passed in is never ending. It only processes
as many items as the <code>for</code> loop requests.</p>
<p class="note">Be careful with infinite generators, though; it’s very easy to end up in an infinite loop.</p>
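<p>Rather than writing the stopping condition by hand, we can use <code>itertools.islice</code> to take a fixed number of items from an infinite stream. This sketch caps by count (the first five results) rather than by value, and uses <code>itertools.count(0, 2)</code> in place of the <code>positive_evens</code> generator; both yield 0, 2, 4, and so on.</p>

```python
import math
from itertools import count, islice

# count(0, 2) yields 0, 2, 4, ... forever.
square_roots = map(math.sqrt, count(0, 2))

# islice takes just the first five results; nothing beyond them is computed.
first_five = list(islice(square_roots, 5))
print(first_five)
```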
<h2 id="side-effect-operations">Side Effect Operations<a class="headerlink" href="#side-effect-operations" title="Permanent link">¶</a></h2>
<p>The <code>map</code> function is best used as a <em>transformation</em> done to each item in a sequence. In this
sense, the function that’s passed in is usually a pure function. Passing in functions that are
purely intended for side effects (like <code>print</code>, <code>log.debug</code> etc.) is in bad taste (opinion alert!).</p>
<p>There are two reasons for this. First, we have to pass the return value of <code>map</code> to <code>list</code>
just to get our <code>print</code> calls to run. Second, we then end up with a list of <code>None</code>s that’s just a sad waste.</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="nb">list</span><span class="p">(</span><span class="nb">map</span><span class="p">(</span><span class="nb">print</span><span class="p">,</span> <span class="n">protocols</span><span class="p">))</span>
</span><span><span class="go">http</span>
</span><span><span class="go">tcp</span>
</span><span><span class="go">xmpp</span>
</span><span><span class="go">irc</span>
</span><span><span class="go">[None, None, None, None]</span>
</span></code></pre></div>
<p>The better way to do this is to just use a <code>for</code> loop and make the intention clear. The intention is
to do something <em>with</em> each item in the sequence, not to do something <em>to</em> each item
and collect the return values.</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="k">for</span> <span class="n">protocol</span> <span class="ow">in</span> <span class="n">protocols</span><span class="p">:</span>
</span><span><span class="gp">... </span>    <span class="nb">print</span><span class="p">(</span><span class="n">protocol</span><span class="p">)</span>
</span><span><span class="gp">...</span>
</span><span><span class="go">http</span>
</span><span><span class="go">tcp</span>
</span><span><span class="go">xmpp</span>
</span><span><span class="go">irc</span>
</span></code></pre></div>
<p>Much better.</p>
<h2 id="string-join">String join<a class="headerlink" href="#string-join" title="Permanent link">¶</a></h2>
<p>Since we can use bound methods with <code>map</code> as well, we can pass in bound string methods, like
<code>':'.join</code>:</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="n">translations</span> <span class="o">=</span> <span class="p">{</span><span class="s1">'one'</span><span class="p">:</span> <span class="s1">'un'</span><span class="p">,</span> <span class="s1">'two'</span><span class="p">:</span> <span class="s1">'deux'</span><span class="p">,</span> <span class="s1">'three'</span><span class="p">:</span> <span class="s1">'trois'</span><span class="p">}</span>
</span><span><span class="gp">>>> </span><span class="nb">list</span><span class="p">(</span><span class="nb">map</span><span class="p">(</span><span class="s1">':'</span><span class="o">.</span><span class="n">join</span><span class="p">,</span> <span class="n">translations</span><span class="o">.</span><span class="n">items</span><span class="p">()))</span>
</span><span><span class="go">['one:un', 'two:deux', 'three:trois']</span>
</span></code></pre></div>
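<p>Bound methods also combine nicely with multiple iterables. In this sketch, each call is <code>'{} = {}'.format(key, value)</code>, with keys and values taken from two separate lists (sample data made up for illustration):</p>

```python
keys = ['one', 'two', 'three']
values = ['un', 'deux', 'trois']

pairs = list(map('{} = {}'.format, keys, values))
print(pairs)  # ['one = un', 'two = deux', 'three = trois']
```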
<h2 id="case-against-lambdamap">Case Against <code>lambda</code>+<code>map</code><a class="headerlink" href="#case-against-lambdamap" title="Permanent link">¶</a></h2>
<p>Since <code>map</code> accepts any callable, it can be tempting to use <code>lambda</code> functions inside <code>map</code>. This is
almost always in bad taste, and comprehensions (along with <code>zip</code>) usually offer a more readable
alternative.</p>
<p>Consider the following use of <code>map</code> with <code>lambda</code>:</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="nb">list</span><span class="p">(</span><span class="nb">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span> <span class="o">*</span> <span class="mi">2</span><span class="p">,</span> <span class="nb">range</span><span class="p">(</span><span class="mi">5</span><span class="p">)))</span>
</span><span><span class="go">[0, 2, 4, 6, 8]</span>
</span></code></pre></div>
<p>Now compare that with the same thing done with a comprehension:</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="p">[</span><span class="n">x</span> <span class="o">*</span> <span class="mi">2</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">5</span><span class="p">)]</span>
</span><span><span class="go">[0, 2, 4, 6, 8]</span>
</span></code></pre></div>
<p>Of course, we can use comprehensions even when there’s no <code>lambda</code> involved, by just calling
the function inside the comprehension. True, but in that case, <code>map</code> just looks prettier ;).</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="p">[</span><span class="nb">ord</span><span class="p">(</span><span class="n">c</span><span class="p">)</span> <span class="k">for</span> <span class="n">c</span> <span class="ow">in</span> <span class="s1">'hello'</span><span class="p">]</span>
</span><span><span class="go">[104, 101, 108, 108, 111]</span>
</span><span><span class="gp">>>> </span><span class="nb">list</span><span class="p">(</span><span class="nb">map</span><span class="p">(</span><span class="nb">ord</span><span class="p">,</span> <span class="s1">'hello'</span><span class="p">))</span>
</span><span><span class="go">[104, 101, 108, 108, 111]</span>
</span></code></pre></div>
<p>In fact, any call to <code>map</code> can be translated to a generator expression:</p>
<div class="hl"><pre class=content><code><span><span class="nb">map</span><span class="p">(</span><span class="n">function</span><span class="p">,</span> <span class="n">iterable</span><span class="p">,</span> <span class="o">...</span><span class="p">)</span>
</span><span><span class="c1"># same as</span>
</span><span><span class="p">(</span><span class="n">function</span><span class="p">(</span><span class="o">*</span><span class="n">vals</span><span class="p">)</span> <span class="k">for</span> <span class="n">vals</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">iterable</span><span class="p">,</span> <span class="o">...</span><span class="p">))</span>
</span></code></pre></div>
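<p>We can check this equivalence on a couple of inputs. The sketch below materializes both forms into lists so they can be compared; <code>via_map</code> and <code>via_comprehension</code> are hypothetical helper names.</p>

```python
import operator


def via_map(function, *iterables):
    return list(map(function, *iterables))


def via_comprehension(function, *iterables):
    # zip pairs up the items and, like map, stops at the shortest iterable.
    return [function(*vals) for vals in zip(*iterables)]


a, b = [1, 2, 3, 4], [10, 20, 30]
print(via_map(operator.mul, a, b))            # [10, 40, 90]
print(via_comprehension(operator.mul, a, b))  # [10, 40, 90]
print(via_map(str.upper, 'abc') == via_comprehension(str.upper, 'abc'))  # True
```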
<p>But that doesn’t mean <code>map</code> is not useful. We just have to pick the right option depending on the
need.</p>
<h2 id="conclusion">Conclusion<a class="headerlink" href="#conclusion" title="Permanent link">¶</a></h2>
<p>The <code>map</code> function is a powerful builtin, but it should be used with care. If you find yourself nesting
several different calls to <code>map</code>, you may want to rethink that strategy, since it quickly becomes
unreadable.</p>
<p>But when it produces clear-to-understand code, <code>map</code> can be a very useful tool.</p>
<p>Thank you for reading! Do you have any clever examples of using <code>map</code>? Share in the comments!</p>The Jython Pillow Guide2018-01-09T00:00:00+05:302018-01-09T00:00:00+05:30Shrikant Sharat Kandulatag:sharats.me,2018-01-09:/posts/jython-pillow-guide/<p>This is a document with tips and usage details about Jython that I’ve come across. I intend to
document handy features of Python as well as some clever inter-op facilities provided by Jython.</p>
<p>I’m going to assume you’re not a complete beginner to Java and Python languages …</p><p>This is a document with tips and usage details about Jython that I’ve come across. I intend to
document handy features of Python as well as some clever inter-op facilities provided by Jython.</p>
<p>I’m going to assume you’re not a complete beginner to Java and Python languages. If you find
anything off or have a suggestion to add, please do write to me. Thanks!</p>
<h2 id="logging-and-printing">Logging and Printing<a class="headerlink" href="#logging-and-printing" title="Permanent link">¶</a></h2>
<p>When using Apache’s <em>log4j</em>, we can get an instance of a <code>Logger</code> using the API just as we would in
Java:</p>
<div class="hl"><pre class=content><code><span><span class="o">>>></span> <span class="kn">from</span> <span class="nn">org.apache.log4j</span> <span class="kn">import</span> <span class="n">Logger</span>
</span><span><span class="o">>>></span> <span class="n">log</span> <span class="o">=</span> <span class="n">Logger</span><span class="o">.</span><span class="n">getLogger</span><span class="p">(</span><span class="s1">'jython_script'</span><span class="p">)</span>
</span></code></pre></div>
<p>When getting a <code>Logger</code> instance in a module that is imported, we can obtain a logger with a category
specific to that module using the following code:</p>
<div class="hl"><pre class=content><code><span><span class="n">log</span> <span class="o">=</span> <span class="n">Logger</span><span class="o">.</span><span class="n">getLogger</span><span class="p">(</span><span class="vm">__name__</span><span class="p">)</span>
</span></code></pre></div>
<p>The <code>__name__</code> variable contains the current module’s name as a string. <em>Note</em> that
<code>__name__</code> is set to the string <code>'__main__'</code> if the module is run as a script rather than imported from
another module. Keep this in mind when using the above code.</p>
<p>The standard printing functions of Java can be imported into Python and used directly in the
following way:</p>
<div class="hl"><pre class=content><code><span><span class="o">>>></span> <span class="kn">from</span> <span class="nn">java.lang</span> <span class="kn">import</span> <span class="n">System</span>
</span><span><span class="o">>>></span> <span class="n">System</span><span class="o">.</span><span class="n">out</span><span class="o">.</span><span class="n">println</span><span class="p">(</span><span class="s1">'Hola'</span><span class="p">)</span>
</span><span><span class="n">Hola</span>
</span><span><span class="o">>>></span> <span class="n">System</span><span class="o">.</span><span class="n">err</span><span class="o">.</span><span class="n">println</span><span class="p">(</span><span class="s1">'Hello there'</span><span class="p">)</span>
</span><span><span class="n">Hello</span> <span class="n">there</span>
</span><span><span class="o">>>></span> <span class="n">System</span><span class="o">.</span><span class="n">out</span><span class="o">.</span><span class="n">print</span><span class="p">(</span><span class="s1">'Hola</span><span class="se">\n</span><span class="s1">'</span><span class="p">)</span>
</span><span><span class="n">Hola</span>
</span></code></pre></div>
<p>However, it’s usually more convenient to use Python’s <code>print</code> statement to output things to standard
output and error:</p>
<div class="hl"><pre class=content><code><span><span class="nb">print</span> <span class="s1">'Hello world!'</span>
</span></code></pre></div>
<p>Here’s a table illustrating the <code>print</code> statement equivalents of the Java <code>print*</code> functions (the <code>sys.stderr</code> rows assume <code>import sys</code>):</p>
<div class="table-wrapper">
<table>
<thead>
<tr>
<th>Java</th>
<th>Python</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>System.out.println("!")</code></td>
<td><code>print '!'</code></td>
</tr>
<tr>
<td><code>System.out.print("!")</code></td>
<td><code>print '!',</code></td>
</tr>
<tr>
<td><code>System.err.println("!")</code></td>
<td><code>print >> sys.stderr, '!'</code></td>
</tr>
<tr>
<td><code>System.err.print("!")</code></td>
<td><code>print >> sys.stderr, '!',</code></td>
</tr>
</tbody>
</table>
</div>
<h2 id="bean-properties">Bean Properties<a class="headerlink" href="#bean-properties" title="Permanent link">¶</a></h2>
<p>Jython can implicitly call the <code>.get*</code> and <code>.set*</code> methods that are widely used in Java classes to
get and set the values of instance attributes. Here’s an illustration of how this inter-op works:</p>
<div class="table-wrapper">
<table>
<thead>
<tr>
<th>Jython</th>
<th>Java equivalent</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>obj.somePropertyValue</code></td>
<td><code>obj.getSomePropertyValue()</code></td>
</tr>
<tr>
<td><code>obj.somePropertyValue = 123</code></td>
<td><code>obj.setSomePropertyValue(123)</code></td>
</tr>
</tbody>
</table>
</div>
<p>Of course, when such <code>.get*</code> and <code>.set*</code> methods are not available, this falls back gracefully to
getting or setting the attribute directly, just as Java would treat those statements.</p>
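<p>To make the naming rule concrete, here is a plain-Python sketch that imitates it with <code>__getattr__</code> and <code>__setattr__</code>: capitalise the first letter of the attribute name, prefix <code>get</code> or <code>set</code>, and fall back to the attribute itself when no such method exists. This only illustrates the convention, not how Jython implements it internally; <code>BeanView</code> and the <code>Thermostat</code> stand-in for a Java bean are hypothetical.</p>

```python
class BeanView(object):
    """Hypothetical wrapper imitating Jython's bean-property lookup rule."""

    def __init__(self, target):
        # Bypass our own __setattr__ so `_target` is stored on the wrapper itself.
        object.__setattr__(self, '_target', target)

    @staticmethod
    def _accessor(prefix, name):
        # somePropertyValue -> getSomePropertyValue / setSomePropertyValue
        return prefix + name[:1].upper() + name[1:]

    def __getattr__(self, name):
        # Only called when normal attribute lookup on the wrapper fails.
        target = object.__getattribute__(self, '_target')
        getter = getattr(target, self._accessor('get', name), None)
        if callable(getter):
            return getter()
        return getattr(target, name)  # no getter: read the attribute directly

    def __setattr__(self, name, value):
        target = object.__getattribute__(self, '_target')
        setter = getattr(target, self._accessor('set', name), None)
        if callable(setter):
            setter(value)
        else:
            setattr(target, name, value)  # no setter: write the attribute directly


class Thermostat(object):
    """Stand-in for a Java bean with a getter/setter pair."""

    def __init__(self):
        self._temp = 20

    def getTemperature(self):
        return self._temp

    def setTemperature(self, value):
        self._temp = value


thermostat = BeanView(Thermostat())
thermostat.temperature = 23       # calls setTemperature(23)
current = thermostat.temperature  # calls getTemperature()
```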
<h2 id="strings">Strings<a class="headerlink" href="#strings" title="Permanent link">¶</a></h2>
<p>Strings in Java (<em>i.e.,</em> objects of type <code>java.lang.String</code>) are converted to <code>unicode</code> objects when
passed into the Python world, while <code>str</code> and <code>unicode</code> objects in Python are converted to
<code>java.lang.String</code> instances when passed into the Java world. This conversion is seamless and we
usually don’t have to worry about it.</p>
<p>However, if needed, we can explicitly create an instance of <code>java.lang.String</code> from a <code>unicode</code>
object in Python:</p>
<div class="hl"><pre class=content><code><span><span class="o">>>></span> <span class="kn">from</span> <span class="nn">java.lang</span> <span class="kn">import</span> <span class="n">String</span>
</span><span><span class="o">>>></span> <span class="n">greeting</span> <span class="o">=</span> <span class="n">String</span><span class="p">(</span><span class="s1">'Hello'</span><span class="p">)</span>
</span><span><span class="o">>>></span> <span class="n">greeting</span>
</span><span><span class="n">Hello</span>
</span><span><span class="o">>>></span> <span class="nb">type</span><span class="p">(</span><span class="n">greeting</span><span class="p">)</span>
</span><span><span class="o"><</span><span class="nb">type</span> <span class="s1">'java.lang.String'</span><span class="o">></span>
</span></code></pre></div>
<p>String formatting using the <code>%</code> operator in Python cannot be applied to Java <code>String</code> objects. They have
to be converted to <code>str</code> or <code>unicode</code> first.</p>
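<p>A minimal sketch of that conversion, wrapped in a hypothetical <code>format_text</code> helper. It assumes that calling <code>unicode()</code> on a <code>java.lang.String</code> in Jython yields the equivalent Python string; the <code>try</code>/<code>except</code> only exists so the same file also runs on Python 3, which has no <code>unicode</code> builtin.</p>

```python
try:
    text_type = unicode  # Python 2 / Jython builtin
except NameError:
    text_type = str      # Python 3: str is already a unicode string


def format_text(template, *args):
    """Hypothetical helper: %-format a template that may be a java.lang.String.

    The template is converted to a Python string first, because the `%`
    operator cannot be applied to java.lang.String objects directly.
    """
    return text_type(template) % args
```

<p>In Jython, <code>format_text(String('Hello, %s!'), 'world')</code> should then give <code>Hello, world!</code>, with <code>String</code> imported from <code>java.lang</code> as above.</p>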
<h2 id="maps-as-dictionaries">Maps as Dictionaries<a class="headerlink" href="#maps-as-dictionaries" title="Permanent link">¶</a></h2>
<p>For the examples in this section, let’s work with the following
<a href="https://docs.oracle.com/javase/8/docs/api/java/util/Map.html" rel="noopener noreferrer" target="_blank"><code>Map</code></a>:</p>
<div class="hl"><pre class=content><code><span><span class="n">java</span><span class="p">.</span><span class="na">util</span><span class="p">.</span><span class="na">Map</span><span class="o"><</span><span class="n">String</span><span class="p">,</span><span class="w"> </span><span class="n">Integer</span><span class="o">></span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">java</span><span class="p">.</span><span class="na">util</span><span class="p">.</span><span class="na">HashMap</span><span class="o"><></span><span class="p">();</span>
</span><span><span class="n">data</span><span class="p">.</span><span class="na">put</span><span class="p">(</span><span class="s">"a"</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">);</span>
</span><span><span class="n">data</span><span class="p">.</span><span class="na">put</span><span class="p">(</span><span class="s">"b"</span><span class="p">,</span><span class="w"> </span><span class="mi">2</span><span class="p">);</span>
</span><span><span class="n">data</span><span class="p">.</span><span class="na">put</span><span class="p">(</span><span class="s">"c"</span><span class="p">,</span><span class="w"> </span><span class="mi">3</span><span class="p">);</span>
</span></code></pre></div>
<p><code>Map</code>s support the <em>getitem</em> syntax well, so it is usually convenient to think of them as
python-style dictionaries. Here’s an example:</p>
<div class="hl"><pre class=content><code><span><span class="o">>>></span> <span class="nb">print</span> <span class="n">data</span><span class="p">[</span><span class="s1">'a'</span><span class="p">]</span> <span class="c1"># data.get("a")</span>
</span><span><span class="mi">1</span>
</span><span><span class="o">>>></span> <span class="nb">print</span> <span class="n">data</span><span class="p">[</span><span class="s1">'b'</span><span class="p">]</span> <span class="c1"># data.get("b")</span>
</span><span><span class="mi">2</span>
</span><span><span class="o">>>></span> <span class="n">data</span><span class="p">[</span><span class="s1">'d'</span><span class="p">]</span> <span class="o">=</span> <span class="mi">4</span> <span class="c1"># data.put("d", 4)</span>
</span><span><span class="o">>>></span> <span class="n">data</span><span class="p">[</span><span class="s1">'d'</span><span class="p">]</span> <span class="c1"># data.get("d")</span>
</span><span><span class="mi">4</span>
</span><span><span class="o">>>></span> <span class="nb">len</span><span class="p">(</span><span class="n">data</span><span class="p">)</span> <span class="c1"># data.size()</span>
</span><span><span class="mi">4</span>
</span><span><span class="o">>>></span> <span class="s1">'c'</span> <span class="ow">in</span> <span class="n">data</span> <span class="c1"># data.containsKey("c")</span>
</span><span><span class="kc">True</span>
</span><span><span class="o">>>></span> <span class="k">del</span> <span class="n">data</span><span class="p">[</span><span class="s1">'c'</span><span class="p">]</span> <span class="c1"># data.remove("c")</span>
</span><span><span class="o">>>></span> <span class="s1">'c'</span> <span class="ow">in</span> <span class="n">data</span> <span class="c1"># data.containsKey("c")</span>
</span><span><span class="kc">False</span>
</span><span><span class="o">>>></span> <span class="n">data</span>
</span><span><span class="p">{</span><span class="n">a</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">b</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">d</span><span class="o">=</span><span class="mi">4</span><span class="p">}</span>
</span><span><span class="o">>>></span> <span class="nb">len</span><span class="p">(</span><span class="n">data</span><span class="p">)</span> <span class="c1"># data.size()</span>
</span><span><span class="mi">3</span>
</span></code></pre></div>
<p>Although this resembles the usage of a traditional python dictionary, not all the methods you’d expect in a
dictionary are available. This is a <code>Map</code> object after all, and it has the methods of the
<code>Map</code> interface. However, it is easy to see the parallels among some of the most used methods.</p>
<div class="table-wrapper">
<table>
<thead>
<tr>
<th><code>dict</code> method</th>
<th><code>Map</code> method</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>.keys</code></td>
<td><code>.keySet</code></td>
</tr>
<tr>
<td><code>.values</code></td>
<td><code>.values</code></td>
</tr>
<tr>
<td><code>.clear</code></td>
<td><code>.clear</code></td>
</tr>
<tr>
<td><code>.items</code> (gives 2-tuples)</td>
<td><code>.entrySet</code> (gives <code>Entry</code> objects with <code>.key</code> and <code>.value</code>)</td>
</tr>
<tr>
<td><code>.update</code></td>
<td><code>.putAll</code> (accepts <code>dict</code> as well as a <code>Map</code>)</td>
</tr>
</tbody>
</table>
</div>
<p>The <code>dict</code> builtin can be called on the <code>Map</code> object to get a python-style dictionary, if needed.
Additionally, just like a python dictionary, calling <code>list</code> (or <code>set</code>) on the <code>Map</code> object gives a
<code>list</code> (or <code>set</code>) of the <em>keys</em> in the <code>Map</code>.</p>
<p>Using <code>for</code> loops to iterate over <code>Map</code>s yields the keys in the <code>Map</code>, which is consistent with how
<code>for</code> loops work with python dictionaries.</p>
<div class="hl"><pre class=content><code><span><span class="k">for</span> <span class="n">key</span> <span class="ow">in</span> <span class="n">data</span><span class="p">:</span>
</span><span> <span class="nb">print</span> <span class="n">key</span><span class="p">,</span> <span class="n">data</span><span class="p">[</span><span class="n">key</span><span class="p">]</span>
</span></code></pre></div>
<p>Prints the following:</p>
<div class="hl"><pre class=content><code><span>a 1
</span><span>b 2
</span><span>d 4
</span></code></pre></div>
<p>In Python, the <code>.items</code> method returns each entry as a <code>tuple</code>, which lets us write the <code>for</code> loop like
the following:</p>
<div class="hl"><pre class=content><code><span><span class="c1"># !!! Only works if `data` is a python-style dictionary, not if it is a `Map`.</span>
</span><span><span class="k">for</span> <span class="n">key</span><span class="p">,</span> <span class="n">value</span> <span class="ow">in</span> <span class="n">data</span><span class="o">.</span><span class="n">items</span><span class="p">():</span>
</span><span> <span class="nb">print</span> <span class="n">key</span><span class="p">,</span> <span class="n">value</span>
</span></code></pre></div>
<p>Unfortunately, since <code>Map</code> doesn’t have an <code>.items</code> method, this doesn’t work for <code>Map</code>s. However, we
can use the <code>.entrySet</code> method to construct something <em>slightly</em> similar.</p>
<div class="hl"><pre class=content><code><span><span class="k">for</span> <span class="n">entry</span> <span class="ow">in</span> <span class="n">data</span><span class="o">.</span><span class="n">entrySet</span><span class="p">():</span>
</span><span> <span class="nb">print</span> <span class="n">entry</span><span class="o">.</span><span class="n">key</span><span class="p">,</span> <span class="n">entry</span><span class="o">.</span><span class="n">value</span>
</span></code></pre></div>
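<p>When the same code has to deal with both python dictionaries and Java <code>Map</code>s, a small generator can hide the difference. This is a sketch, assuming that anything with an <code>.entrySet</code> method is a Java <code>Map</code> whose entries expose <code>.key</code> and <code>.value</code> as in the loop above; <code>iter_items</code> is a hypothetical name.</p>

```python
def iter_items(mapping):
    """Yield (key, value) pairs from a python dict or a Java Map (hypothetical helper)."""
    if hasattr(mapping, 'entrySet'):
        # Java Map: each Entry exposes .key and .value (via getKey()/getValue()).
        for entry in mapping.entrySet():
            yield entry.key, entry.value
    else:
        # Python dict, or anything else with an .items() method.
        for key, value in mapping.items():
            yield key, value
```

<p>With it, the loop from earlier can be written as <code>for key, value in iter_items(data): print key, value</code>, whether <code>data</code> is a <code>dict</code> or a <code>Map</code>.</p>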
<p>To iterate over the values of a <code>Map</code>, since the method is called <code>.values</code> in both <code>dict</code> and
<code>Map</code>, the same piece of code works with either kind of object.</p>
<div class="hl"><pre class=content><code><span><span class="k">for</span> <span class="n">value</span> <span class="ow">in</span> <span class="n">data</span><span class="o">.</span><span class="n">values</span><span class="p">():</span>
</span><span> <span class="nb">print</span> <span class="n">value</span>
</span></code></pre></div>
<p>Empty <code>Map</code> objects are treated as <code>False</code> in boolean contexts, just as with python’s dictionaries.</p>
<h2 id="collections">Collections<a class="headerlink" href="#collections" title="Permanent link">¶</a></h2>
<p>The two main collection types in Python are <code>list</code> and <code>set</code>. The equivalents in Java are the
interfaces <a href="https://docs.oracle.com/javase/8/docs/api/java/util/List.html" rel="noopener noreferrer" target="_blank"><code>List</code></a> and
<a href="https://docs.oracle.com/javase/8/docs/api/java/util/Set.html" rel="noopener noreferrer" target="_blank"><code>Set</code></a>. Let’s prepare some data for
our examples.</p>
<div class="hl"><pre class=content><code><span><span class="n">java</span><span class="p">.</span><span class="na">util</span><span class="p">.</span><span class="na">List</span><span class="o"><</span><span class="n">String</span><span class="o">></span><span class="w"> </span><span class="n">planets</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">java</span><span class="p">.</span><span class="na">util</span><span class="p">.</span><span class="na">ArrayList</span><span class="o"><></span><span class="p">();</span>
</span><span><span class="n">planets</span><span class="p">.</span><span class="na">add</span><span class="p">(</span><span class="s">"Mercury"</span><span class="p">);</span>
</span><span><span class="n">planets</span><span class="p">.</span><span class="na">add</span><span class="p">(</span><span class="s">"Venus"</span><span class="p">);</span>
</span><span><span class="n">planets</span><span class="p">.</span><span class="na">add</span><span class="p">(</span><span class="s">"Earth"</span><span class="p">);</span>
</span><span>
</span><span><span class="n">java</span><span class="p">.</span><span class="na">util</span><span class="p">.</span><span class="na">Set</span><span class="o"><</span><span class="n">String</span><span class="o">></span><span class="w"> </span><span class="n">colors</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">java</span><span class="p">.</span><span class="na">util</span><span class="p">.</span><span class="na">HashSet</span><span class="o"><></span><span class="p">();</span>
</span><span><span class="n">colors</span><span class="p">.</span><span class="na">add</span><span class="p">(</span><span class="s">"White"</span><span class="p">);</span>
</span><span><span class="n">colors</span><span class="p">.</span><span class="na">add</span><span class="p">(</span><span class="s">"Black"</span><span class="p">);</span>
</span><span><span class="n">colors</span><span class="p">.</span><span class="na">add</span><span class="p">(</span><span class="s">"Red"</span><span class="p">);</span>
</span><span><span class="n">colors</span><span class="p">.</span><span class="na">add</span><span class="p">(</span><span class="s">"Green"</span><span class="p">);</span>
</span><span><span class="n">colors</span><span class="p">.</span><span class="na">add</span><span class="p">(</span><span class="s">"Blue"</span><span class="p">);</span>
</span></code></pre></div>
<p>The <em>getitem</em> syntax can be used with <code>List</code>s seamlessly:</p>
<div class="hl"><pre class=content><code><span><span class="o">>>></span> <span class="n">planets</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
</span><span><span class="sa">u</span><span class="s1">'Mercury'</span>
</span><span><span class="o">>>></span> <span class="n">planets</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
</span><span><span class="sa">u</span><span class="s1">'Venus'</span>
</span></code></pre></div>
<p>Slicing returns <code>List</code>s of the same type, not python-style <code>list</code>s.</p>
<div class="hl"><pre class=content><code><span><span class="o">>>></span> <span class="n">planets</span><span class="p">[:</span><span class="mi">2</span><span class="p">]</span>
</span><span><span class="p">[</span><span class="n">Mercury</span><span class="p">,</span> <span class="n">Venus</span><span class="p">]</span>
</span><span><span class="o">>>></span> <span class="nb">type</span><span class="p">(</span><span class="n">_</span><span class="p">)</span> <span class="c1"># `_` holds the value of the last expression.</span>
</span><span><span class="o"><</span><span class="nb">type</span> <span class="s1">'java.util.ArrayList'</span><span class="o">></span>
</span><span><span class="o">>>></span> <span class="n">planets</span><span class="p">[::</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
</span><span><span class="p">[</span><span class="n">Earth</span><span class="p">,</span> <span class="n">Venus</span><span class="p">,</span> <span class="n">Mercury</span><span class="p">]</span>
</span><span><span class="o">>>></span> <span class="nb">type</span><span class="p">(</span><span class="n">_</span><span class="p">)</span>
</span><span><span class="o"><</span><span class="nb">type</span> <span class="s1">'java.util.ArrayList'</span><span class="o">></span>
</span></code></pre></div>
<p>However, the <em>getitem</em> syntax is not supported for <code>Set</code>s, since indexing makes no sense for
unordered collections. But the operator support available for <code>set</code>s in Python is also
available with Java <code>Set</code> objects.</p>
<div class="hl"><pre class=content><code><span><span class="o">>>></span> <span class="s1">'Red'</span> <span class="ow">in</span> <span class="n">colors</span>
</span><span><span class="kc">True</span>
</span><span><span class="o">>>></span> <span class="nb">len</span><span class="p">(</span><span class="n">colors</span><span class="p">)</span>
</span><span><span class="mi">5</span>
</span></code></pre></div>
<p>The <code>for</code> loop can be used on objects of any
<a href="https://docs.oracle.com/javase/8/docs/api/java/util/Collection.html" rel="noopener noreferrer" target="_blank"><code>Collection</code></a> type to
iterate over their contents.</p>
<div class="hl"><pre class=content><code><span><span class="o">>>></span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">planets</span><span class="p">:</span>
</span><span><span class="o">...</span> <span class="nb">print</span> <span class="n">x</span>
</span><span><span class="o">...</span>
</span><span><span class="n">Mercury</span>
</span><span><span class="n">Venus</span>
</span><span><span class="n">Earth</span>
</span><span><span class="o">>>></span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">planets</span><span class="p">):</span>
</span><span><span class="o">...</span> <span class="nb">print</span> <span class="n">x</span>
</span><span><span class="o">...</span>
</span><span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="sa">u</span><span class="s1">'Mercury'</span><span class="p">)</span>
</span><span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="sa">u</span><span class="s1">'Venus'</span><span class="p">)</span>
</span><span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="sa">u</span><span class="s1">'Earth'</span><span class="p">)</span>
</span></code></pre></div>
<p>Here are equivalents for some of the methods available in Java’s <code>Collection</code>s and Python’s collection
types.</p>
<div class="table-wrapper">
<table>
<thead>
<tr>
<th>Java</th>
<th>Jython</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>Collection.add</code></td>
<td><code>list.append</code> / <code>set.add</code></td>
</tr>
<tr>
<td><code>Collection.addAll</code></td>
<td><code>list.extend</code> / <code>set.update</code> (Prefer <code>list + list</code> or <code>set.union</code>)</td>
</tr>
<tr>
<td><code>Collection.contains</code></td>
<td><code>in list</code> or <code>in set</code></td>
</tr>
<tr>
<td><code>Collection.isEmpty</code></td>
<td><code>bool(list)</code> or <code>bool(set)</code> (Can be used directly in a boolean context)</td>
</tr>
<tr>
<td><code>Collection.size</code></td>
<td><code>len(list)</code> or <code>len(set)</code></td>
</tr>
</tbody>
</table>
</div>
<p>Empty <code>Collection</code>s are treated as <code>False</code> in boolean contexts, just as with python’s collections.</p>
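<p>For reference, here is the right-hand column of the table exercised on plain Python collections. The <code>py_planets</code> and <code>py_colors</code> names are made up for this sketch; they are python-style <code>list</code> and <code>set</code> objects, not the Java <code>planets</code> and <code>colors</code> prepared earlier.</p>

```python
py_planets = ['Mercury', 'Venus']
py_colors = {'White', 'Black'}

py_planets.append('Earth')                        # Collection.add
py_colors.add('Red')                              # Collection.add

more_planets = py_planets + ['Mars']              # Collection.addAll, as a new list
more_colors = py_colors.union({'Green', 'Blue'})  # Collection.addAll, as a new set

has_venus = 'Venus' in py_planets                 # Collection.contains
is_empty = not py_colors                          # Collection.isEmpty (empty collections are falsy)
size = len(more_colors)                           # Collection.size
```

<p>The <code>+</code> and <code>.union</code> forms leave the originals untouched, which is presumably why the table prefers them over the in-place <code>.extend</code> and <code>.update</code>.</p>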
<h3 id="java-arrays">Java Arrays<a class="headerlink" href="#java-arrays" title="Permanent link">¶</a></h3>
<p>Just as Java’s <code>List</code> is mirrored in Python with <code>list</code>, Java’s arrays are mirrored using the array
structure available in Jython’s <a href="http://www.jython.org/docs/library/array.html" rel="noopener noreferrer" target="_blank"><code>array</code></a> module.
The official documentation for that module is quite exhaustive, so I suggest going over it to get an
idea of how to handle arrays in Jython.</p>
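<p>As a rough starting point, here is the standard <code>array</code> API in action. The comments assume, per the linked documentation, that Jython backs these objects with Java arrays of the matching primitive type; the code itself uses only the portable <code>array</code> module, so it also runs on CPython, where no Java array is involved.</p>

```python
from array import array

# 'i' is the typecode for signed integers. In Jython, this array is assumed
# to be backed by a Java int[] that can be handed to Java methods expecting one.
numbers = array('i', [3, 1, 2])

numbers.append(4)           # arrays grow like lists, but hold a single element type
total = sum(numbers)        # iterable like any Python sequence
first = numbers[0]          # indexing works as usual
as_list = numbers.tolist()  # copy into a python-style list when needed
```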
<h2 id="the-iteration-protocol">The Iteration Protocol<a class="headerlink" href="#the-iteration-protocol" title="Permanent link">¶</a></h2>
<p>Java’s <a href="https://docs.oracle.com/javase/8/docs/api/java/util/Iterator.html" rel="noopener noreferrer" target="_blank"><code>Iterator</code></a>-style
iteration is supported by Jython’s <code>for</code> statement. For example, consider the following Java
<code>Iterator</code>, which emulates a small fraction of Python’s <code>range</code> function:</p>
<div class="hl"><pre class=content><code><span><span class="kn">package</span><span class="w"> </span><span class="nn">ssk.experiments</span><span class="p">;</span>
</span><span><span class="kn">import</span><span class="w"> </span><span class="nn">java.util.Iterator</span><span class="p">;</span>
</span><span><span class="kn">import</span><span class="w"> </span><span class="nn">java.util.NoSuchElementException</span><span class="p">;</span>
</span><span>
</span><span><span class="kd">public</span><span class="w"> </span><span class="kd">class</span> <span class="nc">RangeIterator</span><span class="w"> </span><span class="kd">implements</span><span class="w"> </span><span class="n">Iterator</span><span class="o"><</span><span class="n">Integer</span><span class="o">></span><span class="w"> </span><span class="p">{</span>
</span><span><span class="w"> </span><span class="kd">private</span><span class="w"> </span><span class="n">Integer</span><span class="w"> </span><span class="n">current</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="n">max</span><span class="p">;</span>
</span><span><span class="w"> </span><span class="kd">public</span><span class="w"> </span><span class="nf">RangeIterator</span><span class="p">(</span><span class="kt">int</span><span class="w"> </span><span class="n">max</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="na">max</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">max</span><span class="p">;</span><span class="w"> </span><span class="p">}</span>
</span><span><span class="w"> </span><span class="nd">@Override</span>
</span><span><span class="w"> </span><span class="kd">public</span><span class="w"> </span><span class="kt">boolean</span><span class="w"> </span><span class="nf">hasNext</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">current</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">max</span><span class="p">;</span><span class="w"> </span><span class="p">}</span>
</span><span><span class="w"> </span><span class="nd">@Override</span>
</span><span><span class="w"> </span><span class="kd">public</span><span class="w"> </span><span class="n">Integer</span><span class="w"> </span><span class="nf">next</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
</span><span><span class="w"> </span><span class="c1">// The Iterator contract requires an exception once the range is exhausted.</span>
</span><span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">hasNext</span><span class="p">())</span><span class="w"> </span><span class="k">throw</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">NoSuchElementException</span><span class="p">();</span>
</span><span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">current</span><span class="o">++</span><span class="p">;</span>
</span><span><span class="w"> </span><span class="p">}</span>
</span><span><span class="p">}</span>
</span></code></pre></div>
<p>Since classes are instantiated without a <code>new</code> keyword in Python, and Jython’s <code>for</code> statement
supports Java’s <code>Iterator</code>s, we can use the above class in the following way:</p>
<div class="hl"><pre class=content><code><span><span class="kn">from</span> <span class="nn">ssk.experiments</span> <span class="kn">import</span> <span class="n">RangeIterator</span>
</span><span>
</span><span>
</span><span><span class="k">for</span> <span class="n">n</span> <span class="ow">in</span> <span class="n">RangeIterator</span><span class="p">(</span><span class="mi">5</span><span class="p">):</span>
</span><span> <span class="nb">print</span> <span class="n">n</span>
</span></code></pre></div>
<p>This gives the following output:</p>
<div class="hl"><pre class=content><code><span>0
</span><span>1
</span><span>2
</span><span>3
</span><span>4
</span></code></pre></div>
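<p>For comparison, here is a rough pure-Python counterpart of <code>RangeIterator</code>. Python’s protocol folds <code>hasNext</code> and <code>next</code> into a single method that raises <code>StopIteration</code> when the iterator is exhausted. The class name <code>PyRangeIterator</code> is hypothetical, and the <code>next = __next__</code> alias lets the same class work on Jython 2.7 (which calls <code>next</code>) and on Python 3 (which calls <code>__next__</code>).</p>

```python
class PyRangeIterator(object):
    """Pure-Python counterpart of the Java RangeIterator (hypothetical name)."""

    def __init__(self, maximum):
        self.current = 0
        self.maximum = maximum

    def __iter__(self):
        # Iterators return themselves, so they can be used directly in `for`.
        return self

    def __next__(self):
        # Where Java's hasNext() would return false, Python raises StopIteration.
        if self.current >= self.maximum:
            raise StopIteration
        value = self.current
        self.current += 1
        return value

    next = __next__  # Python 2 / Jython spelling of the same method
```

<p><code>for n in PyRangeIterator(5): print n</code> then prints the same five numbers as the Java version.</p>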
<p>Since Jython’s <code>for</code> statement also supports iterating over Java’s
<a href="https://docs.oracle.com/javase/8/docs/api/java/util/Enumeration.html" rel="noopener noreferrer" target="_blank"><code>Enumeration</code></a> type, the
same <code>for</code> loop would work with a <code>RangeEnumeration</code> class defined as below:</p>
<div class="hl"><pre class=content><code><span><span class="kn">package</span><span class="w"> </span><span class="nn">ssk.experiments</span><span class="p">;</span>
</span><span><span class="kn">import</span><span class="w"> </span><span class="nn">java.util.Enumeration</span><span class="p">;</span>
</span><span><span class="kn">import</span><span class="w"> </span><span class="nn">java.util.NoSuchElementException</span><span class="p">;</span>
</span><span>
</span><span><span class="kd">public</span><span class="w"> </span><span class="kd">class</span> <span class="nc">RangeEnumeration</span><span class="w"> </span><span class="kd">implements</span><span class="w"> </span><span class="n">Enumeration</span><span class="o"><</span><span class="n">Integer</span><span class="o">></span><span class="w"> </span><span class="p">{</span>
</span><span><span class="w"> </span><span class="kd">private</span><span class="w"> </span><span class="n">Integer</span><span class="w"> </span><span class="n">current</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="n">max</span><span class="p">;</span>
</span><span><span class="w"> </span><span class="kd">public</span><span class="w"> </span><span class="nf">RangeEnumeration</span><span class="p">(</span><span class="kt">int</span><span class="w"> </span><span class="n">max</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="na">max</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">max</span><span class="p">;</span><span class="w"> </span><span class="p">}</span>
</span><span><span class="w"> </span><span class="nd">@Override</span>
</span><span><span class="w"> </span><span class="kd">public</span><span class="w"> </span><span class="kt">boolean</span><span class="w"> </span><span class="nf">hasMoreElements</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">current</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">max</span><span class="p">;</span><span class="w"> </span><span class="p">}</span>
</span><span><span class="w"> </span><span class="nd">@Override</span>
</span><span><span class="w"> </span><span class="kd">public</span><span class="w"> </span><span class="n">Integer</span><span class="w"> </span><span class="nf">nextElement</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
</span><span><span class="w"> </span><span class="c1">// The Enumeration contract requires an exception once the range is exhausted.</span>
</span><span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">hasMoreElements</span><span class="p">())</span><span class="w"> </span><span class="k">throw</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">NoSuchElementException</span><span class="p">();</span>
</span><span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">current</span><span class="o">++</span><span class="p">;</span>
</span><span><span class="w"> </span><span class="p">}</span>
</span><span><span class="p">}</span>
</span></code></pre></div>
<p>Jython also seamlessly obtains an <code>Iterator</code> from a Java
<a href="https://docs.oracle.com/javase/8/docs/api/java/lang/Iterable.html" rel="noopener noreferrer" target="_blank"><code>Iterable</code></a>. This is actually
how the <code>for</code> statement works with the <code>List</code> and <code>Set</code> collections discussed earlier (<code>Collection</code>
is a sub-interface of <code>Iterable</code>).</p>
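<p>The three protocols the <code>for</code> statement handles here (<code>Iterable</code>, <code>Iterator</code> and <code>Enumeration</code>) can be summarised as a small Python generator. This only illustrates the method calls involved; it is not how Jython does it internally. <code>iterate_java</code> is a hypothetical name, and the duck-typed <code>hasattr</code> checks assume the objects expose the standard Java method names.</p>

```python
def iterate_java(obj):
    """Yield items from a Java-style Iterable, Iterator or Enumeration (illustrative)."""
    if hasattr(obj, 'iterator'):
        # Iterable: obtain a fresh Iterator first.
        obj = obj.iterator()
    if hasattr(obj, 'hasNext'):
        # Iterator protocol: hasNext() / next()
        while obj.hasNext():
            yield obj.next()
    elif hasattr(obj, 'hasMoreElements'):
        # Enumeration protocol: hasMoreElements() / nextElement()
        while obj.hasMoreElements():
            yield obj.nextElement()
    else:
        raise TypeError('not a Java-style Iterable, Iterator or Enumeration')
```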
<h2 id="patching-java-classes">Patching Java Classes<a class="headerlink" href="#patching-java-classes" title="Permanent link">¶</a></h2>
<p>In Python, new methods and attributes can be added to existing classes. This comes from the dynamic
nature of the programming language and the runtime. The JVM is also a dynamic runtime, but the Java
language doesn’t allow us to modify existing classes. This is where Jython comes in. Jython lets us
add and override methods on existing Java classes. Although this is seldom needed, it illustrates
the extent of Jython’s integration with the JVM.</p>
<p>Here’s a Java class:</p>
<div class="hl"><pre class=content><code><span><span class="kn">package</span><span class="w"> </span><span class="nn">ssk.experiments</span><span class="p">;</span>
</span><span>
</span><span><span class="kd">public</span><span class="w"> </span><span class="kd">class</span> <span class="nc">Country</span><span class="w"> </span><span class="p">{</span>
</span><span><span class="w"> </span><span class="kd">private</span><span class="w"> </span><span class="n">String</span><span class="w"> </span><span class="n">name</span><span class="p">;</span>
</span><span><span class="w"> </span><span class="kd">public</span><span class="w"> </span><span class="nf">Country</span><span class="p">(</span><span class="n">String</span><span class="w"> </span><span class="n">name</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="na">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">name</span><span class="p">;</span><span class="w"> </span><span class="p">}</span>
</span><span><span class="w"> </span><span class="kd">public</span><span class="w"> </span><span class="n">String</span><span class="w"> </span><span class="nf">getName</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">name</span><span class="p">;</span><span class="w"> </span><span class="p">}</span>
</span><span><span class="w"> </span><span class="kd">public</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">setName</span><span class="p">(</span><span class="n">String</span><span class="w"> </span><span class="n">name</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="na">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">name</span><span class="p">;</span><span class="w"> </span><span class="p">}</span>
</span><span><span class="p">}</span>
</span></code></pre></div>
<p>There’s nothing fancy about the above class. It’s a regular class with one property, <code>name</code>, exposed through
<code>.getName</code> and <code>.setName</code> methods. Now, let’s add a new method to this class.</p>
<div class="hl"><pre class=content><code><span><span class="kn">from</span> <span class="nn">ssk.experiments</span> <span class="kn">import</span> <span class="n">Country</span>
</span><span>
</span><span>
</span><span><span class="k">def</span> <span class="nf">upcase</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
</span><span> <span class="bp">self</span><span class="o">.</span><span class="n">name</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">name</span><span class="o">.</span><span class="n">upper</span><span class="p">()</span>
</span><span>
</span><span>
</span><span><span class="n">Country</span><span class="o">.</span><span class="n">upcase</span> <span class="o">=</span> <span class="n">upcase</span>
</span><span>
</span><span><span class="c1"># Create a `Country` object and call its new `upcase` method.</span>
</span><span><span class="n">largest_country</span> <span class="o">=</span> <span class="n">Country</span><span class="p">(</span><span class="s1">'Russia'</span><span class="p">)</span>
</span><span><span class="n">largest_country</span><span class="o">.</span><span class="n">upcase</span><span class="p">()</span>
</span><span><span class="nb">print</span> <span class="n">largest_country</span><span class="o">.</span><span class="n">name</span>
</span></code></pre></div>
<p>This would print <code>RUSSIA</code>, as expected.</p>
<p>Note that this is an advanced feature and should be used with caution. In almost all cases, it is
probably a better idea to modify the original Java class definition directly. But when that is not
an option, creating a simple Python function that works with these objects should be considered.
Modifying existing classes should only be used as a last resort.</p>
<h2 id="operator-overloading">Operator Overloading<a class="headerlink" href="#operator-overloading" title="Permanent link">¶</a></h2>
<p>One practical use for adding methods to existing Java classes is to make Python’s operator
overloading work with them. A good example of this is the
<code>BigDecimal</code> class. Mathematical operations on objects of <code>BigDecimal</code> are provided as individual
methods like <code>.add</code>, <code>.subtract</code> <em>etc</em>. We can add operator support (in Jython) for these objects
by adding the appropriate methods to the <code>BigDecimal</code> class.</p>
<p>For instance, here’s how we can add support for the <code>+</code> operator:</p>
<div class="hl"><pre class=content><code><span><span class="kn">from</span> <span class="nn">java.math</span> <span class="kn">import</span> <span class="n">BigDecimal</span>
</span><span>
</span><span><span class="n">BigDecimal</span><span class="o">.</span><span class="fm">__add__</span> <span class="o">=</span> <span class="k">lambda</span> <span class="bp">self</span><span class="p">,</span> <span class="n">other</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">other</span><span class="p">)</span>
</span><span>
</span><span><span class="nb">print</span> <span class="n">BigDecimal</span><span class="p">(</span><span class="mi">42</span><span class="p">)</span> <span class="o">+</span> <span class="n">BigDecimal</span><span class="p">(</span><span class="mi">10</span><span class="p">)</span>
</span></code></pre></div>
<p>This would print <code>52</code>, as expected. More methods can be added to support all the mathematical
operators such as <code>__sub__</code> for subtraction and <code>__mul__</code> for multiplication <em>etc</em>. The full list of
such method names can be found on the official <a href="http://www.jython.org/docs/reference/datamodel.html#emulating-numeric-types" rel="noopener noreferrer" target="_blank">data model documentation
page</a>.</p>
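<p>The technique itself isn’t specific to Jython. As a rough CPython sketch, assume a hypothetical <code>Amount</code> class that, like <code>BigDecimal</code>, only offers named methods (<code>add</code>, <code>subtract</code>, <code>multiply</code>). Operator support can then be attached after the class is defined, the same way <code>__add__</code> is attached to <code>BigDecimal</code> above:</p>

```python
from decimal import Decimal


class Amount(object):
    """Hypothetical stand-in for a class like BigDecimal: arithmetic is
    only exposed through named methods, not through operators."""

    def __init__(self, value):
        self.value = Decimal(value)

    def add(self, other):
        return Amount(self.value + other.value)

    def subtract(self, other):
        return Amount(self.value - other.value)

    def multiply(self, other):
        return Amount(self.value * other.value)


# Attach operator support after the class exists, mirroring how the
# BigDecimal example assigns `__add__` on a class it cannot edit.
Amount.__add__ = lambda self, other: self.add(other)
Amount.__sub__ = lambda self, other: self.subtract(other)
Amount.__mul__ = lambda self, other: self.multiply(other)

total = Amount('42') + Amount('10')
difference = Amount('42') - Amount('10')
product = Amount('42') * Amount('10')
```

<p>The lambdas only delegate to the existing named methods, so the class’s own arithmetic stays the single source of truth.</p>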
<h2 id="conclusion">Conclusion<a class="headerlink" href="#conclusion" title="Permanent link">¶</a></h2>
<p>This is not intended to be an exhaustive guide to what Jython can do. I hope it gave you a taste of
how well Jython interoperates with Java, and that it helps you write better Python–Java
inter-op code. Thank you and any suggestions and feedback are very welcome.</p>The Python Dictionary2017-09-29T00:00:00+05:302017-09-29T00:00:00+05:30Shrikant Sharat Kandulatag:sharats.me,2017-09-29:/posts/the-python-dictionary/<p>The Python <a href="https://docs.python.org/3.6/tutorial/datastructures.html#dictionaries" rel="noopener noreferrer" target="_blank">Dictionary</a> is a key–value style data structure that is tightly integrated with the
language syntax and semantics. Understanding dictionaries well can help us use them better and investigate
subtle problems more efficiently.</p>
<p>This is my attempt to document this topic in more depth. Though I included a small section about the
syntax and basic usage of dictionaries, it’ll be helpful if you have some beginner–intermediate
level experience with Python.</p>
<p>This article is written for Python 3.6 installed via Anaconda on Xubuntu. Here’s the platform
details:</p>
<div class="hl"><pre class=content><code><span>$ python -V
</span><span>Python 3.6.1 :: Anaconda custom (64-bit)
</span><span>$ uname -isro
</span><span>Linux 4.10.0-33-generic x86_64 GNU/Linux
</span></code></pre></div>
<p>Note: This is not intended as a substitute for the official documentation, which is a reference;
there will be some overlap. This document is intended as a supplement that goes into more depth and
covers practical nuances.</p>
<div class="toc"><span class="toctitle">Table of Contents</span><ul>
<li><a href="#introduction">Introduction</a></li>
<li><a href="#usage">Usage</a><ul>
<li><a href="#syntax">Syntax</a></li>
<li><a href="#api">API</a></li>
</ul>
</li>
<li><a href="#contents">Contents</a><ul>
<li><a href="#key-types">Key Types</a></li>
<li><a href="#retrieving-keys">Retrieving Keys</a></li>
<li><a href="#using-tuples-for-keys">Using Tuples for Keys</a></li>
<li><a href="#retrieving-values">Retrieving Values</a></li>
<li><a href="#items-collection">Items Collection</a></li>
</ul>
</li>
<li><a href="#typing">Typing</a></li>
<li><a href="#creating-dictionaries">Creating Dictionaries</a><ul>
<li><a href="#calling-dict">Calling dict</a></li>
<li><a href="#comprehensions">Comprehensions</a></li>
</ul>
</li>
<li><a href="#public-appearance">Public Appearance</a><ul>
<li><a href="#keyword-arguments">Keyword Arguments</a></li>
<li><a href="#namespaces">Namespaces</a></li>
</ul>
</li>
<li><a href="#serialization">Serialization</a><ul>
<li><a href="#json">JSON</a></li>
<li><a href="#pickling">Pickling</a></li>
</ul>
</li>
<li><a href="#the-item-syntax">The Item Syntax</a></li>
<li><a href="#flavors">Flavors</a><ul>
<li><a href="#the-ordereddict">The OrderedDict</a></li>
<li><a href="#the-defaultdict">The defaultdict</a></li>
<li><a href="#the-chainmap">The ChainMap</a></li>
<li><a href="#the-counter">The Counter</a></li>
<li><a href="#custom-flavor">Custom Flavor</a></li>
</ul>
</li>
<li><a href="#disassembling">Disassembling</a></li>
<li><a href="#conclusion">Conclusion</a></li>
<li><a href="#references">References</a></li>
</ul>
</div>
<h2 id="introduction">Introduction<a class="headerlink" href="#introduction" title="Permanent link">¶</a></h2>
<p>Dictionaries (type <code>dict</code>) are a very powerful data structure, and not just in Python: some
form of them is present in almost every modern high-level language, under names like maps, hashes or
associative arrays. The syntax of the JSON serialization format closely resembles Python’s dictionary syntax.</p>
<p>Dictionaries are a fundamental part of the Python language and integrate tightly with the semantics and
APIs of the standard library. This is evident from the fact that Python has a dedicated literal syntax
just for creating them.</p>
<h2 id="usage">Usage<a class="headerlink" href="#usage" title="Permanent link">¶</a></h2>
<h3 id="syntax">Syntax<a class="headerlink" href="#syntax" title="Permanent link">¶</a></h3>
<p>As a quick primer, here’s the syntax for defining a dictionary:</p>
<div class="hl"><pre class=content><code><span><span class="n">country_currencies</span> <span class="o">=</span> <span class="p">{</span>
</span><span>    <span class="s1">'India'</span><span class="p">:</span> <span class="s1">'Rupee'</span><span class="p">,</span>
</span><span>    <span class="s1">'Russia'</span><span class="p">:</span> <span class="s1">'Ruble'</span><span class="p">,</span>
</span><span>    <span class="s1">'USA'</span><span class="p">:</span> <span class="s1">'Dollar'</span><span class="p">,</span>
</span><span>    <span class="s1">'Japan'</span><span class="p">:</span> <span class="s1">'Yen'</span><span class="p">,</span>
</span><span><span class="p">}</span>
</span></code></pre></div>
<h3 id="api">API<a class="headerlink" href="#api" title="Permanent link">¶</a></h3>
<p>Here’s a quick rundown of the common operations on dictionaries.</p>
<div class="hl"><pre class=content><code><span><span class="c1"># Get the value of a key.</span>
</span><span><span class="n">indian_currency</span> <span class="o">=</span> <span class="n">country_currencies</span><span class="p">[</span><span class="s1">'India'</span><span class="p">]</span>
</span><span>
</span><span><span class="c1"># Set the value of a key.</span>
</span><span><span class="n">country_currencies</span><span class="p">[</span><span class="s1">'France'</span><span class="p">]</span> <span class="o">=</span> <span class="s1">'Euro'</span>
</span><span>
</span><span><span class="c1"># Delete a key.</span>
</span><span><span class="k">del</span> <span class="n">country_currencies</span><span class="p">[</span><span class="s1">'USA'</span><span class="p">]</span>
</span><span>
</span><span><span class="c1"># Check for presence of a key.</span>
</span><span><span class="s1">'Russia'</span> <span class="ow">in</span> <span class="n">country_currencies</span>
</span><span>
</span><span><span class="c1"># Get if key present, otherwise return `None`.</span>
</span><span><span class="c1"># (Takes a second parameter which is returned when key is missing).</span>
</span><span><span class="n">country_currencies</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'USA'</span><span class="p">)</span>
</span><span>
</span><span><span class="c1"># Set only if the key is not already present.</span>
</span><span><span class="n">country_currencies</span><span class="o">.</span><span class="n">setdefault</span><span class="p">(</span><span class="s1">'France'</span><span class="p">,</span> <span class="s1">'Franc'</span><span class="p">)</span>
</span></code></pre></div>
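<p>To see what some of these calls actually return, here’s a small runnable sketch on a fresh copy of the dictionary (the <code>'Brazil'</code> entries are just illustrative additions):</p>

```python
country_currencies = {
    'India': 'Rupee',
    'Russia': 'Ruble',
    'USA': 'Dollar',
    'Japan': 'Yen',
}

# Indexing with a missing key raises KeyError...
try:
    country_currencies['Brazil']
except KeyError:
    brazil_lookup = 'missing'
else:
    brazil_lookup = 'found'

# ...while .get returns None, or its second argument if one is given.
no_default = country_currencies.get('Brazil')
with_default = country_currencies.get('Brazil', 'unknown')

# .setdefault returns the value stored under the key after the call:
# the existing value if present, otherwise the default it just inserted.
existing = country_currencies.setdefault('India', 'Dollar')
inserted = country_currencies.setdefault('Brazil', 'Real')
```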
<h2 id="contents">Contents<a class="headerlink" href="#contents" title="Permanent link">¶</a></h2>
<p>The contents of a dictionary are made up of two components: the keys and the values. The keys form the
index using which we can retrieve the values. Each key uniquely identifies a value within the
dictionary.</p>
<h3 id="key-types">Key Types<a class="headerlink" href="#key-types" title="Permanent link">¶</a></h3>
<p>The keys form the index of the dictionary. In most practical cases, keys tend to be strings. Tuples
are often used as well. In fact, a value of any hashable type can be used as a key.</p>
<p>So, what is a hashable type? The official documentation of the <a href="https://docs.python.org/3/reference/datamodel.html#object.__hash__" rel="noopener noreferrer" target="_blank"><code>__hash__</code> method</a> gives the
full details on what hashing means and which objects are considered hashable. Simply put, if passing an object to the
<code>hash</code> builtin function doesn’t raise an exception, the object is hashable and <em>can</em> be used as a
key in a dictionary.</p>
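<p>That rule of thumb can be written down as a tiny helper. <code>is_hashable</code> is a hypothetical name, not a standard library function; it simply tries <code>hash</code> and catches the <code>TypeError</code> that unhashable objects raise:</p>

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds, i.e. obj could be a dict key."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


examples = {
    'a string': 'India',
    'a tuple of strings': ('a', 'b'),
    'a list': ['a', 'b'],
    'a tuple holding a list': ('a', ['b']),
    'a dict': {'a': 1},
}
results = {label: is_hashable(value) for label, value in examples.items()}
```

<p>Note that a tuple is only hashable if everything inside it is, which is why the tuple holding a list fails.</p>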
<p>However, in practice, we should avoid using mutable objects as keys, even if they are hashable, and
especially if mutation changes the hash of the object.</p>
<p>For example, consider the following <code>User</code> class.</p>
<div class="hl"><pre class=content><code><span><span class="k">class</span> <span class="nc">User</span><span class="p">:</span>
</span><span>    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">first_name</span><span class="p">,</span> <span class="n">last_name</span><span class="p">):</span>
</span><span>        <span class="bp">self</span><span class="o">.</span><span class="n">first_name</span> <span class="o">=</span> <span class="n">first_name</span>
</span><span>        <span class="bp">self</span><span class="o">.</span><span class="n">last_name</span> <span class="o">=</span> <span class="n">last_name</span>
</span></code></pre></div>
<p>Let’s inspect the hash values of <code>User</code> objects.</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="n">ned</span> <span class="o">=</span> <span class="n">User</span><span class="p">(</span><span class="s1">'Ned'</span><span class="p">,</span> <span class="s1">'Stark'</span><span class="p">)</span>
</span><span><span class="gp">>>> </span><span class="nb">hash</span><span class="p">(</span><span class="n">ned</span><span class="p">)</span>
</span><span><span class="go">8784834659087</span>
</span><span><span class="gp">>>> </span><span class="n">ned</span><span class="o">.</span><span class="n">first_name</span> <span class="o">=</span> <span class="s1">'Robb'</span>
</span><span><span class="gp">>>> </span><span class="nb">hash</span><span class="p">(</span><span class="n">ned</span><span class="p">)</span>
</span><span><span class="go">8784834659087</span>
</span></code></pre></div>
<p class="note">If you try the above code, you might see a different number. That’s because the default
hash of an object is derived from its identity (in CPython, its memory address), which varies between runs.</p>
<p>As seen above, the hash value did not change even though the object was modified. These <code>User</code>
objects can be used as keys for a dictionary since they meet the requirement, but it should be kept
in mind that they are mutable.</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="n">ned</span> <span class="o">=</span> <span class="n">User</span><span class="p">(</span><span class="s1">'Ned'</span><span class="p">,</span> <span class="s1">'Stark'</span><span class="p">)</span>
</span><span><span class="gp">>>> </span><span class="n">d</span> <span class="o">=</span> <span class="p">{</span><span class="n">ned</span><span class="p">:</span> <span class="mi">123</span><span class="p">}</span>
</span><span><span class="gp">>>> </span><span class="n">d</span><span class="p">[</span><span class="n">ned</span><span class="p">]</span>
</span><span><span class="go">123</span>
</span><span><span class="gp">>>> </span><span class="n">ned</span><span class="o">.</span><span class="n">first_name</span> <span class="o">=</span> <span class="s1">'Robb'</span>
</span><span><span class="gp">>>> </span><span class="n">d</span><span class="p">[</span><span class="n">ned</span><span class="p">]</span>
</span><span><span class="go">123</span>
</span></code></pre></div>
<p>If that doesn’t seem confusing, try this:</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="n">robb</span> <span class="o">=</span> <span class="n">ned</span>
</span><span><span class="gp">>>> </span><span class="n">ned</span> <span class="o">=</span> <span class="n">User</span><span class="p">(</span><span class="s1">'Ned'</span><span class="p">,</span> <span class="s1">'Stark'</span><span class="p">)</span>
</span><span><span class="gp">>>> </span><span class="n">robb</span><span class="o">.</span><span class="n">first_name</span>
</span><span><span class="go">'Robb'</span>
</span><span><span class="gp">>>> </span><span class="n">robb</span> <span class="ow">in</span> <span class="n">d</span> <span class="c1"># Robb isn't in our dictionary!</span>
</span><span><span class="go">True</span>
</span><span><span class="gp">>>> </span><span class="n">ned</span><span class="o">.</span><span class="n">first_name</span>
</span><span><span class="go">'Ned'</span>
</span><span><span class="gp">>>> </span><span class="n">ned</span> <span class="ow">in</span> <span class="n">d</span> <span class="c1"># We gave Ned Stark a value, right?</span>
</span><span><span class="go">False</span>
</span></code></pre></div>
<p>This can quickly cause headaches and hard-to-find problems.</p>
<p>Now suppose someone later tries to <em>fix</em> this by customizing the hashing of this class with the
following method:</p>
<div class="hl"><pre class=content><code><span>    <span class="k">def</span> <span class="fm">__hash__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
</span><span>        <span class="k">return</span> <span class="nb">hash</span><span class="p">((</span><span class="bp">self</span><span class="o">.</span><span class="n">first_name</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">last_name</span><span class="p">))</span>
</span></code></pre></div>
<p>Now, the hash of the object changes when we change the <code>first_name</code>.</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="n">ned</span> <span class="o">=</span> <span class="n">User</span><span class="p">(</span><span class="s1">'Ned'</span><span class="p">,</span> <span class="s1">'Stark'</span><span class="p">)</span>
</span><span><span class="gp">>>> </span><span class="nb">hash</span><span class="p">(</span><span class="n">ned</span><span class="p">)</span>
</span><span><span class="go">4091961891370636651</span>
</span><span><span class="gp">>>> </span><span class="n">ned</span><span class="o">.</span><span class="n">first_name</span> <span class="o">=</span> <span class="s1">'Robb'</span>
</span><span><span class="gp">>>> </span><span class="nb">hash</span><span class="p">(</span><span class="n">ned</span><span class="p">)</span>
</span><span><span class="go">-7890115541605828979</span>
</span></code></pre></div>
<p>Using these objects as keys can be confusing as well:</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="n">ned</span> <span class="o">=</span> <span class="n">User</span><span class="p">(</span><span class="s1">'Ned'</span><span class="p">,</span> <span class="s1">'Stark'</span><span class="p">)</span>
</span><span><span class="gp">>>> </span><span class="n">d</span> <span class="o">=</span> <span class="p">{</span><span class="n">ned</span><span class="p">:</span> <span class="mi">123</span><span class="p">}</span>
</span><span><span class="gp">>>> </span><span class="n">d</span><span class="p">[</span><span class="n">ned</span><span class="p">]</span>
</span><span><span class="go">123</span>
</span><span><span class="gp">>>> </span><span class="n">ned</span><span class="o">.</span><span class="n">first_name</span> <span class="o">=</span> <span class="s1">'Robb'</span>
</span><span><span class="gp">>>> </span><span class="n">d</span><span class="p">[</span><span class="n">ned</span><span class="p">]</span>
</span><span><span class="gt">Traceback (most recent call last):</span>
</span><span> File <span class="nb">"<stdin>"</span>, line <span class="m">1</span>, in <span class="n"><module></span>
</span><span><span class="gr">KeyError</span>: <span class="n"><__main__.User object at 0x7fd60e7c5828></span>
</span><span><span class="gp">>>> </span><span class="n">ned</span><span class="o">.</span><span class="n">first_name</span> <span class="o">=</span> <span class="s1">'Ned'</span>
</span><span><span class="gp">>>> </span><span class="n">d</span><span class="p">[</span><span class="n">ned</span><span class="p">]</span>
</span><span><span class="go">123</span>
</span></code></pre></div>
<p>In essence, using mutable objects as keys in a dictionary can lead to confusing results, and these
get harder to track down as the codebase grows.</p>
<p>So, to avoid these potential problems, it’s best to use numbers, strings or tuples (containing
numbers or strings) as keys for dictionaries. If you <strong>have</strong> to use other types, keep the hashing
semantics in mind and document the reasons well.</p>
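<p>One way to follow this advice without giving up named fields is to make the user a tuple. This sketch assumes the <code>User</code> class from earlier can be replaced by a <code>collections.namedtuple</code>, which is immutable, hashable and compared by value:</p>

```python
import collections

# A namedtuple variant of the earlier User class: a tuple with named fields.
User = collections.namedtuple('User', ['first_name', 'last_name'])

ned = User('Ned', 'Stark')
d = {ned: 123}

# An equal but separately created User finds the same entry.
lookup = d[User('Ned', 'Stark')]

# Fields can't be reassigned, so a key can't change while it's in the dict.
try:
    ned.first_name = 'Robb'
except AttributeError:
    reassigned = False
else:
    reassigned = True

# "Renaming" produces a new value; the dictionary is unaffected.
robb = ned._replace(first_name='Robb')
```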
<h3 id="retrieving-keys">Retrieving Keys<a class="headerlink" href="#retrieving-keys" title="Permanent link">¶</a></h3>
<p>Dictionaries have a <a href="https://docs.python.org/3.6/library/stdtypes.html#dict.keys" rel="noopener noreferrer" target="_blank"><code>.keys</code></a> method that returns an object of type <a href="https://docs.python.org/3.6/library/stdtypes.html#dictionary-view-objects" rel="noopener noreferrer" target="_blank"><code>dict_keys</code></a>
which is an iterable (technically, a <em>view</em>) of the keys of the dictionary. Note that this method
used to return an ordinary <code>list</code> in Python 2.</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="n">countries</span> <span class="o">=</span> <span class="n">country_currencies</span><span class="o">.</span><span class="n">keys</span><span class="p">()</span>
</span><span><span class="gp">>>> </span><span class="n">countries</span>
</span><span><span class="go">dict_keys(['India', 'Russia', 'USA', 'Japan'])</span>
</span><span><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">collections.abc</span>
</span><span><span class="gp">>>> </span><span class="nb">isinstance</span><span class="p">(</span><span class="n">countries</span><span class="p">,</span> <span class="n">collections</span><span class="o">.</span><span class="n">abc</span><span class="o">.</span><span class="n">Iterable</span><span class="p">)</span>
</span><span><span class="go">True</span>
</span></code></pre></div>
<p>Note that the order of the keys is not defined. Don’t rely on the order even if it seems
predictable. It might vary across Python implementations and even across versions. Use an <code>OrderedDict</code>
when ordering is needed. More on this in a later section.</p>
<p class="note">It should be noted that starting in Python 3.6, the order of keys <em>is</em> preserved. This is a
side effect of using a more efficient <code>dict</code> implementation. As such, the Python 3.6 documentation
explicitly states that this is an implementation detail and should not be relied upon. <a href="https://docs.python.org/3/whatsnew/3.6.html#whatsnew36-pep468" rel="noopener noreferrer" target="_blank">Read
more</a>. (Python 3.7 later made insertion order an official guarantee of the language.)</p>
<p>So, what’s special about <code>dict_keys</code>, as opposed to a <code>list</code>? Look!</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="n">countries</span>
</span><span><span class="go">dict_keys(['India', 'Russia', 'USA', 'Japan'])</span>
</span><span><span class="gp">>>> </span><span class="n">country_currencies</span><span class="p">[</span><span class="s1">'France'</span><span class="p">]</span> <span class="o">=</span> <span class="s1">'Euro'</span>
</span><span><span class="gp">>>> </span><span class="n">countries</span>
</span><span><span class="go">dict_keys(['India', 'Russia', 'USA', 'Japan', 'France'])</span>
</span></code></pre></div>
<p>See? The <code>dict_keys</code> object is a <em>view</em> of the keys of the original dictionary object. When the
dictionary’s keys change, so does the keys view. Of course, we can make a set of currently
available keys by passing it to the <code>set</code> builtin. This set would be independent of the dictionary.</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="nb">set</span><span class="p">(</span><span class="n">countries</span><span class="p">)</span>
</span><span><span class="go">{'Japan', 'USA', 'Russia', 'India', 'France'}</span>
</span></code></pre></div>
<p class="note">Most people would suggest and use <code>list</code> here, instead of <code>set</code>. I personally feel a <code>set</code> is
semantically more correct since a <code>list</code> implies that its contents have a specific ordering and does
not convey that they are hashable and, more importantly, <em>unique</em>. A <code>set</code> shares
these features of dictionary keys.</p>
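<p>The difference between the live view and such a snapshot matters in practice, for instance when a loop adds keys. Here’s a small sketch (the alias-adding loop is just a made-up example) showing that iterating over the view while the dictionary grows raises a <code>RuntimeError</code>, while iterating over a snapshot does not:</p>

```python
country_currencies = {'India': 'Rupee', 'Russia': 'Ruble'}

countries = country_currencies.keys()  # live view
snapshot = set(countries)              # independent copy

country_currencies['Japan'] = 'Yen'
japan_in_view = 'Japan' in countries      # the view sees the new key
japan_in_snapshot = 'Japan' in snapshot   # the snapshot does not

# Adding keys while iterating over the live view fails...
try:
    for name in countries:
        country_currencies[name.upper()] = country_currencies[name]
except RuntimeError:
    view_loop_failed = True
else:
    view_loop_failed = False

# ...while iterating over a snapshot of the keys is fine.
aliases = {'India': 'Rupee', 'Russia': 'Ruble'}
for name in set(aliases):
    aliases[name.upper()] = aliases[name]
```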
<p>Additionally, the <code>dict_keys</code> objects are themselves <code>set</code>-like. They implement the <a href="https://docs.python.org/3.6/library/collections.abc.html#collections.abc.Set" rel="noopener noreferrer" target="_blank"><code>Set</code></a>
abstraction. So, we don’t need to convert them to a <code>set</code> in order to do set operations on them. For
example, here’s an intersection operation:</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="nb">isinstance</span><span class="p">(</span><span class="n">countries</span><span class="p">,</span> <span class="n">collections</span><span class="o">.</span><span class="n">abc</span><span class="o">.</span><span class="n">Set</span><span class="p">)</span>
</span><span><span class="go">True</span>
</span><span><span class="gp">>>> </span><span class="n">countries</span> <span class="o">&</span> <span class="p">{</span><span class="s1">'India'</span><span class="p">,</span> <span class="s1">'China'</span><span class="p">}</span>
</span><span><span class="go">{'India'}</span>
</span></code></pre></div>
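<p>Since both operands can be key views, comparing the keys of two dictionaries needs no conversion at all. A sketch with two made-up dictionaries; each operation returns a plain <code>set</code>:</p>

```python
asia = {'India': 'Rupee', 'Japan': 'Yen', 'China': 'Yuan'}
brics = {'Brazil': 'Real', 'Russia': 'Ruble', 'India': 'Rupee',
         'China': 'Yuan', 'South Africa': 'Rand'}

in_both = asia.keys() & brics.keys()          # intersection
in_either = asia.keys() | brics.keys()        # union
only_in_asia = asia.keys() - brics.keys()     # difference
in_exactly_one = asia.keys() ^ brics.keys()   # symmetric difference
```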
<h3 id="using-tuples-for-keys">Using Tuples for Keys<a class="headerlink" href="#using-tuples-for-keys" title="Permanent link">¶</a></h3>
<p>Here’s a quick example of using tuples as keys in a dictionary:</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="n">data</span> <span class="o">=</span> <span class="p">{</span>
</span><span><span class="gp">... </span> <span class="p">(</span><span class="s1">'a'</span><span class="p">,</span> <span class="mi">1</span><span class="p">):</span> <span class="s1">'a1'</span><span class="p">,</span>
</span><span><span class="gp">... </span> <span class="p">(</span><span class="s1">'a'</span><span class="p">,</span> <span class="mi">2</span><span class="p">):</span> <span class="s1">'a2'</span><span class="p">,</span>
</span><span><span class="gp">... </span> <span class="p">(</span><span class="s1">'b'</span><span class="p">,</span> <span class="mi">1</span><span class="p">):</span> <span class="s1">'b1'</span><span class="p">,</span>
</span><span><span class="gp">... </span> <span class="p">(</span><span class="s1">'b'</span><span class="p">,</span> <span class="mi">2</span><span class="p">):</span> <span class="s1">'b2'</span><span class="p">,</span>
</span><span><span class="gp">... </span><span class="p">}</span>
</span><span><span class="gp">>>> </span><span class="n">data</span><span class="p">[</span><span class="s1">'a'</span><span class="p">,</span> <span class="mi">2</span><span class="p">]</span>
</span><span><span class="go">'a2'</span>
</span></code></pre></div>
<p>Note that only tuples that contain hashable types (or further such tuples) can be used as keys.
Lists or dictionaries, on the other hand, cannot be used since they are not hashable.</p>
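<p>A quick sketch of that rule, using a <code>data</code> dictionary like the one above (the extra keys are made up for illustration):</p>

```python
data = {('a', 1): 'a1'}

# Tuples of hashable elements, including nested tuples, are fine as keys.
data[('b', 2)] = 'b2'
data[(('c', 3), 'nested')] = 'c3'

# A tuple holding a list is itself unhashable, so it can't be a key.
try:
    data[('d', [4])] = 'd4'
except TypeError as exc:
    error_message = str(exc)
else:
    error_message = None
```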
<h3 id="retrieving-values">Retrieving Values<a class="headerlink" href="#retrieving-values" title="Permanent link">¶</a></h3>
<p>Values are what the keys index. Unlike keys, values don’t have to be unique. There are no
restrictions on what types can be used as values in a dictionary.</p>
<p>We can get a view of the values in a <code>dict</code> with the <a href="https://docs.python.org/3.6/library/stdtypes.html#dict.values" rel="noopener noreferrer" target="_blank"><code>.values</code></a> method. This returns a
<a href="https://docs.python.org/3.6/library/stdtypes.html#dictionary-view-objects" rel="noopener noreferrer" target="_blank"><code>dict_values</code></a> object.</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="n">currencies</span> <span class="o">=</span> <span class="n">country_currencies</span><span class="o">.</span><span class="n">values</span><span class="p">()</span>
</span><span><span class="gp">>>> </span><span class="n">currencies</span>
</span><span><span class="go">dict_values(['Rupee', 'Ruble', 'Dollar', 'Yen', 'Euro'])</span>
</span><span><span class="gp">>>> </span><span class="nb">type</span><span class="p">(</span><span class="n">currencies</span><span class="p">)</span>
</span><span><span class="go"><class 'dict_values'></span>
</span><span><span class="gp">>>> </span><span class="nb">isinstance</span><span class="p">(</span><span class="n">currencies</span><span class="p">,</span> <span class="n">collections</span><span class="o">.</span><span class="n">abc</span><span class="o">.</span><span class="n">Set</span><span class="p">)</span>
</span><span><span class="go">False</span>
</span></code></pre></div>
<p>This is <em>live</em> as well!</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="k">del</span> <span class="n">country_currencies</span><span class="p">[</span><span class="s1">'France'</span><span class="p">]</span>
</span><span><span class="gp">>>> </span><span class="n">currencies</span>
</span><span><span class="go">dict_values(['Rupee', 'Ruble', 'Dollar', 'Yen'])</span>
</span></code></pre></div>
<p>This can be passed to <code>list</code> to get a list of values. Using <code>set</code> here is probably not a good idea
since unlike the keys, values don’t have to be unique or hashable.</p>
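<p>To make the difference concrete, here’s a sketch with a slightly different dictionary: Ecuador also uses the US dollar, so a <code>set</code> of the values silently drops a duplicate, and list values can’t go into a set at all:</p>

```python
country_currencies = {
    'India': 'Rupee',
    'USA': 'Dollar',
    'Ecuador': 'Dollar',
}

values_list = list(country_currencies.values())  # keeps duplicates
values_set = set(country_currencies.values())    # drops duplicates

# Values that are lists are unhashable, so building a set from them fails.
number_types = {'even': [2, 4], 'odd': [1, 3]}
try:
    set(number_types.values())
except TypeError:
    list_values_hashable = False
else:
    list_values_hashable = True
```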
<h3 id="items-collection">Items Collection<a class="headerlink" href="#items-collection" title="Permanent link">¶</a></h3>
<p>Dictionaries also provide a <a href="https://docs.python.org/3.6/library/stdtypes.html#dict.items" rel="noopener noreferrer" target="_blank"><code>.items</code></a> method that returns all the key–value pairs as a
view of 2-tuples.</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="n">pairs</span> <span class="o">=</span> <span class="n">country_currencies</span><span class="o">.</span><span class="n">items</span><span class="p">()</span>
</span><span><span class="gp">>>> </span><span class="n">pairs</span>
</span><span><span class="go">dict_items([('India', 'Rupee'), ('Russia', 'Ruble'), ('USA', 'Dollar'), ('Japan', 'Yen')])</span>
</span></code></pre></div>
<p>Again, just like with <code>.keys</code> or <code>.values</code>, the view is <em>live</em> and the order of items is not
defined.</p>
<p>The <code>.items</code> method is most commonly used with the <code>for</code> statement to loop over the key–value
pairs.</p>
<div class="hl"><pre class=content><code><span><span class="k">for</span> <span class="n">country</span><span class="p">,</span> <span class="n">currency</span> <span class="ow">in</span> <span class="n">country_currencies</span><span class="o">.</span><span class="n">items</span><span class="p">():</span>
</span><span>    <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">"</span><span class="si">{</span><span class="n">country</span><span class="si">}</span><span class="s2">'s currency is </span><span class="si">{</span><span class="n">currency</span><span class="si">}</span><span class="s2">."</span><span class="p">)</span>
</span></code></pre></div>
<p class="note">The above code uses <a href="https://docs.python.org/3.6/reference/lexical_analysis.html#f-strings" rel="noopener noreferrer" target="_blank">f-strings</a> introduced in Python 3.6. In older versions of Python, the
<code>.format</code> method or the modulo (<code>%</code>) operator should be used.</p>
<p>The <code>dict_items</code> object also implements the <code>Set</code> abstraction.</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="nb">isinstance</span><span class="p">(</span><span class="n">pairs</span><span class="p">,</span> <span class="n">collections</span><span class="o">.</span><span class="n">abc</span><span class="o">.</span><span class="n">Set</span><span class="p">)</span>
</span><span><span class="go">True</span>
</span></code></pre></div>
<p>However, the set operations only work if the dictionary’s values are hashable as well, not just its
keys. The values in the dictionary we are working with are strings, so the <code>pairs</code> object can be used as a set.</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="n">pairs</span> <span class="o">&</span> <span class="p">{(</span><span class="s1">'India'</span><span class="p">,</span> <span class="s1">'Rupee'</span><span class="p">),</span> <span class="p">(</span><span class="s1">'UK'</span><span class="p">,</span> <span class="s1">'Pound'</span><span class="p">)}</span>
</span><span><span class="go">{('India', 'Rupee')}</span>
</span></code></pre></div>
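<p>One handy consequence is diffing two dictionaries whose values are hashable. In this sketch the two dictionaries are made up; subtracting one <code>dict_items</code> view from the other gives the pairs that were added or changed, and the pairs that were removed or replaced:</p>

```python
old_currencies = {'India': 'Rupee', 'France': 'Franc', 'Japan': 'Yen'}
new_currencies = {'India': 'Rupee', 'France': 'Euro', 'Japan': 'Yen', 'UK': 'Pound'}

# Pairs only in the new dict: newly added keys or keys whose value changed.
added_or_changed = new_currencies.items() - old_currencies.items()
# Pairs only in the old dict: removed keys or the old values of changed keys.
removed_or_replaced = old_currencies.items() - new_currencies.items()
```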
<p>But if we try this on a dictionary whose values are not <code>hash</code>able, say, lists, then it fails.</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="n">number_types</span> <span class="o">=</span> <span class="p">{</span>
</span><span><span class="gp">... </span> <span class="s1">'even'</span><span class="p">:</span> <span class="p">[</span><span class="mi">2</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">6</span><span class="p">,</span> <span class="mi">8</span><span class="p">],</span>
</span><span><span class="gp">... </span> <span class="s1">'odd'</span><span class="p">:</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">7</span><span class="p">,</span> <span class="mi">9</span><span class="p">],</span>
</span><span><span class="gp">... </span><span class="p">}</span>
</span><span><span class="gp">>>> </span><span class="n">pairs</span> <span class="o">=</span> <span class="n">number_types</span><span class="o">.</span><span class="n">items</span><span class="p">()</span>
</span><span><span class="gp">>>> </span><span class="n">pairs</span>
</span><span><span class="go">dict_items([('even', [2, 4, 6, 8]), ('odd', [1, 3, 5, 7, 9])])</span>
</span><span><span class="gp">>>> </span><span class="nb">isinstance</span><span class="p">(</span><span class="n">pairs</span><span class="p">,</span> <span class="n">collections</span><span class="o">.</span><span class="n">abc</span><span class="o">.</span><span class="n">Set</span><span class="p">)</span>
</span><span><span class="go">True</span>
</span></code></pre></div>
<p>Let’s try taking the union of this with an empty set.</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="n">pairs</span> <span class="o">|</span> <span class="nb">set</span><span class="p">()</span>
</span><span><span class="gt">Traceback (most recent call last):</span>
</span><span> File <span class="nb">"<stdin>"</span>, line <span class="m">1</span>, in <span class="n"><module></span>
</span><span><span class="gr">TypeError</span>: <span class="n">unhashable type: 'list'</span>
</span></code></pre></div>
<p>As the error says, <code>list</code> is not hashable. So, although <code>isinstance</code> tells us that this is a
<code>Set</code>, whether it can actually be used as one depends on its contents. This is <em>not incorrect</em>, though;
I feel it’s just a consequence of Python’s dynamic nature.</p>
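<p>If we need to know up front whether an items view can safely be treated as a set, one option is to try hashing every pair. The helper below, <code>items_are_hashable</code>, is a hypothetical name used only for this sketch. Note that the keys view is always usable as a set, since dictionary keys must be hashable anyway.</p>

```python
number_types = {
    'even': [2, 4, 6, 8],
    'odd': [1, 3, 5, 7, 9],
}

def items_are_hashable(d):
    """Return True if every (key, value) pair of d can be hashed."""
    try:
        for pair in d.items():
            hash(pair)  # hashing a tuple hashes each of its elements
    except TypeError:
        return False
    return True

print(items_are_hashable(number_types))        # False
print(items_are_hashable({'India': 'Rupee'}))  # True

# A union has to hash every pair, so it fails on list values.
try:
    number_types.items() | set()
except TypeError as exc:
    print('TypeError:', exc)  # TypeError: unhashable type: 'list'

# The keys view works as a set regardless of the values.
print(number_types.keys() | {'prime'})  # a set of 'even', 'odd' and 'prime'
```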
<h2 id="typing">Typing<a class="headerlink" href="#typing" title="Permanent link">¶</a></h2>
<p>Dictionaries in Python are what I call a <em>homogeneous</em> data structure. That is, they are best
used with all keys of one type and all values of one type. Comparable data structures in statically
typed languages, like Java’s <code>Map</code> or Haskell’s <code>HashMap</code>, enforce this. But since Python is a
dynamic language, no such restriction is placed. We can have keys and values of several different
types within the same dictionary.</p>
<div class="hl"><pre class=content><code><span><span class="n">data</span> <span class="o">=</span> <span class="p">{</span>
</span><span> <span class="s1">'a'</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span>
</span><span> <span class="mi">42</span><span class="p">:</span> <span class="s1">'yay!'</span><span class="p">,</span>
</span><span> <span class="p">(</span><span class="s1">'a'</span><span class="p">,</span> <span class="s1">'b'</span><span class="p">,</span> <span class="mi">2</span><span class="p">):</span> <span class="kc">True</span><span class="p">,</span>
</span><span><span class="p">}</span>
</span></code></pre></div>
<p>This is still a valid dictionary, although an extremely sad and ugly one (totally my opinion :D).</p>
<p>If using homogeneous dictionaries, the type annotations syntax can be used to declare the type
signatures. We use <a href="https://docs.python.org/3/library/typing.html#typing.Dict" rel="noopener noreferrer" target="_blank"><code>typing.Dict</code></a> for this purpose as illustrated below.</p>
<div class="hl"><pre class=content><code><span><span class="kn">from</span> <span class="nn">typing</span> <span class="kn">import</span> <span class="n">Dict</span><span class="p">,</span> <span class="n">Tuple</span>
</span><span>
</span><span><span class="n">number_map</span><span class="p">:</span> <span class="n">Dict</span><span class="p">[</span><span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span><span class="mi">1</span><span class="p">:</span> <span class="mi">10</span><span class="p">,</span> <span class="mi">2</span><span class="p">:</span> <span class="mi">20</span><span class="p">,</span> <span class="mi">3</span><span class="p">:</span> <span class="mi">30</span><span class="p">}</span>
</span><span><span class="n">data_map</span><span class="p">:</span> <span class="n">Dict</span><span class="p">[</span><span class="n">Tuple</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="nb">int</span><span class="p">],</span> <span class="nb">str</span><span class="p">]</span> <span class="o">=</span> <span class="p">{(</span><span class="s1">'a'</span><span class="p">,</span> <span class="mi">1</span><span class="p">):</span> <span class="s1">'a1'</span><span class="p">,</span> <span class="p">(</span><span class="s1">'a'</span><span class="p">,</span> <span class="mi">2</span><span class="p">):</span> <span class="s1">'a2'</span><span class="p">}</span>
</span></code></pre></div>
<p>The variable annotation syntax used above is new in Python 3.6. Before 3.6, annotations were only supported on function parameters and return values. <a href="https://docs.python.org/3/whatsnew/3.6.html#pep-526-syntax-for-variable-annotations" rel="noopener noreferrer" target="_blank">Read
more</a>.</p>
<p class="note">Additionally, the <code>typing</code> module itself is new in Python 3.5. <a href="https://docs.python.org/3/whatsnew/3.5.html#whatsnew-pep-484" rel="noopener noreferrer" target="_blank">Read
more</a>.</p>
<p>The general structure is <code>Dict[<key-type>, <value-type>]</code>. So, <code>Dict[str, int]</code> denotes a dictionary
that maps string keys to integer values.</p>
<p>Note that these type annotations are not checked at runtime. They’re merely hints for IDEs, static
checkers and human readers. Python’s dynamic nature is not affected by these annotations.</p>
<p>However, if such type annotations are declared, you could use a static analyzer like <a href="http://www.mypy-lang.org/" rel="noopener noreferrer" target="_blank">mypy</a> to
perform type checks. I won’t be discussing that here.</p>
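<p>To make the "not checked at runtime" point concrete, here is a small sketch. The names <code>total</code> and <code>mislabelled</code> are made up for illustration. The annotations document intent, but nothing stops a wrongly typed value when the code runs.</p>

```python
from typing import Dict

def total(counts: Dict[str, int]) -> int:
    """Add up the values of a mapping from names to integers."""
    return sum(counts.values())

# The interpreter does not enforce annotations: this wrongly typed dict is
# accepted at runtime, although a static checker such as mypy would flag it.
mislabelled: Dict[str, int] = {'Chromium': 'twenty-four'}

print(total({'Chromium': 24, 'Silver': 47}))  # 71

# On Python 3.9+, the built-in dict can be subscripted directly instead:
#     number_map: dict[int, int] = {1: 10, 2: 20}
```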
<h2 id="creating-dictionaries">Creating Dictionaries<a class="headerlink" href="#creating-dictionaries" title="Permanent link">¶</a></h2>
<p>There are a few other ways to create dictionaries besides the <code>{}</code> syntax. Here are some of them.</p>
<h3 id="calling-dict">Calling <code>dict</code><a class="headerlink" href="#calling-dict" title="Permanent link">¶</a></h3>
<p>The <code>dict</code> callable can be used to create dictionaries from a list of key–value tuples, or by passing the keys
and values as keyword arguments.</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="nb">dict</span><span class="p">([(</span><span class="s1">'Chromium'</span><span class="p">,</span> <span class="mi">24</span><span class="p">),</span> <span class="p">(</span><span class="s1">'Phosphorus'</span><span class="p">,</span> <span class="mi">15</span><span class="p">),</span> <span class="p">(</span><span class="s1">'Silver'</span><span class="p">,</span> <span class="mi">47</span><span class="p">)])</span>
</span><span><span class="go">{'Chromium': 24, 'Phosphorus': 15, 'Silver': 47}</span>
</span></code></pre></div>
<p>This is more convenient than the dictionary literal syntax <em>only</em> if we already have such a list.
If we have the keys and the corresponding values in separate lists, we can <code>zip</code> them up and pass the
result to <code>dict</code>.</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="nb">dict</span><span class="p">(</span><span class="nb">zip</span><span class="p">(</span>
</span><span><span class="gp">... </span> <span class="p">[</span><span class="s1">'Sulfur'</span><span class="p">,</span> <span class="s1">'Calcium'</span><span class="p">,</span> <span class="s1">'Gold'</span><span class="p">],</span> <span class="c1"># Keys</span>
</span><span><span class="gp">... </span> <span class="p">[</span><span class="mi">16</span><span class="p">,</span> <span class="mi">20</span><span class="p">,</span> <span class="mi">79</span><span class="p">],</span> <span class="c1"># Values</span>
</span><span><span class="gp">... </span><span class="p">))</span>
</span><span><span class="go">{'Sulfur': 16, 'Calcium': 20, 'Gold': 79}</span>
</span></code></pre></div>
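<p>One thing to watch out for with this approach: <code>zip</code> stops at the shortest input, so a missing value silently drops a key. A minimal sketch of a guard, using a hypothetical <code>dict_from_lists</code> helper:</p>

```python
keys = ['Sulfur', 'Calcium', 'Gold']
values = [16, 20]  # one value missing by mistake

# zip stops at the shorter list, so 'Gold' is silently dropped.
print(dict(zip(keys, values)))  # {'Sulfur': 16, 'Calcium': 20}

def dict_from_lists(keys, values):
    """Build a dict from parallel lists, refusing mismatched lengths."""
    if len(keys) != len(values):
        raise ValueError(f'{len(keys)} keys but {len(values)} values')
    return dict(zip(keys, values))

print(dict_from_lists(keys, [16, 20, 79]))  # {'Sulfur': 16, 'Calcium': 20, 'Gold': 79}
```

<p>On Python 3.10 and later, <code>zip(keys, values, strict=True)</code> performs a similar check, raising a <code>ValueError</code> when the lengths differ.</p>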
<p>We can also pass keyword arguments to <code>dict</code>, either on their own or in addition to the above.</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="nb">dict</span><span class="p">(</span><span class="nb">dict</span><span class="p">([(</span><span class="s1">'Chromium'</span><span class="p">,</span> <span class="mi">24</span><span class="p">),</span> <span class="p">(</span><span class="s1">'Phosphorus'</span><span class="p">,</span> <span class="mi">15</span><span class="p">)]),</span> <span class="n">Sodium</span><span class="o">=</span><span class="mi">11</span><span class="p">,</span> <span class="n">Nitrogen</span><span class="o">=</span><span class="mi">7</span><span class="p">)</span>
</span><span><span class="go">{'Chromium': 24, 'Phosphorus': 15, 'Sodium': 11, 'Nitrogen': 7}</span>
</span><span><span class="gp">>>> </span><span class="nb">dict</span><span class="p">(</span><span class="n">Sodium</span><span class="o">=</span><span class="mi">11</span><span class="p">,</span> <span class="n">Nitrogen</span><span class="o">=</span><span class="mi">7</span><span class="p">)</span>
</span><span><span class="go">{'Sodium': 11, 'Nitrogen': 7}</span>
</span></code></pre></div>
<p>The second form is better written using the dictionary literal syntax, <code>{'Sodium': 11, 'Nitrogen': 7}</code>. That is more natural to a potential
future reader, and <em>slightly</em> faster<sup id="fnref:1"><a class="footnote-ref" href="#fn:1">1</a></sup> as well.</p>
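<p>If you’d like to check the speed difference on your own machine, here is a rough sketch using the <code>timeit</code> module. The absolute numbers depend heavily on the machine and the Python version, so no particular result is implied here.</p>

```python
import timeit

# Each statement builds the same two-entry dictionary, 100,000 times.
call_time = timeit.timeit("dict(Sodium=11, Nitrogen=7)", number=100_000)
literal_time = timeit.timeit("{'Sodium': 11, 'Nitrogen': 7}", number=100_000)

print('dict(...) call:', round(call_time, 4), 'seconds')
print('literal syntax:', round(literal_time, 4), 'seconds')
```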
<h3 id="comprehensions">Comprehensions<a class="headerlink" href="#comprehensions" title="Permanent link">¶</a></h3>
<p>Python 3 (and 2.7) added support for dict comprehensions which are very similar to list
comprehensions, but with a small variation in syntax.</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="nb">dict</span><span class="p">((</span><span class="n">i</span><span class="p">,</span> <span class="n">i</span><span class="o">**</span><span class="mi">2</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">5</span><span class="p">))</span> <span class="c1"># Using the `dict` builtin.</span>
</span><span><span class="go">{0: 0, 1: 1, 2: 4, 3: 9, 4: 16}</span>
</span><span><span class="gp">>>> </span><span class="p">{</span><span class="n">i</span><span class="p">:</span> <span class="n">i</span><span class="o">**</span><span class="mi">2</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">5</span><span class="p">)}</span> <span class="c1"># Using a dict comprehension.</span>
</span><span><span class="go">{0: 0, 1: 1, 2: 4, 3: 9, 4: 16}</span>
</span></code></pre></div>
<p>The above two examples create the same dictionary. However, as pointed out in <a href="https://www.python.org/dev/peps/pep-0274/" rel="noopener noreferrer" target="_blank">PEP 274</a>, the dict
comprehension is more succinct and makes the intent clearer.</p>
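<p>Dict comprehensions also accept an <code>if</code> clause, and they are a handy way to build a new dictionary from an existing one. For example, here is a sketch of inverting a mapping and filtering it:</p>

```python
elements = {'Chromium': 24, 'Phosphorus': 15, 'Silver': 47}

# Swap keys and values (assumes the values are unique and hashable).
by_number = {number: name for name, number in elements.items()}
print(by_number)  # {24: 'Chromium', 15: 'Phosphorus', 47: 'Silver'}

# Keep only the entries that satisfy a condition.
heavier = {name: number for name, number in elements.items() if number > 20}
print(heavier)  # {'Chromium': 24, 'Silver': 47}
```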
<h2 id="public-appearance">Public Appearance<a class="headerlink" href="#public-appearance" title="Permanent link">¶</a></h2>
<p>Unsurprisingly, dictionaries pop up in a lot of places in Python. Here are a few of them.</p>
<h3 id="keyword-arguments">Keyword Arguments<a class="headerlink" href="#keyword-arguments" title="Permanent link">¶</a></h3>
<p>When a function accepts arbitrary keyword arguments (using the <code>**</code> syntax), they are passed to it as
a dictionary.</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="k">def</span> <span class="nf">construct</span><span class="p">(</span><span class="o">**</span><span class="n">counts</span><span class="p">):</span>
</span><span><span class="gp">... </span> <span class="nb">print</span><span class="p">(</span><span class="n">counts</span><span class="p">)</span>
</span><span><span class="gp">... </span> <span class="nb">print</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">counts</span><span class="p">),</span> <span class="nb">type</span><span class="p">(</span><span class="n">counts</span><span class="p">))</span>
</span><span><span class="gp">...</span>
</span><span><span class="gp">>>> </span><span class="n">construct</span><span class="p">(</span><span class="n">a</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">b</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">c</span><span class="o">=</span><span class="mi">3</span><span class="p">)</span>
</span><span><span class="go">{'a': 1, 'b': 2, 'c': 3}</span>
</span><span><span class="go">3 <class 'dict'></span>
</span></code></pre></div>
<p>Conversely, we can pass a dictionary’s items to a function as keyword arguments using the same <code>**</code> syntax.</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="n">kw_args</span> <span class="o">=</span> <span class="p">{</span><span class="s1">'a'</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span> <span class="s1">'b'</span><span class="p">:</span> <span class="mi">2</span><span class="p">,</span> <span class="s1">'c'</span><span class="p">:</span> <span class="mi">3</span><span class="p">}</span>
</span><span><span class="gp">>>> </span><span class="n">construct</span><span class="p">(</span><span class="o">**</span><span class="n">kw_args</span><span class="p">)</span>
</span><span><span class="go">{'a': 1, 'b': 2, 'c': 3}</span>
</span><span><span class="go">3 <class 'dict'></span>
</span></code></pre></div>
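<p>Here are a few more details, shown with a variant of <code>construct</code> that returns the dictionary instead of printing it: explicit keyword arguments can be mixed with unpacked ones, the function gets its own new dictionary, and passing the same name twice is an error.</p>

```python
def construct(**counts):
    """Variant of the function above: collect keyword arguments and return them."""
    return counts

kw_args = {'a': 1, 'b': 2, 'c': 3}

# Unpacked entries and explicit keyword arguments can be mixed.
combined = construct(**kw_args, d=4)
print(combined)  # {'a': 1, 'b': 2, 'c': 3, 'd': 4}

# The function receives a new dict, so changing it leaves kw_args alone.
combined['a'] = 100
print(kw_args['a'])  # 1

# Supplying the same name twice raises instead of silently overwriting.
try:
    construct(a=0, **kw_args)
except TypeError as exc:
    print('TypeError:', exc)
```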
<h3 id="namespaces">Namespaces<a class="headerlink" href="#namespaces" title="Permanent link">¶</a></h3>
<p>The <code>globals</code> builtin function gives a dictionary of all names and their values in the current
global namespace. We can modify this dictionary to define new names or delete existing ones,
although that’s probably a bad idea.</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="nb">len</span><span class="p">(</span><span class="nb">globals</span><span class="p">())</span>
</span><span><span class="go">25</span>
</span><span><span class="gp">>>> </span><span class="nb">globals</span><span class="p">()[</span><span class="s1">'x'</span><span class="p">]</span> <span class="o">=</span> <span class="mi">123</span>
</span><span><span class="gp">>>> </span><span class="n">x</span>
</span><span><span class="go">123</span>
</span><span><span class="gp">>>> </span><span class="k">del</span> <span class="nb">globals</span><span class="p">()[</span><span class="s1">'x'</span><span class="p">]</span>
</span><span><span class="gp">>>> </span><span class="n">x</span>
</span><span><span class="gt">Traceback (most recent call last):</span>
</span><span> File <span class="nb">"<stdin>"</span>, line <span class="m">1</span>, in <span class="n"><module></span>
</span><span><span class="gr">NameError</span>: <span class="n">name 'x' is not defined</span>
</span></code></pre></div>
<p>The <code>locals</code> builtin returns a dictionary of names and values from the local scope, e.g., the
local scope inside a function or method.</p>
<p>The <code>vars</code> builtin takes an object as an argument and returns the names available as attributes on
that object. Specifically, it returns the value of the object’s <code>__dict__</code> attribute. When
called without any arguments, it returns names and values from the local scope. In other words,
<code>vars()</code> with no arguments behaves like <code>locals()</code>.</p>
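<p>Here’s a small sketch of <code>vars</code> on an object, where <code>Planet</code> and <code>scope_demo</code> are made-up names. Since <code>vars</code> hands back the object’s actual <code>__dict__</code>, writing to it changes the attribute too.</p>

```python
class Planet:
    def __init__(self, name, moons):
        self.name = name
        self.moons = moons

earth = Planet('Earth', 1)
print(vars(earth))                    # {'name': 'Earth', 'moons': 1}
print(vars(earth) is earth.__dict__)  # True

# Changing the returned dict changes the object's attribute.
vars(earth)['moons'] = 2
print(earth.moons)                    # 2

def scope_demo():
    x = 1
    y = 'two'
    return vars()  # like locals() when called without arguments

print(scope_demo())                   # {'x': 1, 'y': 'two'}
```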
<h2 id="serialization">Serialization<a class="headerlink" href="#serialization" title="Permanent link">¶</a></h2>
<p>Dictionaries, being key–value data structures, map naturally onto key–value
databases and other NoSQL data stores. Here, though, we’ll look at serializing them into
text and binary formats for transmission or for saving to disk.</p>
<h3 id="json">JSON<a class="headerlink" href="#json" title="Permanent link">¶</a></h3>
<p>Nowadays, the thought of serializing a Python dictionary is usually followed by using the
<a href="https://docs.python.org/3/library/json.html" rel="noopener noreferrer" target="_blank"><code>json</code></a> module to <a href="https://docs.python.org/3/library/json.html#json.dump" rel="noopener noreferrer" target="_blank"><code>dump</code></a> and <a href="https://docs.python.org/3/library/json.html#json.load" rel="noopener noreferrer" target="_blank"><code>load</code></a> using the <a href="http://json.org/" rel="noopener noreferrer" target="_blank">JSON
format</a>. No surprise, since it’s extremely convenient and there are quality parsers and writers
for almost every programming language today. The syntax, although not too convenient to
write by hand, is still simple, lightweight and easy to read. It helps that it is quite
close to Python’s own syntax for dictionaries.</p>
<p>Here’s a quick example:</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">json</span>
</span><span><span class="gp">>>> </span><span class="n">json</span><span class="o">.</span><span class="n">dumps</span><span class="p">(</span><span class="n">country_currencies</span><span class="p">)</span>
</span><span><span class="go">'{"India": "Rupee", "Russia": "Ruble", "USA": "Dollar", "Japan": "Yen"}'</span>
</span><span><span class="gp">>>> </span><span class="n">json</span><span class="o">.</span><span class="n">loads</span><span class="p">(</span><span class="n">json</span><span class="o">.</span><span class="n">dumps</span><span class="p">(</span><span class="n">country_currencies</span><span class="p">))</span> <span class="o">==</span> <span class="n">country_currencies</span>
</span><span><span class="go">True</span>
</span></code></pre></div>
<p>In short, these four functions from the <code>json</code> module are enough for basic usage.</p>
<div class="table-wrapper">
<table>
<thead>
<tr>
<th>Function</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>.dump(obj, fp)</code></td>
<td>Turn <code>obj</code> into JSON and write it to the <code>fp</code> file-like object.</td>
</tr>
<tr>
<td><code>.dumps(obj)</code></td>
<td>Turn <code>obj</code> into JSON and return the resulting string.</td>
</tr>
<tr>
<td><code>.load(fp)</code></td>
<td>Read valid JSON from the <code>fp</code> file-like object and return the resulting object.</td>
</tr>
<tr>
<td><code>.loads(data)</code></td>
<td>Parse <code>data</code> as a valid JSON string and return the resulting object.</td>
</tr>
</tbody>
</table>
</div>
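<p>The file-based pair works the same way. Here is a minimal sketch that writes to a temporary file (the file name is arbitrary) and reads it back:</p>

```python
import json
import os
import tempfile

country_currencies = {'India': 'Rupee', 'Russia': 'Ruble', 'USA': 'Dollar', 'Japan': 'Yen'}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'currencies.json')

    # dump() writes JSON to an open text file; indent and sort_keys
    # only affect the formatting, not the data.
    with open(path, 'w') as fp:
        json.dump(country_currencies, fp, indent=2, sort_keys=True)

    # load() reads it back into a new dict.
    with open(path) as fp:
        loaded = json.load(fp)

print(loaded == country_currencies)  # True
```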
<p>As convenient as this is, it is important to know how data types change along the way. JSON only
supports numbers, strings, booleans and <code>null</code> as primitive types, plus arrays and objects as
analogues to <code>list</code>s and <code>dict</code>s. As a result, if there are tuples somewhere in the
dictionary, they will be turned into lists when the <code>dict</code> is serialized and deserialized with
JSON. Similarly, non-string keys (such as integers) come back as strings, and dates or any other data type not directly supported by the
JSON spec cannot be serialized without extra handling.</p>
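<p>A quick sketch of these conversions, with made-up sample data. The <code>default</code> argument of <code>json.dumps</code> is one way to handle unsupported types; converting them to strings, as here, is just one possible choice.</p>

```python
import datetime
import json

original = {1: 'one', 'point': (3, 4), 'missing': None}
restored = json.loads(json.dumps(original))

print(restored)              # {'1': 'one', 'point': [3, 4], 'missing': None}
print(restored == original)  # False: the key became a string, the tuple a list

# Types JSON has no notion of are rejected outright...
try:
    json.dumps({'when': datetime.date(2017, 1, 1)})
except TypeError as exc:
    print('TypeError:', exc)

# ...unless we tell dumps() how to convert them, e.g. to strings.
print(json.dumps({'when': datetime.date(2017, 1, 1)}, default=str))  # {"when": "2017-01-01"}
```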
<h3 id="pickling">Pickling<a class="headerlink" href="#pickling" title="Permanent link">¶</a></h3>
<p>Unlike the above, pickling (using the <a href="https://docs.python.org/3/library/pickle.html" rel="noopener noreferrer" target="_blank"><code>pickle</code></a> module) serializes objects into binary
data and can handle a much wider range of Python data types. The flip side is that the format is specific to Python, so pickled data can
only be loaded by Python, not other languages (well, not yet at least).</p>
<p>The <code>pickle</code> module has <code>dump</code>, <code>dumps</code>, <code>load</code> and <code>loads</code> functions, just like the
<code>json</code> module discussed above.</p>
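<p>Here is a round trip through <code>pickle</code> with made-up sample data. Unlike with JSON, tuples, integer keys and dates all come back unchanged. One caution from the <code>pickle</code> documentation is worth keeping in mind: only unpickle data you trust, since malicious pickle data can execute arbitrary code while loading.</p>

```python
import datetime
import pickle

original = {1: 'one', 'point': (3, 4), 'when': datetime.date(2017, 1, 1)}

data = pickle.dumps(original)   # serialize to bytes
print(isinstance(data, bytes))  # True

restored = pickle.loads(data)   # deserialize back into a new dict
print(restored == original)     # True: int keys, tuples and dates survive
print(isinstance(restored['point'], tuple))  # True
```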
<h2 id="the-item-syntax">The Item Syntax<a class="headerlink" href="#the-item-syntax" title="Permanent link">¶</a></h2>
<p>The syntax used to get an item from a dictionary, given its key, is <code>data[key]</code>. This is mostly
equivalent to calling the <code>__getitem__</code> method, like the following:</p>
<div class="hl"><pre class=content><code><span><span class="n">data</span><span class="o">.</span><span class="fm">__getitem__</span><span class="p">(</span><span class="n">key</span><span class="p">)</span>
</span></code></pre></div>
<p>Obviously, we’d prefer the square bracket syntax. But understanding that, underneath the syntax,
it’s just a method call lets us implement the <code>__getitem__</code> method in our own classes and get the
item syntax on our objects.</p>
<p>Here’s a simple example:</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="k">class</span> <span class="nc">Store</span><span class="p">:</span>
</span><span><span class="gp">... </span>    <span class="k">def</span> <span class="fm">__getitem__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">):</span>
</span><span><span class="gp">... </span>        <span class="k">return</span> <span class="n">name</span><span class="o">.</span><span class="n">upper</span><span class="p">()</span>
</span><span><span class="gp">...</span>
</span><span><span class="gp">>>> </span><span class="n">store</span> <span class="o">=</span> <span class="n">Store</span><span class="p">()</span>
</span><span><span class="gp">>>> </span><span class="n">store</span><span class="p">[</span><span class="s1">'Hello there!'</span><span class="p">]</span>
</span><span><span class="go">'HELLO THERE!'</span>
</span></code></pre></div>
<p>Similarly, <code>__setitem__</code> is used to set a value using the item syntax.</p>
<div class="hl"><pre class=content><code><span><span class="c1"># The following two are equivalent.</span>
</span><span><span class="n">data</span><span class="p">[</span><span class="n">key</span><span class="p">]</span> <span class="o">=</span> <span class="n">value</span>
</span><span><span class="n">data</span><span class="o">.</span><span class="fm">__setitem__</span><span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="n">value</span><span class="p">)</span>
</span></code></pre></div>
<p>Note that this should be used responsibly, as this feature borders on operator overloading. In
almost all cases (including the above example), a normal named method on your classes is a better
option than implementing the item syntax, since the method’s name makes the intent clearer.</p>
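<p>Where the item syntax does fit well is a class that genuinely behaves like a container. Here is a sketch of one: a hypothetical <code>CaseInsensitiveDict</code> whose string keys ignore case, implementing <code>__getitem__</code>, <code>__setitem__</code> and <code>__delitem__</code> on top of a plain dict.</p>

```python
class CaseInsensitiveDict:
    """A hypothetical mapping whose (string) keys ignore case."""

    def __init__(self):
        self._data = {}  # keys are stored lower-cased

    def __getitem__(self, key):
        return self._data[key.lower()]

    def __setitem__(self, key, value):
        self._data[key.lower()] = value

    def __delitem__(self, key):
        del self._data[key.lower()]

    def __contains__(self, key):
        return key.lower() in self._data

    def __len__(self):
        return len(self._data)

headers = CaseInsensitiveDict()
headers['Content-Type'] = 'text/html'
print(headers['content-type'])  # text/html
print('CONTENT-TYPE' in headers)  # True
del headers['Content-type']
print(len(headers))  # 0
```

<p>For a full-featured mapping, subclassing <code>collections.abc.MutableMapping</code> can derive the remaining methods (<code>get</code>, <code>keys</code>, <code>update</code> and so on) once <code>__iter__</code> and <code>__len__</code> are defined as well.</p>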
<h2 id="flavors">Flavors<a class="headerlink" href="#flavors" title="Permanent link">¶</a></h2>
<p>Python’s standard library comes with a few flavors of dictionaries that provide some nice additional
functionality. These data structures are all available in the <a href="https://docs.python.org/3/library/collections.html" rel="noopener noreferrer" target="_blank"><code>collections</code></a> module.</p>
<p>The following are subclasses of <code>dict</code> and have all the features of Python’s dictionaries.</p>
<h3 id="the-ordereddict">The <code>OrderedDict</code><a class="headerlink" href="#the-ordereddict" title="Permanent link">¶</a></h3>
<p>The <a href="https://docs.python.org/3/library/collections.html#collections.OrderedDict" rel="noopener noreferrer" target="_blank"><code>collections.OrderedDict</code></a> is a dictionary that remembers the order in which keys
are <em>inserted</em>. So, if we add a new key to the dict, it will be at the end of the key sequence.
But if we change the value of an existing key, its position in the ordering is unchanged.</p>
<p>Create a new <code>OrderedDict</code>:</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">collections</span> <span class="kn">import</span> <span class="n">OrderedDict</span>
</span><span><span class="gp">>>> </span><span class="n">planet_satellites</span> <span class="o">=</span> <span class="n">OrderedDict</span><span class="p">(</span>
</span><span><span class="gp">... </span> <span class="n">Mercury</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span>
</span><span><span class="gp">... </span> <span class="n">Venus</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span>
</span><span><span class="gp">... </span> <span class="n">Earth</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
</span><span><span class="gp">... </span> <span class="n">Mars</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span>
</span><span><span class="gp">... </span> <span class="n">Jupiter</span><span class="o">=</span><span class="mi">69</span><span class="p">,</span>
</span><span><span class="gp">... </span> <span class="n">Saturn</span><span class="o">=</span><span class="mi">62</span><span class="p">,</span>
</span><span><span class="gp">... </span> <span class="n">Uranus</span><span class="o">=</span><span class="mi">27</span><span class="p">,</span>
</span><span><span class="gp">... </span> <span class="n">Neptune</span><span class="o">=</span><span class="mi">14</span><span class="p">,</span>
</span><span><span class="gp">... </span><span class="p">)</span>
</span><span><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">pprint</span> <span class="kn">import</span> <span class="n">pprint</span>
</span><span><span class="gp">>>> </span><span class="n">pprint</span><span class="p">(</span><span class="n">planet_satellites</span><span class="p">)</span>
</span><span><span class="go">OrderedDict([('Mercury', 0),</span>
</span><span><span class="go"> ('Venus', 0),</span>
</span><span><span class="go"> ('Earth', 1),</span>
</span><span><span class="go"> ('Mars', 2),</span>
</span><span><span class="go"> ('Jupiter', 69),</span>
</span><span><span class="go"> ('Saturn', 62),</span>
</span><span><span class="go"> ('Uranus', 27),</span>
</span><span><span class="go"> ('Neptune', 14)])</span>
</span></code></pre></div>
<p>Note that we use the <a href="https://docs.python.org/3/library/pprint.html#pprint.pprint" rel="noopener noreferrer" target="_blank"><code>pprint</code></a> function to show the <code>OrderedDict</code> objects in a convenient
way.</p>
<p>They are just dictionaries under the hood.</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="nb">isinstance</span><span class="p">(</span><span class="n">planet_satellites</span><span class="p">,</span> <span class="nb">dict</span><span class="p">)</span>
</span><span><span class="go">True</span>
</span><span><span class="gp">>>> </span><span class="n">planet_satellites</span><span class="p">[</span><span class="s1">'Mars'</span><span class="p">]</span>
</span><span><span class="go">2</span>
</span></code></pre></div>
<p>These objects support being reversed as well:</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="n">rev_planets</span> <span class="o">=</span> <span class="n">OrderedDict</span><span class="p">(</span><span class="nb">reversed</span><span class="p">(</span><span class="n">planet_satellites</span><span class="o">.</span><span class="n">items</span><span class="p">()))</span>
</span><span><span class="gp">>>> </span><span class="n">pprint</span><span class="p">(</span><span class="n">rev_planets</span><span class="p">)</span>
</span><span><span class="go">OrderedDict([('Neptune', 14),</span>
</span><span><span class="go"> ('Uranus', 27),</span>
</span><span><span class="go"> ('Saturn', 62),</span>
</span><span><span class="go"> ('Jupiter', 69),</span>
</span><span><span class="go"> ('Mars', 2),</span>
</span><span><span class="go"> ('Earth', 1),</span>
</span><span><span class="go"> ('Venus', 0),</span>
</span><span><span class="go"> ('Mercury', 0)])</span>
</span></code></pre></div>
<p>The results of <code>.keys</code> and <code>.values</code> methods also retain the ordering. Refer to the official
documentation linked above for full details.</p>
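<p>Since Python 3.7, regular dictionaries also preserve insertion order. <code>OrderedDict</code> still has a few things plain dicts lack, though, such as order-sensitive equality, <code>move_to_end</code> and <code>popitem(last=False)</code>. A short sketch:</p>

```python
from collections import OrderedDict

a = OrderedDict(Mercury=0, Venus=0, Earth=1)
b = OrderedDict(Earth=1, Venus=0, Mercury=0)

print(a == b)              # False: order matters between two OrderedDicts
print(dict(a) == dict(b))  # True: plain dicts ignore order
print(a == dict(b))        # True: comparing with a plain dict ignores order

a.move_to_end('Mercury')   # move an existing key to the end
print(list(a))             # ['Venus', 'Earth', 'Mercury']

first = a.popitem(last=False)  # pop from the front (FIFO order)
print(first)               # ('Venus', 0)
```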
<h3 id="the-defaultdict">The <code>defaultdict</code><a class="headerlink" href="#the-defaultdict" title="Permanent link">¶</a></h3>
<p class="note">The name <code>defaultdict</code> is unfortunate as it doesn’t adhere to any naming conventions. I’d love to
see it renamed to <code>default_dict</code> or even <code>DefaultDict</code>, but it’s probably easier to just live with
it.</p>
<p>A <code>defaultdict</code> knows how to initialize missing keys. Consider the following code. Here, we
have a piece of text and we want a dictionary mapping each letter in the text to its number of
occurrences.</p>
<div class="hl"><pre class=content><code><span><span class="n">text</span> <span class="o">=</span> <span class="s1">'lorem ipsum dolor sit amet'</span>
</span><span><span class="n">counts</span> <span class="o">=</span> <span class="p">{}</span>
</span><span><span class="k">for</span> <span class="n">letter</span> <span class="ow">in</span> <span class="n">text</span><span class="p">:</span>
</span><span>    <span class="k">if</span> <span class="n">letter</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">counts</span><span class="p">:</span>
</span><span>        <span class="n">counts</span><span class="p">[</span><span class="n">letter</span><span class="p">]</span> <span class="o">=</span> <span class="mi">0</span>
</span><span>    <span class="n">counts</span><span class="p">[</span><span class="n">letter</span><span class="p">]</span> <span class="o">+=</span> <span class="mi">1</span>
</span><span><span class="nb">print</span><span class="p">(</span><span class="n">counts</span><span class="p">)</span>
</span></code></pre></div>
<p class="note">Of course, there are better ways to do this, but for the sake of example, let’s bear with this
implementation.</p>
<p>Notice how we check whether the letter is already present in the dict and, if not, initialize it to
zero. A <code>defaultdict</code> can take over this initialization. It takes a function as its first
argument, which it calls with no arguments to produce the value for a missing key when that key is
accessed. So, we can rewrite the above code to use <code>defaultdict</code> like:</p>
<div class="hl"><pre class=content><code><span><span class="kn">from</span> <span class="nn">collections</span> <span class="kn">import</span> <span class="n">defaultdict</span>
</span><span><span class="n">text</span> <span class="o">=</span> <span class="s1">'lorem ipsum dolor sit amet'</span>
</span><span><span class="n">counts</span> <span class="o">=</span> <span class="n">defaultdict</span><span class="p">(</span><span class="nb">int</span><span class="p">)</span>
</span><span><span class="k">for</span> <span class="n">letter</span> <span class="ow">in</span> <span class="n">text</span><span class="p">:</span>
</span><span>    <span class="n">counts</span><span class="p">[</span><span class="n">letter</span><span class="p">]</span> <span class="o">+=</span> <span class="mi">1</span>
</span><span><span class="nb">print</span><span class="p">(</span><span class="n">counts</span><span class="p">)</span>
</span></code></pre></div>
<p>When we look up a letter in <code>counts</code> and that letter doesn’t exist there yet,
<code>defaultdict</code> calls <code>int</code> with no arguments and stores the return value in
<code>counts[letter]</code>. This is precisely what we were doing in our previous example. So, what does
<code>int</code> return when called with no arguments? You guessed it, zero!</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="nb">int</span><span class="p">()</span>
</span><span><span class="go">0</span>
</span><span><span class="gp">>>> </span><span class="nb">float</span><span class="p">()</span>
</span><span><span class="go">0.0</span>
</span><span><span class="gp">>>> </span><span class="nb">str</span><span class="p">()</span>
</span><span><span class="go">''</span>
</span><span><span class="gp">>>> </span><span class="nb">bool</span><span class="p">()</span>
</span><span><span class="go">False</span>
</span><span><span class="gp">>>> </span><span class="nb">list</span><span class="p">()</span>
</span><span><span class="go">[]</span>
</span><span><span class="gp">>>> </span><span class="nb">dict</span><span class="p">()</span>
</span><span><span class="go">{}</span>
</span><span><span class="gp">>>> </span><span class="nb">set</span><span class="p">()</span>
</span><span><span class="go">set()</span>
</span></code></pre></div>
<p>As illustrated above, calling the data type builtins with no arguments returns the falsy (zero or
empty) value of that data type. We can use this fact and pass these builtins to the <code>defaultdict</code>
constructor depending on the need. If we wanted a different initial value, say <code>42</code>, we could
use a lambda function like <code>lambda: 42</code> instead.</p>
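<p>Here’s a small sketch of two more factories in the same spirit: <code>list</code> to group words by
their first letter, and a lambda for a non-falsy starting value such as <code>42</code>. The word list and
the names <code>by_letter</code> and <code>answers</code> are made up for illustration.</p>

```python
from collections import defaultdict

words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']

# list() returns [], so every new key starts out as an empty list.
by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)

# A lambda lets us choose any starting value, not just the falsy one.
answers = defaultdict(lambda: 42)
answers['known'] = 7

print(dict(by_letter))
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}
print(answers['known'], answers['unknown'])  # 7 42
```

<p>Note that merely reading <code>answers['unknown']</code> inserts that key into the dictionary. To
check for a key without creating it, use <code>in</code> or <code>.get()</code>, neither of which calls
the factory.</p>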
<h3 id="the-chainmap">The <code>ChainMap</code><a class="headerlink" href="#the-chainmap" title="Permanent link">¶</a></h3>
<p>The <code>ChainMap</code> is an abstraction over a chain of dictionaries in order of precedence. Essentially,
it holds a list of dictionaries and, when a key is looked up, searches each of these dictionaries in turn
and returns the value from the first one that contains the key.</p>
<p>This is better illustrated with an example. Let’s create a <code>ChainMap</code> with dummy data:</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">collections</span> <span class="kn">import</span> <span class="n">ChainMap</span>
</span><span><span class="gp">>>> </span><span class="n">data</span> <span class="o">=</span> <span class="n">ChainMap</span><span class="p">({</span><span class="s1">'a'</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span> <span class="s1">'b'</span><span class="p">:</span> <span class="mi">2</span><span class="p">,</span> <span class="s1">'c'</span><span class="p">:</span> <span class="mi">3</span><span class="p">},</span> <span class="p">{</span><span class="s1">'c'</span><span class="p">:</span> <span class="mi">30</span><span class="p">,</span> <span class="s1">'d'</span><span class="p">:</span> <span class="mi">40</span><span class="p">,</span> <span class="s1">'e'</span><span class="p">:</span> <span class="mi">50</span><span class="p">})</span>
</span><span><span class="gp">>>> </span><span class="n">data</span>
</span><span><span class="go">ChainMap({'a': 1, 'b': 2, 'c': 3}, {'c': 30, 'd': 40, 'e': 50})</span>
</span><span><span class="gp">>>> </span><span class="n">data</span><span class="o">.</span><span class="n">maps</span> <span class="c1"># A list of maps in the chain.</span>
</span><span><span class="go">[{'a': 1, 'b': 2, 'c': 3}, {'c': 30, 'd': 40, 'e': 50}]</span>
</span></code></pre></div>
<p>Let’s try indexing:</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="n">data</span><span class="p">[</span><span class="s1">'a'</span><span class="p">]</span>
</span><span><span class="go">1</span>
</span><span><span class="gp">>>> </span><span class="n">data</span><span class="p">[</span><span class="s1">'e'</span><span class="p">]</span>
</span><span><span class="go">50</span>
</span><span><span class="gp">>>> </span><span class="n">data</span><span class="p">[</span><span class="s1">'c'</span><span class="p">]</span>
</span><span><span class="go">3</span>
</span></code></pre></div>
<p>Here, <code>'a'</code> comes from the first dictionary and <code>'e'</code> from the second. <code>'c'</code>
exists in both, and the first dictionary takes precedence, so we get <code>3</code> rather than <code>30</code>.</p>
<p>However, as the documentation points out, writes, updates and deletes operate on the first
dictionary alone.</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="n">data</span><span class="p">[</span><span class="s1">'a'</span><span class="p">]</span> <span class="o">=</span> <span class="mi">91</span>
</span><span><span class="gp">>>> </span><span class="n">data</span>
</span><span><span class="go">ChainMap({'a': 91, 'b': 2, 'c': 3}, {'c': 30, 'd': 40, 'e': 50})</span>
</span><span><span class="gp">>>> </span><span class="n">data</span><span class="p">[</span><span class="s1">'e'</span><span class="p">]</span> <span class="o">=</span> <span class="mi">951</span>
</span><span><span class="gp">>>> </span><span class="n">data</span>
</span><span><span class="go">ChainMap({'a': 91, 'b': 2, 'c': 3, 'e': 951}, {'c': 30, 'd': 40, 'e': 50})</span>
</span><span><span class="gp">>>> </span><span class="n">data</span><span class="p">[</span><span class="s1">'c'</span><span class="p">]</span> <span class="o">=</span> <span class="mi">93</span>
</span><span><span class="gp">>>> </span><span class="n">data</span>
</span><span><span class="go">ChainMap({'a': 91, 'b': 2, 'c': 93, 'e': 951}, {'c': 30, 'd': 40, 'e': 50})</span>
</span></code></pre></div>
<p>Of course, if we explicitly want to modify the last dictionary, it can be indexed directly:</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="n">data</span><span class="o">.</span><span class="n">maps</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">][</span><span class="s1">'c'</span><span class="p">]</span> <span class="o">=</span> <span class="mi">999</span>
</span><span><span class="gp">>>> </span><span class="n">data</span>
</span><span><span class="go">ChainMap({'a': 91, 'b': 2, 'c': 93, 'e': 951}, {'c': 999, 'd': 40, 'e': 50})</span>
</span></code></pre></div>
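<p>Deletes follow the same first-dictionary rule, which the session above doesn’t show. A short sketch on
a fresh <code>ChainMap</code> (separate from <code>data</code> above; the name <code>chain</code> is just
for illustration): deleting a key removes it from the first dictionary only, so a shadowed value further
down becomes visible again, and deleting a key that lives only in a later dictionary raises
<code>KeyError</code>.</p>

```python
from collections import ChainMap

chain = ChainMap({'a': 1, 'c': 3}, {'c': 30, 'd': 40})

# 'c' is removed from the first mapping only...
del chain['c']
print(chain['c'])  # ...so the second mapping's value shows through: 30

# 'd' exists only in the second mapping, which del never touches.
try:
    del chain['d']
    deleted_d = True
except KeyError:
    deleted_d = False  # 'd' is still there, in chain.maps[1]

print(deleted_d, chain['d'])  # False 40
```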
<p>The <code>ChainMap</code> is useful to hold tiers of configuration parameters for an application, in a form
similar to the following:</p>
<div class="hl"><pre class=content><code><span><span class="n">ChainMap</span><span class="p">(</span><span class="n">user_settings</span><span class="p">,</span> <span class="n">default_settings</span><span class="p">)</span>
</span></code></pre></div>
<p>We can have multiple tiers depending on the situation. The user can modify the dictionary as they see
fit, and all writes and updates will be made only to the first dictionary, <code>user_settings</code>.
Meanwhile, when we look up a configuration parameter, it automatically falls back to
<code>default_settings</code> if it isn’t present in <code>user_settings</code>.</p>
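<p>A minimal sketch of such tiers, with made-up settings keys (<code>theme</code>, <code>font_size</code>,
<code>autosave</code>) standing in for a real application’s configuration. It also uses
<code>new_child()</code>, which returns a new <code>ChainMap</code> with an extra dictionary in front,
handy for temporary overrides that shouldn’t touch the existing tiers.</p>

```python
from collections import ChainMap

# Hypothetical settings tiers; the first mapping in the chain wins.
default_settings = {'theme': 'light', 'font_size': 12, 'autosave': True}
user_settings = {'theme': 'dark'}

settings = ChainMap(user_settings, default_settings)

settings['font_size'] = 14             # written into user_settings only
print(settings['theme'])               # dark (from user_settings)
print(settings['autosave'])            # True (falls back to default_settings)
print(default_settings['font_size'])   # 12 (defaults are untouched)

# new_child() puts the given dict (or an empty one) in front of the chain;
# the original ChainMap is left unchanged.
overrides = settings.new_child({'autosave': False})
print(overrides['autosave'], settings['autosave'])  # False True
```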
<h3 id="the-counter">The <code>Counter</code><a class="headerlink" href="#the-counter" title="Permanent link">¶</a></h3>
<p><code>Counter</code> dictionaries can be used to keep counts of any hashable objects. The keys are these
objects and the values are their counts. The <a href="https://docs.python.org/3/library/collections.html#collections.Counter" rel="noopener noreferrer" target="_blank">official docs</a> give some
clever examples and uses, so I recommend reading them there instead of having me repeat them here.</p>
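<p>Still, for a quick comparison with the letter-counting examples earlier, here’s a minimal sketch that
does the same job with <code>Counter</code> on the same sample text:</p>

```python
from collections import Counter

text = 'lorem ipsum dolor sit amet'

counts = Counter(text)        # counts every character, spaces included
print(counts['o'])            # 3
print(counts['z'])            # 0: missing keys read as zero and are not inserted
print(counts.most_common(1))  # [(' ', 4)]: the space is the most common character
```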
<h3 id="custom-flavor">Custom Flavor<a class="headerlink" href="#custom-flavor" title="Permanent link">¶</a></h3>
<p>Although rarely needed in practice, we can create our own flavors of dictionary types. One way to
achieve this would be to extend the <code>dict</code> type directly, but usually the easier way to deal with
this is to use the <a href="https://docs.python.org/3/library/collections.html#collections.UserDict" rel="noopener noreferrer" target="_blank"><code>UserDict</code></a> class.</p>
<p>Here’s an example dictionary type that works with string keys and is case-insensitive. A good use
for something like this is storing HTTP headers. (The <a href="http://docs.python-requests.org/en/master/" rel="noopener noreferrer" target="_blank">requests</a> library does <a href="https://github.com/requests/requests/blob/master/requests/structures.py" rel="noopener noreferrer" target="_blank">something
similar</a>.)</p>
<div class="hl"><pre class=content><code><span><span class="kn">from</span> <span class="nn">collections</span> <span class="kn">import</span> <span class="n">UserDict</span>
</span><span>
</span><span>
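</span><span><span class="k">class</span> <span class="nc">CaselessDict</span><span class="p">(</span><span class="n">UserDict</span><span class="p">):</span>
</span><span>    <span class="c1"># Keys are stored lower-cased in self.data, so every access lower-cases too.</span>
</span><span>
</span><span>    <span class="k">def</span> <span class="fm">__getitem__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">):</span>
</span><span>        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">data</span><span class="p">[</span><span class="n">name</span><span class="o">.</span><span class="n">lower</span><span class="p">()]</span>
</span><span>
</span><span>    <span class="k">def</span> <span class="fm">__setitem__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">value</span><span class="p">):</span>
</span><span>        <span class="bp">self</span><span class="o">.</span><span class="n">data</span><span class="p">[</span><span class="n">name</span><span class="o">.</span><span class="n">lower</span><span class="p">()]</span> <span class="o">=</span> <span class="n">value</span>
</span><span>
</span><span>    <span class="k">def</span> <span class="fm">__delitem__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">):</span>
</span><span>        <span class="k">del</span> <span class="bp">self</span><span class="o">.</span><span class="n">data</span><span class="p">[</span><span class="n">name</span><span class="o">.</span><span class="n">lower</span><span class="p">()]</span>
</span><span>
</span><span>    <span class="c1"># UserDict's own __contains__ checks self.data directly, so it needs the same treatment.</span>
</span><span>    <span class="k">def</span> <span class="fm">__contains__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">):</span>
</span><span>        <span class="k">return</span> <span class="n">name</span><span class="o">.</span><span class="n">lower</span><span class="p">()</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">data</span>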
</span></code></pre></div>
<p>As seen above, the <code>UserDict</code> class provides a <code>.data</code> attribute that can be used as the underlying
store dictionary.</p>
<p>Let’s try it out.</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="n">data</span> <span class="o">=</span> <span class="n">CaselessDict</span><span class="p">(</span><span class="n">accept</span><span class="o">=</span><span class="s1">'application/json'</span><span class="p">)</span>
</span><span><span class="gp">>>> </span><span class="n">data</span><span class="p">[</span><span class="s1">'accept'</span><span class="p">]</span>
</span><span><span class="go">'application/json'</span>
</span><span><span class="gp">>>> </span><span class="n">data</span><span class="p">[</span><span class="s1">'Accept'</span><span class="p">]</span>
</span><span><span class="go">'application/json'</span>
</span><span><span class="gp">>>> </span><span class="n">data</span><span class="p">[</span><span class="s1">'ACCEPT'</span><span class="p">]</span>
</span><span><span class="go">'application/json'</span>
</span></code></pre></div>
<h2 id="disassembling">Disassembling<a class="headerlink" href="#disassembling" title="Permanent link">¶</a></h2>
<p>Now, let’s disassemble a few common operations on dictionaries. I won’t be going into the details of
how to interpret the disassembled instructions in this article. We use the <a href="https://docs.python.org/3/library/dis.html#dis.dis" rel="noopener noreferrer" target="_blank"><code>dis</code></a> function
(from the aptly named <code>dis</code> module) for this.</p>
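<p>Let’s try this on a very simple function. (The listings here match CPython versions before 3.11; newer
versions print additional instructions such as <code>RESUME</code>, so your output may differ.)</p>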
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">dis</span>
</span><span><span class="gp">>>> </span><span class="n">dis</span><span class="o">.</span><span class="n">dis</span><span class="p">(</span><span class="k">lambda</span><span class="p">:</span> <span class="p">{</span><span class="s1">'a'</span><span class="p">:</span> <span class="mi">1</span><span class="p">})</span>
</span><span><span class="go">  1           0 LOAD_CONST               1 ('a')</span>
</span><span><span class="go">              2 LOAD_CONST               2 (1)</span>
</span><span><span class="go">              4 BUILD_MAP                1</span>
</span><span><span class="go">              6 RETURN_VALUE</span>
</span></code></pre></div>
<p>Here, we see the <a href="https://docs.python.org/3/library/dis.html#opcode-BUILD_MAP" rel="noopener noreferrer" target="_blank"><code>BUILD_MAP</code></a> opcode, which takes a count: the number of entries in the
dictionary to build. From the official docs,</p>
<blockquote>
<p>Pushes a new dictionary object onto the stack. Pops <code>2 * count</code> items so that the dictionary holds
<em>count</em> entries: <code>{..., TOS3: TOS2, TOS1: TOS}</code>.</p>
</blockquote>
<p>Now let’s do this with two elements in the dict.</p>
<div class="hl"><pre class=content><code><span><span class="gp">>>> </span><span class="n">dis</span><span class="o">.</span><span class="n">dis</span><span class="p">(</span><span class="k">lambda</span><span class="p">:</span> <span class="p">{</span><span class="s1">'a'</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span> <span class="s1">'b'</span><span class="p">:</span> <span class="mi">2</span><span class="p">})</span>
</span><span><span class="go">  1           0 LOAD_CONST               1 (1)</span>
</span><span><span class="go">              2 LOAD_CONST               2 (2)</span>
</span><span><span class="go">              4 LOAD_CONST               3 (('a', 'b'))</span>
</span><span><span class="go">              6 BUILD_CONST_KEY_MAP      2</span>
</span><span><span class="go">              8 RETURN_VALUE</span>
</span></code></pre></div>
<p>Here, we see a different opcode, <a href="https://docs.python.org/3/library/dis.html#opcode-BUILD_CONST_KEY_MAP" rel="noopener noreferrer" target="_blank"><code>BUILD_CONST_KEY_MAP</code></a>, which also takes the
number of entries as its argument. Again, the docs explain it best:</p>
<blockquote>
<p>The version of <code>BUILD_MAP</code> specialized for constant keys. <em>count</em> values are consumed from the
stack. The top element on the stack contains a tuple of keys.</p>
</blockquote>
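<p>To compare the two cases without reading the whole listing, a small helper can collect just the
opcode names using <code>dis.get_instructions</code>. The helper name <code>opnames</code> is made up
for this sketch, and since opcode names change between Python versions, treat whatever it prints as
specific to the interpreter you run it on.</p>

```python
import dis


def opnames(func):
    """Return the names of the bytecode instructions in func's code object."""
    return [instr.opname for instr in dis.get_instructions(func)]


one_key = opnames(lambda: {'a': 1})
two_keys = opnames(lambda: {'a': 1, 'b': 2})
print(one_key)
print(two_keys)
```

<p>On the CPython versions the listings above came from, <code>one_key</code> should contain
<code>BUILD_MAP</code> and <code>two_keys</code> should contain <code>BUILD_CONST_KEY_MAP</code>.</p>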
<h2 id="conclusion">Conclusion<a class="headerlink" href="#conclusion" title="Permanent link">¶</a></h2>
<p>Dictionaries in Python (or any other language, for that matter) are a very powerful, multi-purpose
data structure, and they are extremely handy and easy to use in Python. In this article, I’ve tried to
put together the things I’ve learned about them. If you see any inaccuracies or if there’s something that
would make a good addition to this article, let me know in the comments below.</p>
<p>Thank you for reading. Please let me know what you think. If there are any topics you’d like me to
cover in a future article, leave a comment.</p>
<h2 id="references">References<a class="headerlink" href="#references" title="Permanent link">¶</a></h2>
<p>The official documentation, mostly. Wikipedia for data used in examples.</p>
<div class="footnote">
<hr>
<ol>
<li id="fn:1">
<p>I read the proof for this a long time ago, but I don’t remember where :). <a class="footnote-backref" href="#fnref:1" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
</ol>
</div>Migrate from Pelican to Hugo2017-08-23T00:00:00+05:302017-08-23T00:00:00+05:30Shrikant Sharat Kandulatag:sharats.me,2017-08-23:/posts/migrate-from-pelican-to-hugo/<p class="note"><strong>Update:</strong> I have now moved to using a self-made Python program that compiles my markdown article
documents into the website you see. I’m keeping this article as a record of my experience at the time.</p>
<p>I recently got around to resurrecting my blog after around five years of inactivity. As part of that,
I chose to migrate it from Pelican, the generator I had been using, to Hugo. So, the first post after the
resurrection is about the migration.</p>
<p>If you’re wondering why the long break, well, I could blame it on life and work, but it was just me
being lazy. Hopefully, that won’t happen again.</p>
<h2 id="why-hugo">Why Hugo<a class="headerlink" href="#why-hugo" title="Permanent link">¶</a></h2>
<p>When I decided to start writing again, I couldn’t remember how I was building the site. That’s
probably entirely my fault for not documenting it for myself, but I ended up being almost new to
Pelican. So, instead of going directly to Pelican’s homepage, I checked out
<a href="https://www.staticgen.com/" rel="noopener noreferrer" target="_blank">StaticGen</a> to see the current landscape of static site generators. The
most popular (measured by GitHub stars) is, obviously, Jekyll. Next came <a href="https://gohugo.io" rel="noopener noreferrer" target="_blank">Hugo</a>, a
name I didn’t recognize. Other than Pelican, all the ones in the top ten are built on Ruby or
JavaScript (node.js). I wasn’t keen on either. Hugo stood out because it is written in Go, a
compiled language, so multiplatform binaries are relatively easy to come by.</p>
<p>I read the documentation over a weekend and I was impressed. Hugo it is. What struck me most about
Hugo is that it does only its primary job: generating HTML files from Markdown files. It doesn’t force
a blog-like website or a documentation-like website; that’s up to you. Hugo is like a bridge between
your Markdown files and the output HTML files. The structure of the output mirrors your source files
and the <code>config.toml</code> file (or <code>config.yaml</code>).</p>
<h2 id="migration">Migration<a class="headerlink" href="#migration" title="Permanent link">¶</a></h2>
<h3 id="a-new-site">A new site<a class="headerlink" href="#a-new-site" title="Permanent link">¶</a></h3>
<p>Issued the command <code>hugo new site sharats.me</code>.</p>
<h3 id="configuration">Configuration<a class="headerlink" href="#configuration" title="Permanent link">¶</a></h3>
<p>Hugo’s default configuration format is <a href="https://github.com/toml-lang/toml" rel="noopener noreferrer" target="_blank">TOML</a>. I read its
README and wasn’t convinced. Thankfully, Hugo also supports configuration in <a href="http://yaml.org/" rel="noopener noreferrer" target="_blank">YAML</a>.</p>
<p>So, this is what I came up with in my <code>config.yaml</code> file.</p>
<div class="hl"><pre class=content><code><span><span class="nt">baseURL</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">http://sharats.me/</span>
</span><span><span class="nt">languageCode</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">en-us</span>
</span><span><span class="nt">title</span><span class="p">:</span><span class="w"> </span><span class="s">"The</span><span class="nv"> </span><span class="s">Sharat's"</span>
</span></code></pre></div>
<p>The current <code>config.yaml</code> is much longer and can be viewed in this site’s GitHub repository.</p>
<h3 id="change-metadata-format">Change metadata format<a class="headerlink" href="#change-metadata-format" title="Permanent link">¶</a></h3>
<p>The article metadata in my Pelican site looks like the following:</p>
<div class="hl"><pre class=content><code><span><span class="nt">Title</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Serializing python-requests' Session objects for fun and profit</span>
</span><span><span class="nt">Date</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">18.2.2012</span>
</span><span><span class="nt">Tags</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">python, python-requests, python-pickle</span>
</span><span><span class="nt">Reddit</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">true</span>
</span></code></pre></div>
<p>There are a lot of things in this that I wouldn’t do if I wrote that article today, but meh.</p>
<p>Hugo calls this metadata <em>front matter</em>, and I needed it to look like the following to make Hugo happy.</p>
<div class="hl"><pre class=content><code><span><span class="nn">---</span>
</span><span><span class="nt">title</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Serializing python-requests' Session objects for fun and profit</span>
</span><span><span class="nt">date</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">2012-02-18</span>
</span><span><span class="nt">tags</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">[</span><span class="s">'python'</span><span class="p p-Indicator">,</span><span class="w"> </span><span class="s">'python-requests'</span><span class="p p-Indicator">,</span><span class="w"> </span><span class="s">'python-pickle'</span><span class="p p-Indicator">]</span>
</span><span><span class="nt">reddit</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">true</span>
</span><span><span class="nn">---</span>
</span></code></pre></div>
<p>The following <code>awk</code> script did the trick. It uses <code>gensub</code>, a GNU awk extension, so it needs <code>gawk</code>:</p>
<div class="hl"><pre class=content><code><span><span class="nb">BEGIN</span> <span class="p">{</span> <span class="nb">FS</span> <span class="o">=</span> <span class="s2">":"</span><span class="p">;</span> <span class="nb">OFS</span> <span class="o">=</span> <span class="s2">":"</span><span class="p">;</span> <span class="kr">print</span> <span class="s2">"---"</span> <span class="p">}</span>
</span><span>
</span><span><span class="o">!</span><span class="nx">c</span> <span class="o">&&</span> <span class="sr">/^$/</span> <span class="p">{</span> <span class="kr">print</span> <span class="s2">"---\n"</span><span class="p">;</span> <span class="nx">c</span> <span class="o">=</span> <span class="mi">1</span> <span class="p">}</span>
</span><span>
</span><span><span class="nx">c</span> <span class="p">{</span> <span class="kr">print</span><span class="p">;</span> <span class="kr">next</span> <span class="p">}</span>
</span><span>
</span><span><span class="o">!</span><span class="nx">c</span> <span class="p">{</span>
</span><span>    <span class="o">$</span><span class="mi">1</span> <span class="o">=</span> <span class="kr">tolower</span><span class="p">(</span><span class="o">$</span><span class="mi">1</span><span class="p">)</span>
</span><span>
</span><span>    <span class="k">if</span> <span class="p">(</span><span class="o">$</span><span class="mi">1</span> <span class="o">==</span> <span class="s2">"date"</span><span class="p">)</span> <span class="p">{</span>
</span><span>        <span class="c1"># 18.2.2012 becomes 2012-2-18</span>
</span><span>        <span class="o">$</span><span class="mi">2</span> <span class="o">=</span> <span class="kr">gensub</span><span class="p">(</span><span class="sr">/ ([^.]+)\.([^.]+)\.([^.]+)/</span><span class="p">,</span> <span class="s2">" \\3-\\2-\\1"</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="o">$</span><span class="mi">2</span><span class="p">)</span>
</span><span>        <span class="c1"># zero-pad a single-digit month...</span>
</span><span>        <span class="o">$</span><span class="mi">2</span> <span class="o">=</span> <span class="kr">gensub</span><span class="p">(</span><span class="sr">/-([0-9])-/</span><span class="p">,</span> <span class="s2">"-0\\1-"</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="o">$</span><span class="mi">2</span><span class="p">)</span>
</span><span>        <span class="c1"># ...and a single-digit day</span>
</span><span>        <span class="o">$</span><span class="mi">2</span> <span class="o">=</span> <span class="kr">gensub</span><span class="p">(</span><span class="sr">/-([0-9])$/</span><span class="p">,</span> <span class="s2">"-0\\1"</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="o">$</span><span class="mi">2</span><span class="p">)</span>
</span><span>    <span class="p">}</span>
</span><span>
</span><span>    <span class="c1"># python, python-requests becomes ['python', 'python-requests']</span>
</span><span>    <span class="k">if</span> <span class="p">(</span><span class="o">$</span><span class="mi">1</span> <span class="o">==</span> <span class="s2">"tags"</span><span class="p">)</span>
</span><span>        <span class="o">$</span><span class="mi">2</span> <span class="o">=</span> <span class="s2">" ["</span> <span class="kr">gensub</span><span class="p">(</span><span class="sr">/[-a-z]+/</span><span class="p">,</span> <span class="s2">"'\\0'"</span><span class="p">,</span> <span class="s2">"g"</span><span class="p">,</span> <span class="kr">substr</span><span class="p">(</span><span class="o">$</span><span class="mi">2</span><span class="p">,</span> <span class="mi">2</span><span class="p">))</span> <span class="s2">"]"</span>
</span><span>
</span><span>    <span class="kr">print</span>
</span><span><span class="p">}</span>
</span></code></pre></div>
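<p>For readers without GNU awk, here is a rough Python sketch of the same header conversion. Like the
awk script, it assumes the Pelican metadata ends at the first blank line, that dates look like
<code>D.M.YYYY</code> and that tags are comma-separated; the function name <code>convert_header</code> is
hypothetical.</p>

```python
def convert_header(source):
    """Turn a Pelican 'Key: value' header into Hugo-style YAML front matter."""
    # The metadata block ends at the first blank line; the rest is the body.
    header, _, body = source.partition('\n\n')
    lines = ['---']
    for line in header.splitlines():
        # Split at the first colon only, so titles may contain colons.
        key, _, value = line.partition(':')
        key, value = key.strip().lower(), value.strip()
        if key == 'date':
            # 18.2.2012 -> 2012-02-18, zero-padding day and month.
            day, month, year = value.split('.')
            value = f'{year}-{int(month):02d}-{int(day):02d}'
        elif key == 'tags':
            # python, python-requests -> ['python', 'python-requests']
            tags = [tag.strip() for tag in value.split(',') if tag.strip()]
            value = '[' + ', '.join(f"'{tag}'" for tag in tags) + ']'
        lines.append(f'{key}: {value}')
    lines.append('---')
    return '\n'.join(lines) + '\n\n' + body
```

<p>Unlike a full YAML emitter, this sketch doesn’t quote values that YAML might misread; it only handles
the fields shown above.</p>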
<h3 id="change-code-blocks">Change code blocks<a class="headerlink" href="#change-code-blocks" title="Permanent link">¶</a></h3>
<p>All my code blocks were of the following format:</p>
<div class="hl"><pre class=content><code><span>    :::python
</span><span>    import this
</span></code></pre></div>
<p>But, I needed them like this:</p>
<div class="hl"><pre class=content><code><span>```python
</span><span>import this
</span><span>```
</span></code></pre></div>
<p>So, the following little Python script did the trick. It prints the converted contents of each file given on the command line:</p>
<div class="hl"><pre class=content><code><span><span class="ch">#!/usr/bin/env python3</span>
</span><span>
</span><span><span class="kn">import</span> <span class="nn">sys</span>
</span><span>
</span><span>
</span><span><span class="k">def</span> <span class="nf">process</span><span class="p">(</span><span class="n">f</span><span class="p">):</span>
</span><span>    <span class="n">cb</span> <span class="o">=</span> <span class="kc">False</span>  <span class="c1"># True while we are inside a code block</span>
</span><span>    <span class="n">empties</span> <span class="o">=</span> <span class="mi">0</span>  <span class="c1"># blank lines seen but not yet written out</span>
</span><span>    <span class="n">output</span> <span class="o">=</span> <span class="p">[]</span>
</span><span>    <span class="k">for</span> <span class="n">line</span> <span class="ow">in</span> <span class="n">f</span><span class="p">:</span>
</span><span>        <span class="n">line</span> <span class="o">=</span> <span class="n">line</span><span class="o">.</span><span class="n">rstrip</span><span class="p">(</span><span class="s1">'</span><span class="se">\n</span><span class="s1">'</span><span class="p">)</span>
</span><span>
</span><span>        <span class="k">if</span> <span class="ow">not</span> <span class="n">line</span><span class="p">:</span>
</span><span>            <span class="n">empties</span> <span class="o">+=</span> <span class="mi">1</span>
</span><span>            <span class="k">continue</span>
</span><span>
</span><span>        <span class="n">prefix</span> <span class="o">=</span> <span class="s1">''</span>
</span><span>        <span class="c1"># Lines indented by four spaces belong to an old-style code block.</span>
</span><span>        <span class="k">if</span> <span class="n">line</span><span class="o">.</span><span class="n">startswith</span><span class="p">(</span><span class="s1">'    '</span><span class="p">):</span>
</span><span>            <span class="n">line</span> <span class="o">=</span> <span class="n">line</span><span class="p">[</span><span class="mi">4</span><span class="p">:]</span>
</span><span>            <span class="k">if</span> <span class="ow">not</span> <span class="n">cb</span><span class="p">:</span>
</span><span>                <span class="c1"># Opening line: ':::python' becomes '```python', otherwise open a bare fence.</span>
</span><span>                <span class="n">cb</span> <span class="o">=</span> <span class="kc">True</span>
</span><span>                <span class="n">line</span> <span class="o">=</span> <span class="n">line</span><span class="o">.</span><span class="n">replace</span><span class="p">(</span><span class="s1">':::'</span><span class="p">,</span> <span class="s1">'```'</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span> <span class="k">if</span> <span class="n">line</span><span class="o">.</span><span class="n">startswith</span><span class="p">(</span><span class="s1">':::'</span><span class="p">)</span> <span class="k">else</span> <span class="p">(</span><span class="s1">'```</span><span class="se">\n</span><span class="s1">'</span> <span class="o">+</span> <span class="n">line</span><span class="p">)</span>
</span><span>
</span><span>        <span class="k">elif</span> <span class="n">cb</span><span class="p">:</span>
</span><span>            <span class="c1"># First unindented line after a code block: close the fence.</span>
</span><span>            <span class="n">cb</span> <span class="o">=</span> <span class="kc">False</span>
</span><span>            <span class="n">prefix</span> <span class="o">=</span> <span class="s1">'```</span><span class="se">\n</span><span class="s1">'</span>
</span><span>
</span><span>        <span class="n">output</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">prefix</span> <span class="o">+</span> <span class="s1">'</span><span class="se">\n</span><span class="s1">'</span> <span class="o">*</span> <span class="n">empties</span> <span class="o">+</span> <span class="n">line</span><span class="p">)</span>
</span><span>        <span class="n">empties</span> <span class="o">=</span> <span class="mi">0</span>
</span><span>
</span><span>    <span class="c1"># Close a code block that runs to the end of the file.</span>
</span><span>    <span class="k">if</span> <span class="n">cb</span><span class="p">:</span>
</span><span>        <span class="n">output</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="s1">'```'</span><span class="p">)</span>
</span><span>
</span><span>    <span class="k">return</span> <span class="s1">'</span><span class="se">\n</span><span class="s1">'</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">output</span><span class="p">)</span>
</span><span>
</span><span>
</span><span><span class="k">for</span> <span class="n">file_name</span> <span class="ow">in</span> <span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">:]:</span>
</span><span>    <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">file_name</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
</span><span>        <span class="n">output</span> <span class="o">=</span> <span class="n">process</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>
</span><span>        <span class="nb">print</span><span class="p">(</span><span class="n">output</span><span class="p">)</span>
</span></code></pre></div>
<p>Yeah, didn’t have the patience to do it with <code>awk</code> this time.</p>
<h2 id="the-theme">The Theme<a class="headerlink" href="#the-theme" title="Permanent link">¶</a></h2>
<p>I tried the themes over at the <a href="http://themes.gohugo.io/" rel="noopener noreferrer" target="_blank">Hugo themes page</a>, but, just as I expected,
none of them were to my liking. I found the <strong>nofancy</strong> theme easy to get started with and to
modify into what I wanted, so that’s what I did. Hugo’s documentation is very good; it is one of the
reasons I’m loving Hugo.</p>
<p>Hope to be writing more articles in the coming weeks.</p>The ever useful and neat subprocess module2012-04-29T00:00:00+05:302012-04-29T00:00:00+05:30Shrikant Sharat Kandulatag:sharats.me,2012-04-29:/posts/the-ever-useful-and-neat-subprocess-module/<p>Python’s <a href="http://docs.python.org/library/subprocess.html" rel="noopener noreferrer" target="_blank">subprocess</a> module is one of my favourite modules in the standard library. If you have
done a decent amount of coding in Python, you have probably encountered it. This module is used
for running external commands and is intended to replace the old <a href="http://docs.python.org/library/os.html#os.system" rel="noopener noreferrer" target="_blank"><code>os.system</code></a>
and the like.</p>
<p>The most trivial use might be to get the output of a small shell command like <code>ls</code> or <code>ps</code>. Not that
this is the best way to get a list of files in a directory (think <a href="http://docs.python.org/library/os.html#os.listdir" rel="noopener noreferrer" target="_blank"><code>os.listdir</code></a>), but you
get the point.</p>
<p>I am going to put my notes and experiences with this module here. Please note that I wrote this with
Python 2.7 in mind. Things <strong>are</strong> slightly different in other versions (even 2.6). If you find any
errors or have suggestions, please let me know.</p>
<div class="toc"><span class="toctitle">Table of Contents</span><ul>
<li><a href="#a-simple-usage">A simple usage</a></li>
<li><a href="#popen-class">Popen class</a></li>
<li><a href="#running-via-the-shell">Running via the shell</a></li>
<li><a href="#getting-the-return-code-aka-exit-status">Getting the return code (aka exit status)</a></li>
<li><a href="#io-streams">IO Streams</a><ul>
<li><a href="#reading-error-stream">Reading error stream</a></li>
<li><a href="#watching-both-stdout-and-stderr">Watching both stdout and stderr</a></li>
</ul>
</li>
<li><a href="#passing-an-environment">Passing an environment</a><ul>
<li><a href="#merge-with-current-environment">Merge with current environment</a></li>
<li><a href="#unicode">Unicode</a></li>
</ul>
</li>
<li><a href="#execute-in-a-different-working-directory">Execute in a different working directory</a></li>
<li><a href="#killing-and-dying">Killing and dying</a><ul>
<li><a href="#auto-kill-on-death">Auto-kill on death</a></li>
</ul>
</li>
<li><a href="#launch-commands-in-a-terminal-emulator">Launch commands in a terminal emulator</a><ul>
<li><a href="#linux">Linux</a></li>
<li><a href="#windows">Windows</a></li>
</ul>
</li>
<li><a href="#conclusion">Conclusion</a></li>
</ul>
</div>
<h2 id="a-simple-usage">A simple usage<a class="headerlink" href="#a-simple-usage" title="Permanent link">¶</a></h2>
<p>For the sake of providing context, let’s run the <code>ls</code> command from subprocess and get its output:</p>
<div class="hl"><pre class=content><code><span><span class="kn">import</span> <span class="nn">subprocess</span>
</span><span><span class="n">ls_output</span> <span class="o">=</span> <span class="n">subprocess</span><span class="o">.</span><span class="n">check_output</span><span class="p">([</span><span class="s1">'ls'</span><span class="p">])</span>
</span></code></pre></div>
<p>I’ll cover getting the output of a command in detail later. To pass more command line arguments:</p>
<div class="hl"><pre class=content><code><span><span class="n">subprocess</span><span class="o">.</span><span class="n">check_output</span><span class="p">([</span><span class="s1">'ls'</span><span class="p">,</span> <span class="s1">'-l'</span><span class="p">])</span>
</span></code></pre></div>
<p>The first item in the list is the executable and the rest are its command line arguments (the <code>argv</code>
equivalent). No quirky shell quoting or complex nested quoting rules to digest, just a plain Python
list.</p>
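<p>To see that the list really is handed over verbatim, here is a small Python 3 sketch (not from the original text) that runs a second Python interpreter, located via <code>sys.executable</code>, which simply echoes the arguments it received:</p>

```python
import subprocess
import sys

# Spaces, quotes and "$" need no escaping: no shell ever sees them.
args = ['a file with spaces', "it's", '$HOME']

# The child is just another Python interpreter that prints its argv.
output = subprocess.check_output(
    [sys.executable, '-c', 'import sys; print(sys.argv[1:])'] + args,
    universal_newlines=True,  # return text instead of bytes
)
print(output)  # ['a file with spaces', "it's", '$HOME']
```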
<p>However, not going through a shell also means you don’t get the shell niceties, piping for
one. The following won’t work the way one might expect it to.</p>
<div class="hl"><pre class=content><code><span><span class="n">subprocess</span><span class="o">.</span><span class="n">check_output</span><span class="p">([</span><span class="s1">'ls'</span><span class="p">,</span> <span class="s1">'|'</span><span class="p">,</span> <span class="s1">'wc'</span><span class="p">,</span> <span class="s1">'-l'</span><span class="p">])</span>
</span></code></pre></div>
<p>Here, <code>ls</code> gets <code>|</code> as its first argument and treats it as the name of a file to list, so it
will most likely complain that no such file exists and exit with a non-zero status, which makes <code>check_output</code>
raise a <code>CalledProcessError</code>. To get a real pipe, we have to use the <code>shell</code> boolean argument instead. More on that later in the article.</p>
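<p>If all you need is a simple pipe, you can also avoid the shell by connecting two <code>Popen</code> objects yourself, which is the pattern the subprocess docs describe for replacing a shell pipeline. A Python 3 sketch, assuming <code>ls</code> and <code>wc</code> are on <code>PATH</code> (Unix-like systems); <code>count_entries</code> is a hypothetical helper name:</p>

```python
import os
import subprocess
import tempfile

def count_entries(directory):
    """Equivalent of `ls DIRECTORY | wc -l`, but without a shell."""
    ls = subprocess.Popen(['ls', directory], stdout=subprocess.PIPE)
    # wc reads straight from the pipe that ls writes to.
    wc = subprocess.Popen(['wc', '-l'], stdin=ls.stdout, stdout=subprocess.PIPE)
    # Close our copy of the read end so ls gets SIGPIPE if wc exits early.
    ls.stdout.close()
    output = wc.communicate()[0]
    ls.wait()
    return int(output.strip())

# Demo: a scratch directory holding three empty files.
demo_dir = tempfile.mkdtemp()
for name in ('a.txt', 'b.txt', 'c.txt'):
    open(os.path.join(demo_dir, name), 'w').close()
print(count_entries(demo_dir))  # 3
```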
<h2 id="popen-class">Popen class<a class="headerlink" href="#popen-class" title="Permanent link">¶</a></h2>
<p>If there’s just one thing in the subprocess module that you should be concerned with, it’s the
<a href="http://docs.python.org/library/subprocess.html#subprocess.Popen" rel="noopener noreferrer" target="_blank"><code>Popen</code></a> class. The other functions like <a href="http://docs.python.org/library/subprocess.html#subprocess.call" rel="noopener noreferrer" target="_blank"><code>call</code></a>, <a href="http://docs.python.org/library/subprocess.html#subprocess.check_output" rel="noopener noreferrer" target="_blank"><code>check_output</code></a>, and
<a href="http://docs.python.org/library/subprocess.html#subprocess.check_call" rel="noopener noreferrer" target="_blank"><code>check_call</code></a> use <code>Popen</code> internally. Here’s the signature from the docs.</p>
<div class="hl"><pre class=content><code><span><span class="k">class</span> <span class="nc">subprocess</span><span class="o">.</span><span class="n">Popen</span><span class="p">(</span><span class="n">args</span><span class="p">,</span> <span class="n">bufsize</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">executable</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">stdin</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span>
</span><span> <span class="n">stdout</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">stderr</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">preexec_fn</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">close_fds</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span> <span class="n">shell</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span>
</span><span> <span class="n">cwd</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">env</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">universal_newlines</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span> <span class="n">startupinfo</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span>
</span><span> <span class="n">creationflags</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
</span></code></pre></div>
<p>I suggest you read the docs for this class. As with most Python docs, they’re really good.</p>
<h2 id="running-via-the-shell">Running via the shell<a class="headerlink" href="#running-via-the-shell" title="Permanent link">¶</a></h2>
<p>Subprocess can also run commands through a shell program. This is usually <code>dash</code>/<code>bash</code>
on Linux and <code>cmd</code> on Windows.</p>
<div class="hl"><pre class=content><code><span><span class="n">subprocess</span><span class="o">.</span><span class="n">call</span><span class="p">(</span><span class="s1">'ls | wc -l'</span><span class="p">,</span> <span class="n">shell</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
</span></code></pre></div>
<p>Notice that in this case we pass a string, not a list, because we want the shell to
interpret the whole command. You can even use shell-style quoting if you like. It is up to
the shell to decide how best to split the command line into the executable and its arguments.</p>
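<p>For example, a Python 3 sketch that lets the shell handle both the quoting and the pipe, assuming a POSIX <code>/bin/sh</code> with <code>printf</code> and <code>sort</code> available:</p>

```python
import subprocess

# The whole string goes to /bin/sh: the shell, not Python, handles
# the double quotes, the \n escapes (via printf) and the pipe.
output = subprocess.check_output(
    r'printf "banana\napple\ncherry\n" | sort',
    shell=True,
    universal_newlines=True,
)
print(output)
```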
<blockquote>
<p>On Windows, if you pass a list for <code>args</code>, it is turned into a string using the same rules as
the MS C runtime. See the docstring of <code>subprocess.list2cmdline</code> for more on this. On
Unix-like systems, on the other hand, even if you pass a string, it’s turned into a list of one item :).</p>
</blockquote>
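<p><code>list2cmdline</code> is a plain function in the module (not part of its documented API), so you can call it on any platform to preview how a list would be joined into a command line on Windows. A Python 3 sketch:</p>

```python
import subprocess

# list2cmdline joins a list using the MS C runtime quoting rules;
# items containing spaces get wrapped in double quotes.
command_line = subprocess.list2cmdline(['echo', 'hello world', 'plain'])
print(command_line)  # echo "hello world" plain
```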
<p>The behaviour of the <code>shell</code> argument can sometimes be confusing, so I’ll try to clear it up a bit here.
It’s something I wish I’d had when I first encountered this module.</p>
<p>First, let’s consider the case where <code>shell</code> is set to <code>False</code>, the default. In this case, on Unix-like systems, if
<code>args</code> is a string, it is assumed to be the name of the executable file, even if it contains spaces.
Consider the following.</p>
<div class="hl"><pre class=content><code><span><span class="n">subprocess</span><span class="o">.</span><span class="n">call</span><span class="p">(</span><span class="s1">'ls -l'</span><span class="p">)</span>
</span></code></pre></div>
<p>This won’t work because subprocess looks for an executable file called <code>ls -l</code> and, obviously,
can’t find it. However, if <code>args</code> is a list, then the first item in the list is treated as the
executable and the rest of the items are passed as command line arguments to the
program.</p>
<div class="hl"><pre class=content><code><span><span class="n">subprocess</span><span class="o">.</span><span class="n">call</span><span class="p">([</span><span class="s1">'ls'</span><span class="p">,</span> <span class="s1">'-l'</span><span class="p">])</span>
</span></code></pre></div>
<p>does what you think it will.</p>
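<p>If you already have a command as a single string and don’t want a shell, the standard <code>shlex</code> module can split it with shell-like rules into the list form. A Python 3 sketch for a Unix-like system (on Windows the string would be handed to the OS as-is, so the first call behaves differently there):</p>

```python
import shlex
import subprocess

command = 'echo "hello world"'

failed = False
try:
    # shell=False: the whole string is taken as the executable's name.
    subprocess.call(command)
except OSError as exc:  # FileNotFoundError on Python 3
    failed = True
    print('failed as expected:', exc)

# shlex.split applies POSIX shell-like splitting and quote removal.
args = shlex.split(command)
print(args)  # ['echo', 'hello world']
subprocess.call(args)  # runs echo with a single argument
```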
<p>In the second case, with <code>shell</code> set to <code>True</code>, the program that actually gets executed is the OS default
shell: <code>/bin/sh</code> on Linux and <code>cmd.exe</code> on Windows. On POSIX systems, this can be changed with the <code>executable</code>
argument.</p>
<p>When using the shell, <code>args</code> is usually a string, something that will be parsed by the shell
program. The <code>args</code> string is passed as a command line argument to the shell (with a <code>-c</code> option on
Linux) such that the shell will interpret it as a shell command sequence and process it accordingly.
This means you can use all the shell builtins and goodies that your shell offers.</p>
<div class="hl"><pre class=content><code><span><span class="n">subprocess</span><span class="o">.</span><span class="n">call</span><span class="p">(</span><span class="s1">'ls -l'</span><span class="p">,</span> <span class="n">shell</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
</span></code></pre></div>
<p>is similar to</p>
<div class="hl"><pre class=content><code><span>$<span class="w"> </span>/bin/sh<span class="w"> </span>-c<span class="w"> </span><span class="s1">'ls -l'</span>
</span></code></pre></div>
<p>In the same vein, if you pass a list as <code>args</code> with <code>shell</code> set to <code>True</code>, all items in the list are
passed as command line arguments to the shell.</p>
<div class="hl"><pre class=content><code><span><span class="n">subprocess</span><span class="o">.</span><span class="n">call</span><span class="p">([</span><span class="s1">'ls'</span><span class="p">,</span> <span class="s1">'-l'</span><span class="p">],</span> <span class="n">shell</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
</span></code></pre></div>
<p>is similar to</p>
<div class="hl"><pre class=content><code><span>$<span class="w"> </span>/bin/sh<span class="w"> </span>-c<span class="w"> </span>ls<span class="w"> </span>-l
</span></code></pre></div>
<p>which is the same as</p>
<div class="hl"><pre class=content><code><span>$<span class="w"> </span>/bin/sh<span class="w"> </span>-c<span class="w"> </span>ls
</span></code></pre></div>
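<p>since <code>/bin/sh</code> takes only the argument right after <code>-c</code> as the command line to execute. Any further arguments become the shell’s positional parameters (so <code>-l</code> ends up in <code>$0</code>), and <code>ls</code> never sees them.</p>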
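<p>To see where the extra list items go, put <code>$0</code> and <code>$1</code> in the first item: with <code>/bin/sh -c</code>, the arguments after the command string are available to it as positional parameters. A Python 3 sketch, assuming a POSIX <code>/bin/sh</code>:</p>

```python
import subprocess

# Roughly runs: /bin/sh -c 'echo "$0 / $1"' zero one
# Only the first item is the script; the others fill $0, $1, ...
output = subprocess.check_output(
    ['echo "$0 / $1"', 'zero', 'one'],
    shell=True,
    universal_newlines=True,
)
print(output)  # zero / one
```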
<h2 id="getting-the-return-code-aka-exit-status">Getting the return code (aka exit status)<a class="headerlink" href="#getting-the-return-code-aka-exit-status" title="Permanent link">¶</a></h2>
<p>If you want to run an external command and its return code is all you’re concerned with, the
<a href="http://docs.python.org/library/subprocess.html#subprocess.call" rel="noopener noreferrer" target="_blank"><code>call</code></a> and <a href="http://docs.python.org/library/subprocess.html#subprocess.check_call" rel="noopener noreferrer" target="_blank"><code>check_call</code></a> functions are what you’re looking for. <code>call</code>
returns the command’s return code. <code>check_call</code> returns <code>0</code> on success, but raises a
<code>CalledProcessError</code> if the return code is non-zero.</p>
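<p>A small Python 3 sketch of the difference, using a throwaway child interpreter that just exits with status 3:</p>

```python
import subprocess
import sys

# A child that does nothing but exit with status 3.
failing = [sys.executable, '-c', 'import sys; sys.exit(3)']

status = subprocess.call(failing)
print(status)  # 3

try:
    subprocess.check_call(failing)
except subprocess.CalledProcessError as exc:
    raised_code = exc.returncode
    print('check_call raised, returncode =', raised_code)  # 3
```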
<p>If you’ve read the docs for these functions, you’ll have seen that using
<code>stdout=PIPE</code> or <code>stderr=PIPE</code> with them is not recommended: nothing reads from those pipes, so the command
can block once a pipe’s buffer fills up. If you leave them unset, the command’s <code>stdout</code> and <code>stderr</code> are simply
inherited from the parent’s (the Python interpreter’s, in this case) streams.</p>
<p>If that is not what you want, you have to use the <code>Popen</code> class.</p>
<div class="hl"><pre class=content><code><span><span class="n">proc</span> <span class="o">=</span> <span class="n">Popen</span><span class="p">(</span><span class="s1">'ls'</span><span class="p">)</span>
</span></code></pre></div>
<p>The moment the <code>Popen</code> class is instantiated, the command starts running. You can wait for it and,
after it’s done, access the return code via the <a href="http://docs.python.org/library/subprocess.html#subprocess.Popen.returncode" rel="noopener noreferrer" target="_blank"><code>returncode</code></a> attribute.</p>
<div class="hl"><pre class=content><code><span><span class="n">proc</span><span class="o">.</span><span class="n">wait</span><span class="p">()</span>
</span><span><span class="nb">print</span> <span class="n">proc</span><span class="o">.</span><span class="n">returncode</span>
</span></code></pre></div>
<p>If you are trying this out in a Python REPL, you might think there’s no need to call <a href="http://docs.python.org/library/subprocess.html#subprocess.Popen.wait" rel="noopener noreferrer" target="_blank"><code>.wait()</code></a>, since
you can just wait yourself until the command is finished and then access
<code>returncode</code>. Surprise!</p>
<div class="hl"><pre class=content><code><span><span class="o">>>></span> <span class="n">proc</span> <span class="o">=</span> <span class="n">Popen</span><span class="p">(</span><span class="s1">'ls'</span><span class="p">)</span>
</span><span><span class="o">>>></span> <span class="n">file1</span> <span class="n">file2</span>
</span><span>
</span><span><span class="o">>>></span> <span class="nb">print</span> <span class="n">proc</span><span class="o">.</span><span class="n">returncode</span>
</span><span><span class="kc">None</span>
</span><span><span class="o">>>></span> <span class="c1"># wat?</span>
</span></code></pre></div>
<p>The command is definitely finished. Why don’t we have a return code?</p>
<div class="hl"><pre class=content><code><span><span class="o">>>></span> <span class="n">proc</span><span class="o">.</span><span class="n">wait</span><span class="p">()</span>
</span><span><span class="mi">0</span>
</span><span><span class="o">>>></span> <span class="nb">print</span> <span class="n">proc</span><span class="o">.</span><span class="n">returncode</span>
</span><span><span class="mi">0</span>
</span></code></pre></div>
<p>The reason is that <code>returncode</code> is not set automatically when a process ends. You have to
call <code>.wait</code> or <a href="http://docs.python.org/library/subprocess.html#subprocess.Popen.poll" rel="noopener noreferrer" target="_blank"><code>.poll</code></a>, which check whether the program is done and, if so, set the <code>returncode</code>
attribute.</p>
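<p>The same behaviour outside the REPL, as a Python 3 sketch: the child exits almost immediately, yet <code>returncode</code> stays <code>None</code> until <code>poll()</code> (or <code>wait()</code>) is called. The half-second sleep is only there to give the child time to finish:</p>

```python
import subprocess
import sys
import time

proc = subprocess.Popen([sys.executable, '-c', 'pass'])
time.sleep(0.5)  # the child has very likely exited by now...
before = proc.returncode
print(before)  # ...but nobody has collected its status yet: None

# poll() checks without blocking and sets returncode once the child is done.
while proc.poll() is None:
    time.sleep(0.05)
print(proc.returncode)  # 0
```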
<h2 id="io-streams">IO Streams<a class="headerlink" href="#io-streams" title="Permanent link">¶</a></h2>
<p>The simplest way to get the output of a command, as seen previously, is to use the
<a href="http://docs.python.org/library/subprocess.html#subprocess.check_output" rel="noopener noreferrer" target="_blank"><code>check_output</code></a> function.</p>
<div class="hl"><pre class=content><code><span><span class="n">output</span> <span class="o">=</span> <span class="n">subprocess</span><span class="o">.</span><span class="n">check_output</span><span class="p">(</span><span class="s1">'ls'</span><span class="p">)</span>
</span></code></pre></div>
<p>Notice the <code>check_</code> prefix in the function name? Ring any bell? That’s right, this function will
raise a <code>CalledProcessError</code> if the return code is non-zero.</p>
<p>This may not always be the best way to get the output of a command. If you do get a
<code>CalledProcessError</code> from this function call, you probably have little idea what went wrong
unless you also have whatever the command wrote to its <code>stderr</code>.</p>
<h3 id="reading-error-stream">Reading error stream<a class="headerlink" href="#reading-error-stream" title="Permanent link">¶</a></h3>
<p>There are two ways to get the error output. First is redirecting <code>stderr</code> to <code>stdout</code> and only being
concerned with <code>stdout</code>. This can be done by setting the <code>stderr</code> argument to
<a href="http://docs.python.org/library/subprocess.html#subprocess.STDOUT" rel="noopener noreferrer" target="_blank"><code>subprocess.STDOUT</code></a>.</p>
<p>Second is to create a <code>Popen</code> object with <code>stderr</code> set to <a href="http://docs.python.org/library/subprocess.html#subprocess.PIPE" rel="noopener noreferrer" target="_blank"><code>subprocess.PIPE</code></a> (optionally
along with the <code>stdout</code> argument) and read from its <code>stderr</code> attribute, which is a readable file-like
object. There is also a convenience method on the <code>Popen</code> class, <code>.communicate</code>, which optionally
takes a string to send to the process’s <code>stdin</code>, waits for the process to finish and returns a tuple of <code>(stdout_content, stderr_content)</code>.</p>
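<p>A Python 3 sketch of the second approach with <code>communicate</code>; the child is a tiny Python program, made up for illustration, that upper-cases its input and writes a note to <code>stderr</code>:</p>

```python
import subprocess
import sys

# Made-up child: upper-cases stdin to stdout, then writes a note to stderr.
child = [sys.executable, '-c',
         'import sys; data = sys.stdin.read(); '
         'sys.stdout.write(data.upper()); sys.stderr.write("done\\n")']

proc = subprocess.Popen(child, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                        stderr=subprocess.PIPE, universal_newlines=True)
# communicate() sends the input, closes stdin, reads both pipes to the end
# and waits for the process, so neither pipe can fill up and block the child.
out, err = proc.communicate('hello\n')
print(repr(out), repr(err), proc.returncode)  # 'HELLO\n' 'done\n' 0
```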
<h3 id="watching-both-stdout-and-stderr">Watching both <code>stdout</code> and <code>stderr</code><a class="headerlink" href="#watching-both-stdout-and-stderr" title="Permanent link">¶</a></h3>
<p>However, all of these assume that the command runs for a while, prints a few lines of
output and exits, so you can collect the output(s) as strings. That is not always the case. If you
want to run a long, network-intensive command like an svn checkout, which prints each file as it is
downloaded, you need something better.</p>
<p>The initial solution one can think of is this.</p>
<div class="hl"><pre class=content><code><span><span class="n">proc</span> <span class="o">=</span> <span class="n">Popen</span><span class="p">(</span><span class="s1">'svn co svn+ssh://myrepo'</span><span class="p">,</span> <span class="n">stdout</span><span class="o">=</span><span class="n">PIPE</span><span class="p">)</span>
</span><span><span class="k">for</span> <span class="n">line</span> <span class="ow">in</span> <span class="n">proc</span><span class="o">.</span><span class="n">stdout</span><span class="p">:</span>
</span><span> <span class="nb">print</span> <span class="n">line</span>
</span></code></pre></div>
<p>This works, for the most part. But, again, if there is an error, you’ll want to read <code>stderr</code> too.
It would be nice to read <code>stdout</code> and <code>stderr</code> simultaneously, just like a shell seems to do.
Alas, this remains a not-so-straightforward problem as of today, at least on Windows.</p>
<p>On Linux (and wherever it’s supported), you can use the <a href="http://docs.python.org/library/select.html" rel="noopener noreferrer" target="_blank"><code>select</code></a> module to keep an eye on
multiple file-like stream objects. But on Windows, <code>select</code> only works with sockets, not pipes. A more platform-independent
solution that I found works well is using threads and a <a href="http://docs.python.org/library/queue.html#queue-objects" rel="noopener noreferrer" target="_blank"><code>Queue</code></a>.</p>
<div class="hl"><input type=checkbox id=co-4><label for=co-4><span class='btn show-full-code-btn'>Show remaining 15 lines</span></label><pre class=content><code><span><span class="kn">from</span> <span class="nn">subprocess</span> <span class="kn">import</span> <span class="n">Popen</span><span class="p">,</span> <span class="n">PIPE</span>
</span><span><span class="kn">from</span> <span class="nn">threading</span> <span class="kn">import</span> <span class="n">Thread</span>
</span><span><span class="kn">from</span> <span class="nn">Queue</span> <span class="kn">import</span> <span class="n">Queue</span><span class="p">,</span> <span class="n">Empty</span>
</span><span>
</span><span><span class="n">io_q</span> <span class="o">=</span> <span class="n">Queue</span><span class="p">()</span>
</span><span>
</span><span><span class="k">def</span> <span class="nf">stream_watcher</span><span class="p">(</span><span class="n">identifier</span><span class="p">,</span> <span class="n">stream</span><span class="p">):</span>
</span><span>
</span><span> <span class="k">for</span> <span class="n">line</span> <span class="ow">in</span> <span class="n">stream</span><span class="p">:</span>
</span><span> <span class="n">io_q</span><span class="o">.</span><span class="n">put</span><span class="p">((</span><span class="n">identifier</span><span class="p">,</span> <span class="n">line</span><span class="p">))</span>
</span><span>
</span><span> <span class="k">if</span> <span class="ow">not</span> <span class="n">stream</span><span class="o">.</span><span class="n">closed</span><span class="p">:</span>
</span><span> <span class="n">stream</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>
</span><span>
</span><span><span class="n">proc</span> <span class="o">=</span> <span class="n">Popen</span><span class="p">(</span><span class="s1">'svn co svn+ssh://myrepo'</span><span class="p">,</span> <span class="n">stdout</span><span class="o">=</span><span class="n">PIPE</span><span class="p">,</span> <span class="n">stderr</span><span class="o">=</span><span class="n">PIPE</span><span class="p">)</span>
</span><span>
</span><span><span class="n">Thread</span><span class="p">(</span><span class="n">target</span><span class="o">=</span><span class="n">stream_watcher</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s1">'stdout-watcher'</span><span class="p">,</span>
</span><span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="s1">'STDOUT'</span><span class="p">,</span> <span class="n">proc</span><span class="o">.</span><span class="n">stdout</span><span class="p">))</span><span class="o">.</span><span class="n">start</span><span class="p">()</span>
</span><span><span class="n">Thread</span><span class="p">(</span><span class="n">target</span><span class="o">=</span><span class="n">stream_watcher</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s1">'stderr-watcher'</span><span class="p">,</span>
</span><span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="s1">'STDERR'</span><span class="p">,</span> <span class="n">proc</span><span class="o">.</span><span class="n">stderr</span><span class="p">))</span><span class="o">.</span><span class="n">start</span><span class="p">()</span>
</span><span class=collapse>
</span><span class=collapse><span class="k">def</span> <span class="nf">printer</span><span class="p">():</span>
</span><span class=collapse> <span class="k">while</span> <span class="kc">True</span><span class="p">:</span>
</span><span class=collapse> <span class="k">try</span><span class="p">:</span>
</span><span class=collapse> <span class="c1"># Block for 1 second.</span>
</span><span class=collapse> <span class="n">item</span> <span class="o">=</span> <span class="n">io_q</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="kc">True</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
</span><span class=collapse> <span class="k">except</span> <span class="n">Empty</span><span class="p">:</span>
</span><span class=collapse> <span class="c1"># No output in either streams for a second. Are we done?</span>
</span><span class=collapse> <span class="k">if</span> <span class="n">proc</span><span class="o">.</span><span class="n">poll</span><span class="p">()</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">:</span>
</span><span class=collapse> <span class="k">break</span>
</span><span class=collapse> <span class="k">else</span><span class="p">:</span>
</span><span class=collapse> <span class="n">identifier</span><span class="p">,</span> <span class="n">line</span> <span class="o">=</span> <span class="n">item</span>
</span><span class=collapse> <span class="nb">print</span> <span class="n">identifier</span> <span class="o">+</span> <span class="s1">':'</span><span class="p">,</span> <span class="n">line</span>
</span><span class=collapse>
</span><span class=collapse><span class="n">Thread</span><span class="p">(</span><span class="n">target</span><span class="o">=</span><span class="n">printer</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s1">'printer'</span><span class="p">)</span><span class="o">.</span><span class="n">start</span><span class="p">()</span>
</span></code></pre></div>
<p>Fair bit of code. This is a typical producer-consumer setup: two threads produce lines of output
(one each from <code>stdout</code> and <code>stderr</code>) and push them onto a queue, while a third thread watches the queue
and prints the lines until the process itself finishes.</p>
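<p>For comparison, the <code>select</code> approach mentioned above could look roughly like this Python 3 sketch. It only works on Unix-like systems, since <code>select</code> on Windows accepts sockets only, and it yields raw chunks as they arrive rather than whole lines. The generator name <code>watch</code> is hypothetical:</p>

```python
import os
import select
import subprocess
import sys

def watch(args):
    """Yield ('STDOUT' or 'STDERR', bytes_chunk) pairs as the command writes them."""
    proc = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    names = {proc.stdout.fileno(): 'STDOUT', proc.stderr.fileno(): 'STDERR'}
    open_fds = list(names)
    while open_fds:
        # Block until at least one pipe has data or has reached EOF.
        readable, _, _ = select.select(open_fds, [], [])
        for fd in readable:
            chunk = os.read(fd, 4096)  # unbuffered read straight from the pipe
            if chunk:
                yield names[fd], chunk
            else:  # an empty read means this pipe is closed (EOF)
                open_fds.remove(fd)
    proc.stdout.close()
    proc.stderr.close()
    proc.wait()

chunks = list(watch([sys.executable, '-c',
                     'import sys; sys.stdout.write("out"); sys.stderr.write("err")']))
for name, chunk in chunks:
    print(name + ':', chunk.decode())
```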
<h2 id="passing-an-environment">Passing an environment<a class="headerlink" href="#passing-an-environment" title="Permanent link">¶</a></h2>
<p>The <code>env</code> argument to <code>Popen</code> (and others) lets you customize the environment of the command being
run. If it is not set, or is set to <code>None</code>, the current process’s environment is used, just as
documented.</p>
<p>You might not agree with me, but I feel there are some subtleties with this argument that should
have been mentioned in the documentation.</p>
<h3 id="merge-with-current-environment">Merge with current environment<a class="headerlink" href="#merge-with-current-environment" title="Permanent link">¶</a></h3>
<p>One is that if you provide a mapping to <code>env</code>, whatever is in this mapping is all that’s available
to the command being run. For example, if you don’t include a <code>TOP_ARG</code> in the <code>env</code> mapping, the
command won’t see a <code>TOP_ARG</code> in its environment, even if your own process has one. So, I frequently find myself doing this:</p>
<div class="hl"><pre class=content><code><span><span class="n">p</span> <span class="o">=</span> <span class="n">Popen</span><span class="p">(</span><span class="s1">'command'</span><span class="p">,</span> <span class="n">env</span><span class="o">=</span><span class="nb">dict</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">environ</span><span class="p">,</span> <span class="n">my_env_prop</span><span class="o">=</span><span class="s1">'value'</span><span class="p">))</span>
</span></code></pre></div>
<p>This makes sense once you realize it, but I wish it were at least <em>hinted at</em> in the documentation.</p>
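<p>A Python 3 sketch that makes the difference visible, assuming a Unix-like system (on Windows, a child started with a near-empty environment may misbehave). The helper <code>child_env_value</code> is hypothetical; it starts a child interpreter and reports the value of a variable in the child’s environment:</p>

```python
import os
import subprocess
import sys

def child_env_value(name, env):
    """Return NAME's value as seen by a child started with ENV, or None."""
    code = ('import os, sys; '
            'sys.stdout.write(os.environ.get(sys.argv[1], "<unset>"))')
    out = subprocess.check_output([sys.executable, '-c', code, name],
                                  env=env, universal_newlines=True)
    return None if out == '<unset>' else out

# Replacing: the child sees only what is in the mapping, so no PATH.
print(child_env_value('PATH', {'MY_ENV_PROP': 'value'}))  # None

# Merging: everything from os.environ plus the extra variable.
merged = dict(os.environ, MY_ENV_PROP='value')
print(child_env_value('MY_ENV_PROP', merged))  # value
print(child_env_value('PATH', merged) == os.environ.get('PATH'))  # True
```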
<h3 id="unicode">Unicode<a class="headerlink" href="#unicode" title="Permanent link">¶</a></h3>
<p>Another one has to do with Unicode (surprise, surprise!) and Windows. If you use <code>unicode</code>s in the
<code>env</code> mapping, you get an error saying you can <em>only</em> use strings in the environment mapping. The
worst part about this error is that it only seems to happen on Windows and not on Linux. If it’s an
error to use <code>unicode</code>s here, I wish it would break on both platforms.</p>
<p>This issue is very painful if you’re like me and use <code>unicode</code> <em>all the time</em>.</p>
<div class="hl"><pre class=content><code><span><span class="kn">from</span> <span class="nn">__future__</span> <span class="kn">import</span> <span class="n">unicode_literals</span>
</span></code></pre></div>
<p>That line is present in all my Python source files. The error message doesn’t even bother to mention
that you have <code>unicode</code>s in your <code>env</code>, so it’s very hard to understand what’s going wrong.</p>
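<p>One workaround, sketched here under the assumption that the error comes from <code>unicode</code> keys or values, is to convert the mapping to native <code>str</code> objects before handing it to <code>Popen</code>. The helper <code>native_env</code> is hypothetical and is written to run on both Python 2 and 3 (on 3, <code>str</code> is already Unicode, so it just copies the mapping):</p>

```python
import sys

def native_env(env, encoding=None):
    """Copy ENV, converting unicode keys and values to native str objects.

    Hypothetical helper. On Python 2, unicode objects are encoded (by default
    with the filesystem encoding); on Python 3, str is already Unicode, so
    the mapping is only copied.
    """
    if sys.version_info[0] >= 3:
        return dict(env)
    encoding = encoding or sys.getfilesystemencoding() or 'utf-8'

    def to_native(value):
        if isinstance(value, unicode):  # noqa: F821 (name exists on Python 2 only)
            return value.encode(encoding)
        return value

    return dict((to_native(k), to_native(v)) for k, v in env.items())

print(native_env({'MY_ENV_PROP': 'value'}))
```

<p>Usage would then look like <code>Popen('command', env=native_env(dict(os.environ, my_env_prop='value')))</code>.</p>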
<h2 id="execute-in-a-different-working-directory">Execute in a different working directory<a class="headerlink" href="#execute-in-a-different-working-directory" title="Permanent link">¶</a></h2>
<p>This is handled by the <code>cwd</code> argument. You set the location of the directory which you want as the
working directory of the program you are launching.</p>
<p>The docs do mention that the working directory is changed <em>before</em> the command starts running,
but also say that you <em>can’t</em> specify the program’s path relative to <code>cwd</code>. In reality, I found that you
<em>can</em> do this (newer Python 3 docs do say that, on POSIX, a relative executable path is looked up relative to <code>cwd</code>).</p>
<p>Either I’m missing something or the docs really are inaccurate. Anyway, this works:</p>
<div class="hl"><pre class=content><code><span><span class="n">subprocess</span><span class="o">.</span><span class="n">call</span><span class="p">(</span><span class="s1">'./ls'</span><span class="p">,</span> <span class="n">cwd</span><span class="o">=</span><span class="s1">'/bin'</span><span class="p">)</span>
</span></code></pre></div>
<p>Prints out all the files in <code>/bin</code>. Of course, the following doesn’t work when the working directory
is not <code>/bin</code>.</p>
<div class="hl"><pre class=content><code><span><span class="n">subprocess</span><span class="o">.</span><span class="n">call</span><span class="p">(</span><span class="s1">'./ls'</span><span class="p">)</span>
</span></code></pre></div>
<p>So, if you are giving something explicitly to <code>cwd</code> and are using a relative path for the
executable, this is something to keep in mind.</p>
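<p>A Python 3 sketch of the same effect with a throwaway script, assuming a Unix-like system and a temporary directory that allows executing files; <code>hello.sh</code> is a made-up name:</p>

```python
import os
import stat
import subprocess
import tempfile

workdir = tempfile.mkdtemp()
script = os.path.join(workdir, 'hello.sh')  # made-up helper script
with open(script, 'w') as f:
    f.write('#!/bin/sh\necho "hello from $(pwd)"\n')
os.chmod(script, os.stat(script).st_mode | stat.S_IXUSR)  # make it executable

# './hello.sh' is resolved after changing into cwd, so it is found in workdir.
out = subprocess.check_output(['./hello.sh'], cwd=workdir, universal_newlines=True)
print(out)
```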
<h2 id="killing-and-dying">Killing and dying<a class="headerlink" href="#killing-and-dying" title="Permanent link">¶</a></h2>
<p>A simple</p>
<div class="hl"><pre class=content><code><span><span class="n">proc</span><span class="o">.</span><span class="n">terminate</span><span class="p">()</span>
</span></code></pre></div>
<p>Or, for some dramatic oomph!</p>
<div class="hl"><pre class=content><code><span><span class="n">proc</span><span class="o">.</span><span class="n">kill</span><span class="p">()</span>
</span></code></pre></div>
<p>Will do the trick to end the process. As noted in the documentation, the former sends a <code>SIGTERM</code>
and the latter sends a <code>SIGKILL</code> on Unix, while on Windows both call the Win32 <code>TerminateProcess()</code> function.</p>
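<p>A common pattern is to ask politely first and only kill if the process ignores the request. A Python 3 sketch (<code>wait(timeout=...)</code> and <code>TimeoutExpired</code> need Python 3.3 or later; the helper name <code>stop</code> is hypothetical):</p>

```python
import signal
import subprocess
import sys

def stop(proc, grace=5.0):
    """Terminate PROC, escalating to kill() after GRACE seconds; return its status."""
    if proc.poll() is not None:  # already finished, nothing to do
        return proc.returncode
    proc.terminate()
    try:
        return proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        proc.kill()  # on Unix, SIGKILL cannot be caught or ignored
        return proc.wait()

sleeper = subprocess.Popen([sys.executable, '-c', 'import time; time.sleep(60)'])
result = stop(sleeper)
# On Unix, a child ended by a signal reports the negated signal number.
print(result == -signal.SIGTERM)  # True on Unix
```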
<h3 id="auto-kill-on-death">Auto-kill on death<a class="headerlink" href="#auto-kill-on-death" title="Permanent link">¶</a></h3>
<p>The processes you start in your Python program stay running even after your program exits. This is
<em>usually</em> what you want, but when you want all your subprocesses killed automatically on exit with
<code>Ctrl+C</code> or the like, you can use the <a href="http://docs.python.org/library/atexit.html" rel="noopener noreferrer" target="_blank"><code>atexit</code></a> module.</p>
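<div class="hl"><pre class=content><code><span><span class="kn">import</span> <span class="nn">atexit</span>
</span><span>
</span><span><span class="n">procs</span> <span class="o">=</span> <span class="p">[]</span>  <span class="c1"># every Popen object you create goes in here</span>
</span><span>
</span><span><span class="nd">@atexit</span><span class="o">.</span><span class="n">register</span>
</span><span><span class="k">def</span> <span class="nf">kill_subprocesses</span><span class="p">():</span>
</span><span>    <span class="k">for</span> <span class="n">proc</span> <span class="ow">in</span> <span class="n">procs</span><span class="p">:</span>
</span><span>        <span class="k">if</span> <span class="n">proc</span><span class="o">.</span><span class="n">poll</span><span class="p">()</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>  <span class="c1"># skip children that already exited</span>
</span><span>            <span class="n">proc</span><span class="o">.</span><span class="n">kill</span><span class="p">()</span>
</span></code></pre></div>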
<p>Then add every <code>Popen</code> object you create to the <code>procs</code> list. This is the approach I found
works best.</p>
<h2 id="launch-commands-in-a-terminal-emulator">Launch commands in a terminal emulator<a class="headerlink" href="#launch-commands-in-a-terminal-emulator" title="Permanent link">¶</a></h2>
<p>On one occasion, I had to write a script that would launch multiple svn checkouts and then run many
ant builds (~20-35) on the checked-out projects. In my opinion, the best and easiest way to do this
is to fire up multiple terminal emulator windows each running an individual checkout/ant-build. This
allows us to monitor each process and even cancel any of them by simply closing the corresponding
terminal emulator window.</p>
<h3 id="linux">Linux<a class="headerlink" href="#linux" title="Permanent link">¶</a></h3>
<p>This is pretty trivial actually. On Linux, you can use <code>xterm</code> for this.</p>
<div class="hl"><pre class=content><code><span><span class="n">Popen</span><span class="p">([</span><span class="s1">'xterm'</span><span class="p">,</span> <span class="s1">'-e'</span><span class="p">,</span> <span class="s1">'sleep 3s'</span><span class="p">])</span>
</span></code></pre></div>
<h3 id="windows">Windows<a class="headerlink" href="#windows" title="Permanent link">¶</a></h3>
<p>On Windows, it’s not as straightforward. The first solution that comes to mind is</p>
<div class="hl"><pre class=content><code><span><span class="n">Popen</span><span class="p">([</span><span class="s1">'cmd'</span><span class="p">,</span> <span class="s1">'/K'</span><span class="p">,</span> <span class="s1">'command'</span><span class="p">])</span>
</span></code></pre></div>
<blockquote>
<p>The <code>/K</code> option tells <code>cmd</code> to run the command and keep the command window from closing. You may use
<code>/C</code> instead to close the command window after the command finishes.</p>
</blockquote>
<p>As simple as it looks, it has some weird behavior. I don’t completely understand it, but I’ll try to
explain what I’ve observed. When you run a Python script containing the above <code>Popen</code> call from a command
window, like this</p>
<div class="hl"><pre class=content><code><span>python<span class="w"> </span>main.py
</span></code></pre></div>
<p>you <em>don’t</em> see a new command window pop up. Instead, the subcommand runs in the same command
window. I have no idea what happens when you run multiple subcommands this way. (I have only
limited access to Windows.)</p>
<p>If you instead run it from something like an IDE or IDLE (<kbd>F5</kbd>), a new command window does open
up, and I believe you get one for each command you run this way. That is what you would expect.</p>
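<p>If I understand the Windows console rules correctly, the difference comes from console inheritance: a console program started from a process that already has a console shares it, whereas IDLE runs without one, so Windows creates a new window for the child. Under that assumption, asking for a new console explicitly with the Windows-only <code>subprocess.CREATE_NEW_CONSOLE</code> flag should behave the same from both places. A sketch, untested across Windows versions; the helper names are hypothetical:</p>

```python
import subprocess
import sys


def new_console_kwargs():
    """Popen keyword arguments asking Windows for a separate console window."""
    if sys.platform == 'win32':
        # Windows-only constant; without it a console child inherits our console
        return {'creationflags': subprocess.CREATE_NEW_CONSOLE}
    return {}  # creationflags is not supported on POSIX


def open_cmd_window(command):
    # /K keeps the window open after `command` finishes
    return subprocess.Popen(['cmd', '/K', command], **new_console_kwargs())
```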
<p>But I gave up on <code>cmd.exe</code> for this purpose and learnt to use the <a href="https://code.google.com/p/mintty/" rel="noopener noreferrer" target="_blank"><code>mintty</code></a> utility that
comes with <a href="http://www.cygwin.com/" rel="noopener noreferrer" target="_blank">Cygwin</a> (I think 1.7+). <code>mintty</code> is awesome. Really. It’s been a while since I felt
that way about a command-line utility on Windows.</p>
<div class="hl"><pre class=content><code><span><span class="n">Popen</span><span class="p">([</span><span class="s1">'mintty'</span><span class="p">,</span> <span class="s1">'--hold'</span><span class="p">,</span> <span class="s1">'error'</span><span class="p">,</span> <span class="s1">'--exec'</span><span class="p">,</span> <span class="s1">'command'</span><span class="p">])</span>
</span></code></pre></div>
<p>This. A new <code>mintty</code> console window opens up running the command, and it closes automatically <em>if</em>
the command exits with a zero status (that’s what <code>--hold error</code> does). Otherwise, it stays open. Very
useful.</p>
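<p>For the original use case of many checkouts and builds, each in its own window, the two approaches can be wrapped in one helper. This is a sketch that assumes <code>xterm</code> is on <code>PATH</code> on Linux and that Cygwin’s <code>mintty</code> and <code>bash</code> are on <code>PATH</code> on Windows; the function names are hypothetical:</p>

```python
import subprocess
import sys


def terminal_argv(command):
    """Build an argv that runs a shell command in its own terminal window."""
    if sys.platform in ('win32', 'cygwin'):
        # mintty from Cygwin; --hold error keeps the window open only on failure
        return ['mintty', '--hold', 'error', '--exec', 'bash', '-c', command]
    # xterm elsewhere; run through sh so the command string is parsed by a shell
    return ['xterm', '-e', 'sh', '-c', command]


def launch_all(commands):
    """Open one terminal window per command and return the Popen objects."""
    return [subprocess.Popen(terminal_argv(cmd)) for cmd in commands]
```

<p>For example, <code>launch_all(['ant -f proj1/build.xml', 'ant -f proj2/build.xml'])</code> would open two windows (the project paths are placeholders). Running each command through a shell keeps things like <code>&amp;&amp;</code> working inside a single command string.</p>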
<h2 id="conclusion">Conclusion<a class="headerlink" href="#conclusion" title="Permanent link">¶</a></h2>
<p>The subprocess module is a very useful thing. Spend some time understanding it better. This is my
attempt at helping people with it, and it turned out to be way longer than I’d expected. If there are
any inaccuracies in this, or if you have anything to add, please leave a comment.</p>Serializing python-requests' Session objects for fun and profit2012-02-18T00:00:00+05:302012-02-18T00:00:00+05:30Shrikant Sharat Kandulatag:sharats.me,2012-02-18:/posts/serializing-python-requests-session-objects-for-fun-and-profit/<h2 id="prepare">Prepare<a class="headerlink" href="#prepare" title="Permanent link">¶</a></h2>
<p>If you haven’t checked out @kennethreitz’s excellent <a href="http://docs.python-requests.org/en/latest/index.html" rel="noopener noreferrer" target="_blank">python-requests</a>
library yet, I suggest you go do that immediately. Go on, I’ll wait for you.</p>
<p>Had your candy? That is one of the most beautiful pieces of Python code I’ve read.
And it’s an excellent library with …</p><h2 id="prepare">Prepare<a class="headerlink" href="#prepare" title="Permanent link">¶</a></h2>
<p>If you haven’t checked out @kennethreitz’s excellent <a href="http://docs.python-requests.org/en/latest/index.html" rel="noopener noreferrer" target="_blank">python-requests</a>
library yet, I suggest you go do that immediately. Go on, I’ll wait for you.</p>
<p>Had your candy? That is one of the most beautiful pieces of Python code I’ve read.
And it’s an excellent library with a very humane API.</p>
<p>Recently, I have been using this library for a few of my company’s internal
projects, and at one point I needed to serialize and save <code>Session</code> objects for
later. That wasn’t as straightforward as I first thought it’d be, so I am
sharing my experience here.</p>
<p>First off, let’s make a simple HTTP server which we are going to contact with
python-requests. The server should be able to handle cookie-based sessions and
also have basic auth, as these things are handled by python-requests’ Session
objects on the client side. I won’t discuss the code for the server here; you
can get it from <a href="https://gist.github.com/2660997#file_server.py" rel="noopener noreferrer" target="_blank">the gist</a>.</p>
<p>Once you have the server running, it’s time for the client. Let’s do requests!</p>
<div class="hl"><pre class=content><code><span><span class="kn">import</span> <span class="nn">requests</span> <span class="k">as</span> <span class="nn">req</span>
</span><span>
</span><span><span class="n">URL_ROOT</span> <span class="o">=</span> <span class="s1">'http://localhost:5050'</span>
</span><span>
</span><span><span class="k">def</span> <span class="nf">get_logged_in_session</span><span class="p">(</span><span class="n">name</span><span class="p">):</span>
</span><span>    <span class="n">session</span> <span class="o">=</span> <span class="n">req</span><span class="o">.</span><span class="n">session</span><span class="p">(</span><span class="n">auth</span><span class="o">=</span><span class="p">(</span><span class="s1">'user'</span><span class="p">,</span> <span class="s1">'pass'</span><span class="p">))</span>
</span><span>
</span><span>    <span class="n">login_response</span> <span class="o">=</span> <span class="n">session</span><span class="o">.</span><span class="n">post</span><span class="p">(</span><span class="n">URL_ROOT</span> <span class="o">+</span> <span class="s1">'/login'</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="p">{</span><span class="s1">'name'</span><span class="p">:</span> <span class="n">name</span><span class="p">})</span>
</span><span>    <span class="n">login_response</span><span class="o">.</span><span class="n">raise_for_status</span><span class="p">()</span>
</span><span>
</span><span>    <span class="k">return</span> <span class="n">session</span>
</span><span>
</span><span><span class="k">def</span> <span class="nf">get_whoami</span><span class="p">(</span><span class="n">session</span><span class="p">):</span>
</span><span>    <span class="n">response</span> <span class="o">=</span> <span class="n">session</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">URL_ROOT</span> <span class="o">+</span> <span class="s1">'/whoami'</span><span class="p">)</span>
</span><span>    <span class="n">response</span><span class="o">.</span><span class="n">raise_for_status</span><span class="p">()</span>
</span><span>    <span class="k">return</span> <span class="n">response</span><span class="o">.</span><span class="n">text</span>
</span></code></pre></div>
<p>I defined two functions here. <code>get_logged_in_session</code> creates a new
session, logs in to the HTTP server and returns that session. Any subsequent
requests using this session will be made as if you have logged in. That’s what
the <code>get_whoami</code> function tests: it just returns the
response from <code>/whoami</code>.</p>
<p>Let’s test this out. Make sure <code>server.py</code> is running, and in another
terminal:</p>
<div class="hl"><pre class=content><code><span><span class="err">$</span> <span class="n">python</span> <span class="o">-</span><span class="n">i</span> <span class="n">client</span><span class="o">.</span><span class="n">py</span>
</span><span><span class="o">>>></span> <span class="n">s</span> <span class="o">=</span> <span class="n">get_logged_in_session</span><span class="p">(</span><span class="s1">'sharat'</span><span class="p">)</span>
</span><span><span class="o">>>></span> <span class="n">get_whoami</span><span class="p">(</span><span class="n">s</span><span class="p">)</span>
</span><span><span class="sa">u</span><span class="s1">'You are sharat'</span>
</span><span><span class="o">>>></span> <span class="n">get_whoami</span><span class="p">(</span><span class="n">req</span><span class="o">.</span><span class="n">session</span><span class="p">(</span><span class="n">auth</span><span class="o">=</span><span class="p">(</span><span class="s1">'user'</span><span class="p">,</span> <span class="s1">'pass'</span><span class="p">)))</span>
</span><span><span class="sa">u</span><span class="s1">'You are a guest'</span>
</span></code></pre></div>
<p>Works perfectly. If we pass it the logged-in session, it gives us the username,
and if we pass it a new session, it tells us we are <code>a guest</code>.</p>
<p>Now, let’s assume we have two functions, <code>serialize_session</code> and
<code>deserialize_session</code>, which do exactly what their names say. We can test them
out by running a small <code>test.py</code>:</p>
<div class="hl"><pre class=content><code><span><span class="kn">from</span> <span class="nn">client</span> <span class="kn">import</span> <span class="n">get_logged_in_session</span><span class="p">,</span> <span class="n">get_whoami</span>
</span><span><span class="kn">from</span> <span class="nn">serializer</span> <span class="kn">import</span> <span class="n">deserialize_session</span><span class="p">,</span> <span class="n">serialize_session</span>
</span><span>
</span><span><span class="n">session</span> <span class="o">=</span> <span class="n">get_logged_in_session</span><span class="p">(</span><span class="s1">'sharat'</span><span class="p">)</span>
</span><span><span class="n">dsession</span> <span class="o">=</span> <span class="n">deserialize_session</span><span class="p">(</span><span class="n">serialize_session</span><span class="p">(</span><span class="n">session</span><span class="p">))</span>
</span><span>
</span><span><span class="k">assert</span> <span class="n">get_whoami</span><span class="p">(</span><span class="n">session</span><span class="p">)</span> <span class="o">==</span> <span class="n">get_whoami</span><span class="p">(</span><span class="n">dsession</span><span class="p">)</span>
</span><span><span class="nb">print</span> <span class="s1">'Success'</span>
</span></code></pre></div>
<p>and a dummy <code>serializer.py</code>:</p>
<div class="hl"><pre class=content><code><span><span class="k">def</span> <span class="nf">serialize_session</span><span class="p">(</span><span class="n">session</span><span class="p">):</span>
</span><span>    <span class="k">return</span> <span class="n">session</span>
</span><span>
</span><span><span class="k">def</span> <span class="nf">deserialize_session</span><span class="p">(</span><span class="n">session</span><span class="p">):</span>
</span><span>    <span class="k">return</span> <span class="n">session</span>
</span></code></pre></div>
<p>And with that, of course, the test will not fail</p>
<div class="hl"><pre class=content><code><span>$ python test.py
</span><span>Success
</span></code></pre></div>
<h2 id="serializing">Serializing<a class="headerlink" href="#serializing" title="Permanent link">¶</a></h2>
<p>Now to implement the functions in <code>serializer.py</code>. The simplest approach would be to
use pickle. Let’s try:</p>
<div class="hl"><pre class=content><code><span><span class="kn">import</span> <span class="nn">pickle</span> <span class="k">as</span> <span class="nn">pk</span>
</span><span>
</span><span><span class="k">def</span> <span class="nf">serialize_session</span><span class="p">(</span><span class="n">session</span><span class="p">):</span>
</span><span>    <span class="k">return</span> <span class="n">pk</span><span class="o">.</span><span class="n">dumps</span><span class="p">(</span><span class="n">session</span><span class="p">)</span>
</span><span>
</span><span><span class="k">def</span> <span class="nf">deserialize_session</span><span class="p">(</span><span class="n">data</span><span class="p">):</span>
</span><span>    <span class="k">return</span> <span class="n">pk</span><span class="o">.</span><span class="n">loads</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
</span></code></pre></div>
<p>If you run <code>test.py</code> now, Python is going to yell at you.</p>
<div class="hl"><pre class=content><code><span>$ python test.py
</span><span>Traceback (most recent call last):
</span><span> File "test.py", line 10, in <module>
</span><span> dsession = deserialize_session(serialize_session(session))
</span><span>[ ... ]
</span><span> raise TypeError, "can't pickle %s objects" % base.__name__
</span><span>TypeError: can't pickle lock objects
</span></code></pre></div>
<p>Oh well, it was worth a try I suppose.</p>
<p><strong>Update</strong>: The Session class can be made to
<a href="#update-pickling-can-also-work">implement</a> the pickle protocol if you want to
use pickle.</p>
<p>My next plan was to pick up just enough attributes and data from a <code>Session</code> object
to recreate it using the Session constructor, and serialize
those attributes as JSON. After all, the Session’s API is very easy to use;
how hard can picking attributes from it be? :)</p>
<p>So, I dug into the <a href="https://github.com/kennethreitz/requests/blob/develop/requests/sessions.py#L50" rel="noopener noreferrer" target="_blank">sessions.py</a> module of the python-requests library. Here’s
what the signature of the constructor for <code>Session</code> objects looks like:</p>
<div class="hl"><pre class=content><code><span><span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span>
</span><span>        <span class="n">headers</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span>
</span><span>        <span class="n">cookies</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span>
</span><span>        <span class="n">auth</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span>
</span><span>        <span class="n">timeout</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span>
</span><span>        <span class="n">proxies</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span>
</span><span>        <span class="n">hooks</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span>
</span><span>        <span class="n">params</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span>
</span><span>        <span class="n">config</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span>
</span><span>        <span class="n">verify</span><span class="o">=</span><span class="kc">True</span><span class="p">):</span>
</span><span>    <span class="c1"># ...</span>
</span></code></pre></div>
<p>So, if I pick up just these values, I should be able to recreate the session
object. Sweet.</p>
<div class="hl"><pre class=content><code><span><span class="kn">import</span> <span class="nn">json</span>
</span><span><span class="kn">import</span> <span class="nn">requests</span> <span class="k">as</span> <span class="nn">req</span>
</span><span>
</span><span><span class="k">def</span> <span class="nf">serialize_session</span><span class="p">(</span><span class="n">session</span><span class="p">):</span>
</span><span>    <span class="n">attrs</span> <span class="o">=</span> <span class="p">[</span><span class="s1">'headers'</span><span class="p">,</span> <span class="s1">'cookies'</span><span class="p">,</span> <span class="s1">'auth'</span><span class="p">,</span> <span class="s1">'timeout'</span><span class="p">,</span> <span class="s1">'proxies'</span><span class="p">,</span> <span class="s1">'hooks'</span><span class="p">,</span>
</span><span>             <span class="s1">'params'</span><span class="p">,</span> <span class="s1">'config'</span><span class="p">,</span> <span class="s1">'verify'</span><span class="p">]</span>
</span><span>
</span><span>    <span class="n">session_data</span> <span class="o">=</span> <span class="p">{}</span>
</span><span>
</span><span>    <span class="k">for</span> <span class="n">attr</span> <span class="ow">in</span> <span class="n">attrs</span><span class="p">:</span>
</span><span>        <span class="n">session_data</span><span class="p">[</span><span class="n">attr</span><span class="p">]</span> <span class="o">=</span> <span class="nb">getattr</span><span class="p">(</span><span class="n">session</span><span class="p">,</span> <span class="n">attr</span><span class="p">)</span>
</span><span>
</span><span>    <span class="k">return</span> <span class="n">json</span><span class="o">.</span><span class="n">dumps</span><span class="p">(</span><span class="n">session_data</span><span class="p">)</span>
</span><span>
</span><span><span class="k">def</span> <span class="nf">deserialize_session</span><span class="p">(</span><span class="n">data</span><span class="p">):</span>
</span><span>    <span class="k">return</span> <span class="n">req</span><span class="o">.</span><span class="n">session</span><span class="p">(</span><span class="o">**</span><span class="n">json</span><span class="o">.</span><span class="n">loads</span><span class="p">(</span><span class="n">data</span><span class="p">))</span>
</span></code></pre></div>
<p>And let’s try this out</p>
<div class="hl"><pre class=content><code><span>$ python test.py
</span><span>Traceback (most recent call last):
</span><span> File "test.py", line 12, in <module>
</span><span> assert get_whoami(session) == get_whoami(dsession)
</span><span>[ ... ]
</span><span>[...]requests/models.py", line 447, in send
</span><span> r = self.auth(self)
</span><span>TypeError: 'list' object is not callable
</span></code></pre></div>
<p>Okay, that error message is very weird. Why would anyone <em>call</em> a list object?</p>
<p>Let’s dig into the <a href="https://github.com/kennethreitz/requests/blob/develop/requests/models.py#L447" rel="noopener noreferrer" target="_blank">models.py</a> module and look at this:</p>
<div class="hl"><pre class=content><code><span><span class="p">[</span> <span class="o">...</span> <span class="p">]</span>
</span><span><span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">auth</span><span class="p">,</span> <span class="nb">tuple</span><span class="p">)</span> <span class="ow">and</span> <span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">auth</span><span class="p">)</span> <span class="o">==</span> <span class="mi">2</span><span class="p">:</span>
</span><span>    <span class="c1"># special-case basic HTTP auth</span>
</span><span>    <span class="bp">self</span><span class="o">.</span><span class="n">auth</span> <span class="o">=</span> <span class="n">HTTPBasicAuth</span><span class="p">(</span><span class="o">*</span><span class="bp">self</span><span class="o">.</span><span class="n">auth</span><span class="p">)</span>
</span><span>
</span><span><span class="c1"># Allow auth to make its changes.</span>
</span><span><span class="n">r</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">auth</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span>
</span><span><span class="p">[</span> <span class="o">...</span> <span class="p">]</span>
</span></code></pre></div>
<p>There. It’s not a list that’s being called, not directly at least. The problem
here is that the <code>auth</code> we are passing to <code>session()</code> is not a tuple. Duh!
While I like that <code>auth</code> is restricted to a tuple, I wish there were a
better error message for when <code>auth</code> is a list instead of a tuple. I personally
wouldn’t want it to accept a <code>list</code> for <code>auth</code> though.</p>
<p>So, what went wrong? JSON has no tuple type, only arrays, which <code>json</code> decodes as lists.
So, when serializing and deserializing, the <code>auth</code> tuple is
turned into a <code>list</code>. Let’s turn it back:</p>
<div class="hl"><pre class=content><code><span><span class="k">def</span> <span class="nf">deserialize_session</span><span class="p">(</span><span class="n">data</span><span class="p">):</span>
</span><span>    <span class="n">session_data</span> <span class="o">=</span> <span class="n">json</span><span class="o">.</span><span class="n">loads</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
</span><span>
</span><span>    <span class="k">if</span> <span class="s1">'auth'</span> <span class="ow">in</span> <span class="n">session_data</span><span class="p">:</span>
</span><span>        <span class="n">session_data</span><span class="p">[</span><span class="s1">'auth'</span><span class="p">]</span> <span class="o">=</span> <span class="nb">tuple</span><span class="p">(</span><span class="n">session_data</span><span class="p">[</span><span class="s1">'auth'</span><span class="p">])</span>
</span><span>
</span><span>    <span class="k">return</span> <span class="n">req</span><span class="o">.</span><span class="n">session</span><span class="p">(</span><span class="o">**</span><span class="n">session_data</span><span class="p">)</span>
</span></code></pre></div>
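<p>Stripped of requests, the round trip that made this fix necessary looks like this. It is a standalone Python 3 sketch using only the standard <code>json</code> module; <code>json_round_trip</code> is a made-up helper:</p>

```python
import json


def json_round_trip(value):
    """Serialize to JSON text and parse it back, as the session serializer does."""
    return json.loads(json.dumps(value))


data = json_round_trip({'auth': ('user', 'pass')})
print(type(data['auth']).__name__)  # JSON has arrays only, so this prints: list
data['auth'] = tuple(data['auth'])  # convert back, as deserialize_session does
```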
<p>And running the test again:</p>
<div class="hl"><pre class=content><code><span>$ python test.py
</span><span>Traceback (most recent call last):
</span><span> File "test.py", line 12, in <module>
</span><span> assert get_whoami(session) == get_whoami(dsession)
</span><span> [ ... ]
</span><span> File "/usr/lib/python2.7/string.py", line 493, in translate
</span><span> return s.translate(table, deletions)
</span><span>TypeError: translate() takes exactly one argument (2 given)
</span></code></pre></div>
<p>Wait. What? Now we have an error from the stdlib? This just keeps getting better and
better. If this looks like something that can frustrate you, go get some coffee
:)</p>
<p>If you look at the complete stack trace, you’ll see this as the second frame from the bottom:</p>
<div class="hl"><pre class=content><code><span> <span class="n">File</span> <span class="s2">"[...]site-packages/requests/packages/oreos/monkeys.py"</span><span class="p">,</span> <span class="n">line</span> <span class="mi">470</span><span class="p">,</span> <span class="ow">in</span> <span class="nb">set</span>
</span><span> <span class="k">if</span> <span class="s2">""</span> <span class="o">!=</span> <span class="n">translate</span><span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="n">idmap</span><span class="p">,</span> <span class="n">LegalChars</span><span class="p">):</span>
</span></code></pre></div>
<p>This code seems to be calling the <code>translate</code> function incorrectly. With a bit of
debugging and yelling at my monitor, I found the problem and, for a moment,
lost my grip on reality.</p>
<p><code>str.translate</code> takes an optional second argument (the characters to delete), but <code>unicode.translate</code> takes only one. I have
no idea why this is done this way, but I sure as hell didn’t enjoy it. The code
in <code>oreos/monkeys.py</code> assumes that <code>key</code> is a <code>str</code>. However,
<code>json.loads</code> gives you <code>unicode</code> strings. So, we need to convert from <code>unicode</code> to <code>str</code> just the parts
of the deserialized dict that <code>oreos/monkeys.py</code> actually uses.</p>
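<p>Python 3 kept the <code>unicode</code> behavior for its <code>str</code> type: <code>str.translate</code> accepts only a table, and deletion is expressed by mapping a character to <code>None</code>, which is what the third argument of <code>str.maketrans</code> builds. Here is a sketch of the same kind of legality check that <code>oreos/monkeys.py</code> performs: delete every legal character and see whether anything is left. The character set is assumed to mirror the one in the stdlib cookie module, and the helper names are hypothetical:</p>

```python
import string

# Assumed set of legal cookie-key characters (mirroring the stdlib cookie module)
LEGAL_CHARS = string.ascii_letters + string.digits + "!#$%&'*+-.^_`|~:"

# Python 3: deletions live inside the table; maketrans' third argument maps them to None
_DELETE_LEGAL = str.maketrans('', '', LEGAL_CHARS)


def is_legal_key(key):
    """True if key consists only of legal characters."""
    return key.translate(_DELETE_LEGAL) == ''


def two_argument_translate(key):
    """Try the old str-style call translate(table, deletechars)."""
    try:
        key.translate({}, LEGAL_CHARS)
    except TypeError:  # Python 3 strings accept only the table argument
        return 'TypeError'
    return 'ok'
```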
<p>After reading a bit more of the oreos code, it didn’t take long to figure
out that those parts were the keys in the <code>cookies</code> dict. Lo:</p>
<div class="hl"><pre class=content><code><span><span class="k">def</span> <span class="nf">deserialize_session</span><span class="p">(</span><span class="n">data</span><span class="p">):</span>
</span><span>    <span class="n">session_data</span> <span class="o">=</span> <span class="n">json</span><span class="o">.</span><span class="n">loads</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
</span><span>
</span><span>    <span class="k">if</span> <span class="s1">'auth'</span> <span class="ow">in</span> <span class="n">session_data</span><span class="p">:</span>
</span><span>        <span class="n">session_data</span><span class="p">[</span><span class="s1">'auth'</span><span class="p">]</span> <span class="o">=</span> <span class="nb">tuple</span><span class="p">(</span><span class="n">session_data</span><span class="p">[</span><span class="s1">'auth'</span><span class="p">])</span>
</span><span>
</span><span>    <span class="k">if</span> <span class="s1">'cookies'</span> <span class="ow">in</span> <span class="n">session_data</span><span class="p">:</span>
</span><span>        <span class="n">session_data</span><span class="p">[</span><span class="s1">'cookies'</span><span class="p">]</span> <span class="o">=</span> <span class="nb">dict</span><span class="p">((</span><span class="n">key</span><span class="o">.</span><span class="n">encode</span><span class="p">(),</span> <span class="n">val</span><span class="p">)</span> <span class="k">for</span> <span class="n">key</span><span class="p">,</span> <span class="n">val</span> <span class="ow">in</span>
</span><span>            <span class="n">session_data</span><span class="p">[</span><span class="s1">'cookies'</span><span class="p">]</span><span class="o">.</span><span class="n">items</span><span class="p">())</span>
</span><span>
</span><span>    <span class="k">return</span> <span class="n">req</span><span class="o">.</span><span class="n">session</span><span class="p">(</span><span class="o">**</span><span class="n">session_data</span><span class="p">)</span>
</span></code></pre></div>
<p>And so</p>
<div class="hl"><pre class=content><code><span>$ python test.py
</span><span>Success
</span></code></pre></div>
<p><strong>!</strong></p>
<p>All the code is on a <a href="https://gist.github.com/2660997" rel="noopener noreferrer" target="_blank">gist</a>.</p>
<h2 id="update-pickling-can-also-work">Update: Pickling can also work<a class="headerlink" href="#update-pickling-can-also-work" title="Permanent link">¶</a></h2>
<p>As <em>Daslch</em> pointed out in his <a href="http://www.reddit.com/r/Python/comments/pv1lf/serializing_pythonrequests_session_objects_for/c3sh5bb" rel="noopener noreferrer" target="_blank">comment</a> on reddit, by implementing the pickle
protocol on the Session class, we can get pickling to work. From the
<a href="http://docs.python.org/library/pickle.html#object.__getstate__" rel="noopener noreferrer" target="_blank">documentation</a>, we need two methods, <code>__getstate__</code> and <code>__setstate__</code>.</p>
<p>Adding these methods to the <code>sessions.Session</code> class:</p>
<div class="hl"><pre class=content><code><span><span class="k">def</span> <span class="nf">__getstate__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
</span><span>    <span class="n">attrs</span> <span class="o">=</span> <span class="p">[</span><span class="s1">'headers'</span><span class="p">,</span> <span class="s1">'cookies'</span><span class="p">,</span> <span class="s1">'auth'</span><span class="p">,</span> <span class="s1">'timeout'</span><span class="p">,</span> <span class="s1">'proxies'</span><span class="p">,</span> <span class="s1">'hooks'</span><span class="p">,</span>
</span><span>             <span class="s1">'params'</span><span class="p">,</span> <span class="s1">'config'</span><span class="p">,</span> <span class="s1">'verify'</span><span class="p">]</span>
</span><span>    <span class="k">return</span> <span class="nb">dict</span><span class="p">((</span><span class="n">attr</span><span class="p">,</span> <span class="nb">getattr</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">attr</span><span class="p">))</span> <span class="k">for</span> <span class="n">attr</span> <span class="ow">in</span> <span class="n">attrs</span><span class="p">)</span>
</span><span>
</span><span><span class="k">def</span> <span class="nf">__setstate__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">state</span><span class="p">):</span>
</span><span>    <span class="k">for</span> <span class="n">name</span><span class="p">,</span> <span class="n">value</span> <span class="ow">in</span> <span class="n">state</span><span class="o">.</span><span class="n">items</span><span class="p">():</span>
</span><span>        <span class="nb">setattr</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">value</span><span class="p">)</span>
</span><span>
</span><span>    <span class="bp">self</span><span class="o">.</span><span class="n">poolmanager</span> <span class="o">=</span> <span class="n">PoolManager</span><span class="p">(</span>
</span><span>        <span class="n">num_pools</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">config</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'pool_connections'</span><span class="p">),</span>
</span><span>        <span class="n">maxsize</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">config</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'pool_maxsize'</span><span class="p">)</span>
</span><span>    <span class="p">)</span>
</span></code></pre></div>
<p>with this as the version of <code>serializer.py</code> that uses pickle, we do get a
<code>Success</code>.</p>
<p>Creating the new pool manager in <code>__setstate__</code> duplicates code from
the <code>__init__</code> of the same class. That construction should probably be moved
into a method to avoid the repetition.</p>
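<p>A rough sketch of that refactoring, with the pool construction pulled into one helper that both <code>__init__</code> and <code>__setstate__</code> call. This is not the actual requests code: the <code>Session</code> class here, the <code>_init_pool</code> helper and the attribute list are hypothetical, and a <code>threading.Lock</code> stands in for the unpicklable <code>PoolManager</code>.</p>

```python
import pickle
import threading


class Session:
    """Hypothetical stand-in for a session object; not the real requests class.

    A threading.Lock plays the role of the unpicklable connection pool.
    """

    # Plain attributes that are safe to pickle (illustrative subset).
    _picklable_attrs = ["headers", "config"]

    def __init__(self, headers=None, config=None):
        self.headers = headers or {}
        self.config = config or {"pool_connections": 10, "pool_maxsize": 10}
        self._init_pool()

    def _init_pool(self):
        # Hypothetical helper: the one place that builds the unpicklable
        # resource, shared by __init__ and __setstate__.
        self.pool = threading.Lock()
        self.pool_size = self.config.get("pool_maxsize")

    def __getstate__(self):
        # Only the plain attributes go into the pickle; the pool is left out.
        return {attr: getattr(self, attr) for attr in self._picklable_attrs}

    def __setstate__(self, state):
        for name, value in state.items():
            setattr(self, name, value)
        # Rebuild the pool instead of copy-pasting the __init__ code.
        self._init_pool()


s = Session(headers={"User-Agent": "demo"})
restored = pickle.loads(pickle.dumps(s))
print(restored.headers, restored.pool_size)  # {'User-Agent': 'demo'} 10
```

<p>Because the helper is the only code that knows how to build the pool, a change to its construction can no longer drift out of sync between the two call sites.</p>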
<p><strong>Update 2</strong>: Created an <a href="https://github.com/kennethreitz/requests/issues/439" rel="noopener noreferrer" target="_blank">issue</a> about this.</p>
<p><strong>Update 3</strong>: This has been merged and Session objects are pickleable as of
version 0.10.3. See <a href="https://github.com/kennethreitz/requests/blob/develop/HISTORY.rst" rel="noopener noreferrer" target="_blank">requests history</a>.</p>Dependency graph of all installed gems2011-09-30T00:00:00+05:302011-09-30T00:00:00+05:30Shrikant Sharat Kandulatag:sharats.me,2011-09-30:/posts/dependency-graph-of-all-installed-gems/<p>Every other application written in Ruby these days seems to come with this
installation instruction:</p>
<div class="hl"><pre class=content><code><span>gem install my-super-awesome-app
</span></code></pre></div>
<p>and then go on to describe how awesome the app is. But installing the app
this way also installs all of its bazillion dependencies, which, unfortunately,
are not removed when you uninstall the app with</p>
<div class="hl"><pre class=content><code><span>gem uninstall the-same-damn-app
</span></code></pre></div>
<p>And so you end up with a huge mess of installed gems, with no idea why they
are there in the first place. Finding the stale gems left behind this way can
be a pain.</p>
<p>So I decided that a neat flowchart visualising the dependency relationships
between all the installed gems would give me the full picture. And yes, it did.</p>
<p class="img"><a href="https://sharats.me/static/gem-dependency-graph.png"><img alt="Gem dependency graph" src="https://sharats.me/static/gem-dependency-graph.png"></a></p>
<p>Here’s how I got the flowchart (save this as, say, <code>gem-graph.sh</code>):</p>
<div class="hl"><pre class=content><code><span><span class="ch">#!/bin/bash</span>
</span><span>
</span><span>gem<span class="w"> </span>list<span class="w"> </span><span class="se">\</span>
</span><span><span class="w"> </span><span class="p">|</span><span class="w"> </span>cut<span class="w"> </span>-d<span class="se">\ </span><span class="w"> </span>-f1<span class="w"> </span><span class="se">\</span>
</span><span><span class="w"> </span><span class="p">|</span><span class="w"> </span>xargs<span class="w"> </span>gem<span class="w"> </span>dep<span class="w"> </span><span class="se">\</span>
</span><span><span class="w"> </span><span class="p">|</span><span class="w"> </span>awk<span class="w"> </span><span class="s1">'\</span>
</span><span><span class="s1"> BEGIN { print "digraph gems {" } \</span>
</span><span><span class="s1"> /^Gem / { cur=$2; sub(/-[0-9\.]+$/, "", cur); print " \"" cur "\";" } \</span>
</span><span><span class="s1"> ! /^Gem / && $0 != "" { print " \"" cur "\" -> \"" $1 "\";" } \</span>
</span><span><span class="s1"> END { print "}" }'</span><span class="w"> </span><span class="se">\</span>
</span><span><span class="w"> </span><span class="p">|</span><span class="w"> </span>dot<span class="w"> </span>-Tpng<span class="w"> </span>-o<span class="w"> </span>gems.png
</span></code></pre></div>
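<p>The awk stage is the only non-obvious part: it turns each <code>Gem name-version</code> header printed by <code>gem dep</code> into a node (with the version stripped) and each indented dependency line below it into an edge. Here is a rough Python equivalent of just that stage, for anyone who finds awk hard to read. It assumes the output format implied by the awk patterns above; the function name and the sample text are made up.</p>

```python
import re

# Matches a trailing "-1.2.3" version suffix, like the awk sub() call.
VERSION_SUFFIX = re.compile(r"-[0-9.]+$")


def gem_dep_to_dot(text):
    """Convert `gem dep`-style output into a Graphviz DOT digraph."""
    lines = ["digraph gems {"]
    current = None
    for raw in text.splitlines():
        if raw.startswith("Gem "):
            # "Gem rails-7.0.4" becomes the node "rails".
            current = VERSION_SUFFIX.sub("", raw.split()[1])
            lines.append(f'  "{current}";')
        elif raw.strip() and current is not None:
            # "  actionpack (= 7.0.4)" becomes the edge rails -> actionpack.
            dep = raw.split()[0]
            lines.append(f'  "{current}" -> "{dep}";')
    lines.append("}")
    return "\n".join(lines)


# Made-up sample in the format the awk patterns expect.
sample = """Gem rake-13.0.6

Gem rails-7.0.4
  actionpack (= 7.0.4)
  railties (= 7.0.4)
"""
print(gem_dep_to_dot(sample))
```

<p>The returned string is plain DOT, so it can be handed to <code>dot -Tpng</code> exactly like the awk output.</p>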
<p>Assuming you have <a href="http://www.graphviz.org/" rel="noopener noreferrer" target="_blank">GraphViz</a> installed, you can just
do</p>
<div class="hl"><pre class=content><code><span>chmod +x gem-graph.sh
</span><span>./gem-graph.sh
</span></code></pre></div>
<p>and the graph will be saved in gems.png. Happy gem cleaning :).</p>Implementing an expressive search system with clojure2011-09-28T00:00:00+05:302011-09-28T00:00:00+05:30Shrikant Sharat Kandulatag:sharats.me,2011-09-28:/posts/implementing-an-expressive-search-system-with-clojure/<h2 id="backstory">Backstory<a class="headerlink" href="#backstory" title="Permanent link">¶</a></h2>
<p>I recently learned <a href="http://clojure.org" rel="noopener noreferrer" target="_blank">Clojure</a>, and it’s my first exposure to
Lisp and the code-as-data way of life. I was eager to use Clojure to make an
app, any app, a simple silly personal tool to help me out with a tedious task.</p>
<p>One such tool I created was <a href="http://classypants.sharats.me" rel="noopener noreferrer" target="_blank">classypants</a>. It’s a small Swing-based GUI tool
that helps make sense of the values of <code>PATH</code>-like variables. The
values of these variables are lists of file/directory paths joined with
<code>:</code> on *nix systems and <code>;</code> on Windows. Have you ever seen a <code>CLASSPATH</code>
with ~100 jars/directories in it? Even if such a value has just 20 items,
it’s very hard to make any sense of it.</p>
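<p>Just to make that format concrete (in Python rather than Clojure, and not taken from classypants), splitting such a value into entries is a single <code>split</code> on the separator, which Python exposes as <code>os.pathsep</code>. The helper name and the sample value are made up.</p>

```python
import os


def path_entries(value, sep=os.pathsep):
    """Split a PATH/CLASSPATH-style value into its entries (hypothetical helper).

    `sep` defaults to the platform separator (":" on *nix, ";" on Windows).
    Empty segments, e.g. from a trailing separator, are dropped.
    """
    return [entry for entry in value.split(sep) if entry]


# A made-up *nix CLASSPATH; pass sep=";" for a Windows-style value.
classpath = "/opt/lib/jaxb-api.jar:/opt/lib/resources:/opt/lib/jaxb-impl.jar:"
for entry in path_entries(classpath, sep=":"):
    print(entry)
```

<p>Dropping empty segments is a simplification: in a Unix <code>PATH</code>, an empty segment actually means the current directory.</p>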
<p>Classypants is basically a pretty bare window with only 4 top-level
controls, one of which is an input box for searching through the entries. That
search is what I want to talk about in this post.</p>
<h2 id="superpowers">Superpowers<a class="headerlink" href="#superpowers" title="Permanent link">¶</a></h2>
<p>Initially, the search box was just a filter box. I type some text and the
entries that contain that text are shown, the rest hidden. This quickly became
annoying, as I wanted to search for entries with both <code>jaxb</code> and <code>jar</code>, which was not
possible with the implementation at the time.</p>
<p>The search I have today can do much more than even that.
It’s a powerful query language at work, with which we can filter entries that
point to non-existent files, entries that point to directories containing a
given file, and other weirdos.</p>
<h2 id="how-is-it-done">How is it done?<a class="headerlink" href="#how-is-it-done" title="Permanent link">¶</a></h2>
<p>I want to share how I went about evolving the search functionality. Let’s talk
about one function here,</p>
<div class="hl"><pre class=content><code><span><span class="p">(</span><span class="kd">defn </span><span class="nv">matches?</span>
</span><span><span class="w"> </span><span class="p">[</span><span class="nv">search-str</span><span class="w"> </span><span class="nv">entry</span><span class="p">]</span>
</span><span><span class="w"> </span><span class="p">(</span><span class="nb">-> </span><span class="nv">entry</span>
</span><span><span class="w"> </span><span class="p">(</span><span class="nf">.indexOf</span><span class="w"> </span><span class="nv">search-str</span><span class="p">)</span>
</span><span><span class="w"> </span><span class="p">(</span><span class="nb">not= </span><span class="mi">-1</span><span class="p">)))</span>
</span></code></pre></div>
<p>This is the first incarnation of the search implementation. It just checks if
the given <code>search-str</code> is present inside the <code>entry</code>.</p>
<p>That is nice and useful. But we want more power. We want a nice, minimal query
language to describe what we want to find, and it should be easy to remember.
Let’s work on negating search results first, thinking up the simplest possible
syntax,</p>
<div class="hl"><pre class=content><code><span><span class="nb">not </span><span class="nv">resource</span>
</span></code></pre></div>
<p>should match entries that do <em>not</em> contain <code>resource</code>. This doesn’t look good,
as it could just as well be read as a search for entries that contain <code>not</code> or <code>resource</code>. We
need some sugar to mark the <code>not</code> part as a directive that modifies how the
search is done. Let’s try again,</p>
<div class="hl"><pre class=content><code><span><span class="ss">:not</span><span class="w"> </span><span class="nv">resource</span>
</span></code></pre></div>
<p>Ah, the <code>:</code> in front of <code>not</code> gives it the special behaviour we need. Don’t worry
too much about why the syntax isn’t <code>not: resource</code> or something else; it will
become clear in a moment, if it hasn’t already. Now that we have a search
syntax, it’s time to make it work. Imagine a function, <code>digest</code>, which takes a
search string and returns a <em>function</em>, which in turn takes an entry and tells whether it’s a
match or not. I suck at writing, read that again.</p>
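<p>For readers more at home outside Lisp, here is the same idea sketched in Python. It is not the classypants implementation: this hypothetical <code>digest</code> splits the query on whitespace (instead of using a reader), treats <code>:not</code> as a prefix for the next term, and, as an assumption of this sketch, requires all terms to match, which covers the <code>jaxb</code> and <code>jar</code> case from earlier.</p>

```python
def matches(search_str, entry):
    # Plain substring test, like the first matches? above.
    return search_str in entry


def digest(search_str):
    """Turn a query string into a predicate over entries (hypothetical sketch).

    Whitespace-separated terms must all match; a term preceded by the
    ":not" directive must not match.
    """
    checks = []
    negate = False
    for token in search_str.split():
        if token == ":not":
            negate = True  # applies to the next term only
            continue
        if negate:
            checks.append(lambda entry, t=token: not matches(t, entry))
        else:
            checks.append(lambda entry, t=token: matches(t, entry))
        negate = False

    def predicate(entry):
        return all(check(entry) for check in checks)

    return predicate


entries = ["lib/jaxb-api.jar", "lib/resources", "lib/jaxb-impl.jar", "lib/log4j.jar"]
print([e for e in entries if digest("jaxb jar")(e)])
# ['lib/jaxb-api.jar', 'lib/jaxb-impl.jar']
print([e for e in entries if digest(":not resource")(e)])
# ['lib/jaxb-api.jar', 'lib/jaxb-impl.jar', 'lib/log4j.jar']
```

<p>Returning a closure means the query is parsed once, and the resulting predicate can then be applied to every entry.</p>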
<p>Essentially, <code>(digest ":not resource")</code> should return a function, which more or
less works like</p>
<div class="hl"><pre class=content><code><span><span class="p">(</span><span class="k">fn </span><span class="p">[</span><span class="nv">entry</span><span class="p">]</span>
</span><span><span class="w"> </span><span class="p">(</span><span class="nb">not </span><span class="p">(</span><span class="nf">matches?</span><span class="w"> </span><span class="s">"resource"</span><span class="w"> </span><span class="nv">entry</span><span class="p">)))</span>
</span></code></pre></div>
<p>We check whether there is a match, and <code>not</code> its result. Let’s try writing the <code>digest</code>
function,</p>
<div class="hl"><pre class=content><code><span><span class="p">(</span><span class="kd">defn </span><span class="nv">digest</span>
</span><span><span class="w"> </span><span class="p">[</span><span class="nv">search-str</span><span class="p">]</span>
</span><span><span class="w"> </span><span class="p">(</span><span class="nf">read-string</span><span class="w"> </span><span class="p">(</span><span class="nb">str </span><span class="s">"("</span><span class="w"> </span><span class="nv">search-str</span><span class="w"> </span><span class="s">")"</span><span class="p">)))</span>
</span></code></pre></div>
<p>What we do above is wrap the <code>search-str</code> in parentheses and read it into a
Clojure <code>list</code>. Let’s try out our function in the REPL.</p>
<div class="hl"><pre class=content><code><span><span class="nv">user=></span><span class="w"> </span><span class="p">(</span><span class="nf">digest</span><span class="w"> </span><span class="s">":not resource"</span><span class="p">)</span>
</span><span><span class="p">(</span><span class="ss">:not</span><span class="w"> </span><span class="nv">resource</span><span class="p">)</span>
</span></code></pre></div>
<p>Yep, just what we expected. Now, let’s take this a step further:</p>
<div class="hl"><pre class=content><code><span><span class="p">(</span><span class="kd">defn </span><span class="nv">digest</span>
</span><span><span class="w"> </span><span class="p">[</span><span class="nv">search-str</span><span class="p">]</span>
</span><span><span class="w"> </span><span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">spec</span><span class="w"> </span><span class="p">(</span><span class="nf">read-string</span><span class="w"> </span><span class="p">(</span><span class="nb">str </span><span class="s">"("</span><span class="w"> </span><span class="nv">search-str</span><span class="w"> </span><span class="s">")"</span><span class="p">))]</span>
</span><span><span class="w"> </span><span class="p">(</span><span class="nf">cond</span>
</span><span><span class="w"> </span><span class="p">(</span><span class="nb">= </span><span class="p">(</span><span class="nb">first </span><span class="nv">spec</span><span class="p">)</span><span class="w"> </span><span class="ss">:not</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="k">fn </span><span class="p">[</span><span class="nv">e</span><span class="p">]</span>
</span><span><span class="w"> </span><span class="p">(</span><span class="nb">not </span><span class="p">(</span><span class="nf">matches?</span><span class="w"> </span><span class="p">(</span><span class="nb">str </span><span class="p">(</span><span class="nb">nth </span><span class="nv">spec</span><span class="w"> </span><span class="mi">1</span><span class="p">))</span><span class="w"> </span><span class="nv">e</span><span class="p">))))))</span>
</span></code></pre></div>Vim undo breaks with auto-close plugins2011-09-28T00:00:00+05:302011-09-28T00:00:00+05:30Shrikant Sharat Kandulatag:sharats.me,2011-09-28:/posts/vim-undo-breaks-with-auto-close-plugins/<h2 id="prelude">Prelude<a class="headerlink" href="#prelude" title="Permanent link">¶</a></h2>
<p>If you’ve ever used IDEs or other heavy editors, you’d know how
nice it is to have parentheses and brackets auto-closed. If you don’t
know what I’m talking about, it’s a feature usually present in IDEs like Eclipse,
and easily recreated in vim with mappings like</p>
<div class="hl"><pre class=content><code><span><span class="nb">inoremap</span> <span class="p">(</span> <span class="p">()<</span>Left<span class="p">></span>
</span></code></pre></div>
<p>Of course, that’s just a simple taste. There are far more elaborate plugins that
achieve this.</p>
<p>Now, what’s really super annoying about these plugins is that they tend to
break vim’s amazingly powerful undo functionality. In other words, if you are
using an auto-close plugin, chances are, you can’t rely on vim’s undo anymore.</p>
<p>Debugging this and finding the cause has been on my todo list for quite some
time, and a few days ago I finally sat down to explore. I am writing up my
experience here. First, a simple test to see whether the auto-close plugin you
use breaks undo: open vim on a blank file and hit the following keys:</p>
<div class="hl"><pre class=content><code><span><span class="k">iabc</span>{<span class="p"><</span>CR<span class="p">><</span>ESC<span class="p">></span><span class="k">u</span>
</span></code></pre></div>
<p>where <code><CR></code> means hitting the Return key and <code><ESC></code> means hitting
the Escape key. Decent knowledge of vim should tell you that after the above
keys, you should end up with a blank file again. Right?</p>
<p>If instead, you see a closing brace dangling in the second line, your undo is
broken. MUHAHAHAHAHA! You can’t rely on undo anymore until you get rid of that
one plugin!</p>
<h2 id="whats-going-on">What’s going on?<a class="headerlink" href="#whats-going-on" title="Permanent link">¶</a></h2>
<p>After experimenting with many auto-close plugins and reading the source of at
least 3 of them, I’d say there are basically two different implementations of
this functionality that all these plugins use. The first one is pretty much
what was shown at the start of this article,</p>
<div class="hl"><pre class=content><code><span><span class="nb">inoremap</span> <span class="p">(</span> <span class="p">()<</span>Left<span class="p">></span>
</span><span><span class="c">" or</span>
</span><span><span class="nb">inoremap</span> <span class="p">(</span> <span class="p"><</span>C<span class="p">-</span><span class="k">r</span><span class="p">>=</span><span class="s2">"()\<Left>"</span>
</span></code></pre></div>
<p>I’m going to call this class of plugins the critters. These do <em>not</em> break your
undo. The other class of implementations, the beasts, which do break your undo, do
a bit of dark sorcery with stuff like</p>
<div class="hl"><pre class=content><code><span><span class="nb">inoremap</span> <span class="p">(</span> <span class="p"><</span>C<span class="p">-</span><span class="k">r</span><span class="p">>=</span>MyAwesomePairInserter<span class="p">()<</span>CR<span class="p">></span>
</span></code></pre></div>
<p>No dark sorcery is immediately apparent here. The real sorcery is
<em>inside</em> that function, where a call to the <code>setline()</code> function replaces
your current line with a version that contains the parentheses text at the cursor. Doesn’t make
sense? Don’t worry, you’ll get it soon enough.</p>
<h2 id="which-plugins-name-them">Which plugins? Name them!<a class="headerlink" href="#which-plugins-name-them" title="Permanent link">¶</a></h2>
<p>Here are a few that break undo:</p>
<h3 id="beasts">Beasts<a class="headerlink" href="#beasts" title="Permanent link">¶</a></h3>
<ul>
<li><a href="https://github.com/vim-scripts/AutoClose" rel="noopener noreferrer" target="_blank">https://github.com/vim-scripts/AutoClose</a></li>
<li><a href="https://github.com/Raimondi/delimitMate" rel="noopener noreferrer" target="_blank">https://github.com/Raimondi/delimitMate</a></li>
<li><a href="https://github.com/Townk/vim-autoclose" rel="noopener noreferrer" target="_blank">https://github.com/Townk/vim-autoclose</a></li>
</ul>
<p>and these don’t break undo</p>
<h3 id="critters">Critters<a class="headerlink" href="#critters" title="Permanent link">¶</a></h3>
<ul>
<li><a href="https://github.com/vim-scripts/ClosePairs" rel="noopener noreferrer" target="_blank">https://github.com/vim-scripts/ClosePairs</a></li>
<li><a href="https://github.com/vim-scripts/simple-pairs" rel="noopener noreferrer" target="_blank">https://github.com/vim-scripts/simple-pairs</a></li>
<li><a href="https://github.com/vim-scripts/Auto-Pairs" rel="noopener noreferrer" target="_blank">https://github.com/vim-scripts/Auto-Pairs</a></li>
</ul>
<p>A first look at them tells you that the ones that break undo are actually
more popular and have relatively larger code bases. So why doesn’t anyone
complain about broken undo? I think they do, and I believe the root cause is
a bug in <em>vim</em> itself.</p>
<p>The main difference in usability between these classes again has to do with undo.
In the beasts, typing a brace does not start a new undo action, but it does in
the critters (as if you hit <code><C-g>u</code>). This might actually play a role in
why undo breaks only in the beasts, but the exact reason escapes me.</p>
<h2 id="a-reproducible-test-case">A reproducible test case<a class="headerlink" href="#a-reproducible-test-case" title="Permanent link">¶</a></h2>
<p>I wanted to reproduce this problem with a vanilla vim with no custom
configuration (except for <code>nocompatible</code>). So I checked out the latest version
(vim73-353) from the mercurial repository, compiled it (with python, ruby and the
usual shit) and ran it with no plugins and the following simple vimrc:</p>
<div class="hl"><pre class=content><code><span><span class="k">set</span> <span class="nb">nocompatible</span>
</span><span>
</span><span><span class="nb">inoremap</span> <span class="p"><</span>buffer<span class="p">></span> <span class="p"><</span><span class="k">silent</span><span class="p">></span> <span class="p">(</span> <span class="p"><</span>C<span class="p">-</span>R<span class="p">>=<</span>SID<span class="p">></span>InsertPair<span class="p">(</span><span class="s2">"("</span><span class="p">,</span> <span class="s2">")"</span><span class="p">)<</span>CR<span class="p">></span>
</span><span><span class="nb">inoremap</span> <span class="p"><</span>buffer<span class="p">></span> <span class="p"><</span><span class="k">silent</span><span class="p">></span> [ <span class="p"><</span>C<span class="p">-</span>R<span class="p">>=<</span>SID<span class="p">></span>InsertPair<span class="p">(</span><span class="s2">"["</span><span class="p">,</span> <span class="s2">"]"</span><span class="p">)<</span>CR<span class="p">></span>
</span><span><span class="nb">inoremap</span> <span class="p"><</span>buffer<span class="p">></span> <span class="p"><</span><span class="k">silent</span><span class="p">></span> { <span class="p"><</span>C<span class="p">-</span>R<span class="p">>=<</span>SID<span class="p">></span>InsertPair<span class="p">(</span><span class="s2">"{"</span><span class="p">,</span> <span class="s2">"}"</span><span class="p">)<</span>CR<span class="p">></span>
</span><span>
</span><span><span class="k">function</span><span class="p">!</span> s:InsertPair<span class="p">(</span>opener<span class="p">,</span> closer<span class="p">)</span>
</span><span> <span class="k">let</span> <span class="k">l</span>:save_ve <span class="p">=</span> &<span class="k">ve</span>
</span><span> <span class="k">set</span> <span class="k">ve</span><span class="p">=</span><span class="k">all</span>
</span><span>
</span><span> <span class="k">call</span> s:InsertStringAtCursor<span class="p">(</span><span class="k">a</span>:closer<span class="p">)</span>
</span><span>
</span><span> exec <span class="s2">"set ve="</span> . <span class="k">l</span>:save_ve
</span><span> <span class="k">return</span> <span class="k">a</span>:opener
</span><span><span class="k">endfunction</span>
</span><span>
</span><span><span class="k">function</span><span class="p">!</span> s:InsertStringAtCursor<span class="p">(</span>str<span class="p">)</span>
</span><span> <span class="k">let</span> <span class="k">l</span>:line <span class="p">=</span> getline<span class="p">(</span><span class="s1">'.'</span><span class="p">)</span>
</span><span> <span class="k">let</span> <span class="k">l</span>:column <span class="p">=</span> <span class="k">col</span><span class="p">(</span><span class="s1">'.'</span><span class="p">)</span><span class="m">-2</span>
</span><span>
</span><span> <span class="k">if</span> <span class="k">l</span>:column <span class="p"><</span> <span class="m">0</span>
</span><span> <span class="k">call</span> setline<span class="p">(</span><span class="s1">'.'</span><span class="p">,</span> <span class="k">a</span>:str . <span class="k">l</span>:line<span class="p">)</span>
</span><span> <span class="k">else</span>
</span><span> <span class="k">call</span> setline<span class="p">(</span><span class="s1">'.'</span><span class="p">,</span> <span class="k">l</span>:line[:<span class="k">l</span>:column] . <span class="k">a</span>:str . <span class="k">l</span>:line[<span class="k">l</span>:column<span class="p">+</span><span class="m">1</span>:]<span class="p">)</span>
</span><span> <span class="k">endif</span>
</span><span><span class="k">endfunction</span>
</span></code></pre></div>
<p>This is a stripped-down version of the auto-close functionality implemented in
Townk’s vim-autoclose plugin. I saved it as <code>undo-breaker-vimrc</code> and opened vim with</p>
<div class="hl"><pre class=content><code><span>vim<span class="w"> </span>-u<span class="w"> </span>undo-breaker-vimrc
</span></code></pre></div>
<p>and ran the test. Boom, a dangling brace character.</p>
<p>As far as I can tell, it’s the call to <code>setline()</code> that makes all the difference,
though I could be entirely wrong about that. I suspect it because it is the major
difference between the two classes of implementations.</p>
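<p>To see exactly what text <code>s:InsertStringAtCursor</code> writes, here is a Python model of its string splicing. It models only the text, not vim’s undo machinery, and assumes an ASCII line (vim’s <code>col()</code> counts bytes, Python counts characters). The one subtlety is that vim’s <code>line[:n]</code> slice includes index <code>n</code>, so <code>l:line[:l:column]</code> with <code>column = col('.') - 2</code> is every character before the cursor. The Python function names are made up.</p>

```python
def insert_string_at_cursor(line, col, text):
    """Model of s:InsertStringAtCursor from the vimrc above.

    `col` is the 1-based cursor column, like col('.') in vim.
    """
    column = col - 2  # index of the last character before the cursor
    if column < 0:
        # Cursor at the start of the line: prepend.
        return text + line
    # vim's line[:column] is inclusive, i.e. Python's line[:column + 1].
    return line[:column + 1] + text + line[column + 1:]


def insert_pair(line, col, opener, closer):
    """Model of the whole mapping: setline() writes the closer first, then
    the opener returned by s:InsertPair is typed at the cursor."""
    with_closer = insert_string_at_cursor(line, col, closer)
    i = col - 1  # 0-based cursor index
    return with_closer[:i] + opener + with_closer[i:]


print(insert_pair("abc", 4, "{", "}"))   # abc{}
print(insert_pair("f(x)", 3, "[", "]"))  # f([]x)
```

<p>Seen this way, the beasts rewrite the whole line through <code>setline()</code> as a side effect of evaluating the mapping’s expression, while the critters only ever type characters.</p>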
<h2 id="next">Next?<a class="headerlink" href="#next" title="Permanent link">¶</a></h2>
<p>I use persistent undo in vim73 and depend on it heavily. Combined with the
<a href="http://sjl.bitbucket.org/gundo.vim" rel="noopener noreferrer" target="_blank">gundo</a> plugin by <a href="http://stevelosh.com" rel="noopener noreferrer" target="_blank">Steve Losh</a>, I get a nicely visualized, per-file version
history, which is quite handy in its own right.</p>
<p>So, if you have faced this, have a fix for it (perhaps a patch
to vim), or know of an existing report in vim’s bug database, let me know.</p>
<p>Thank you for reading.</p>Installing Crunchbang Linux on my old lappy2011-02-25T00:00:00+05:302011-02-25T00:00:00+05:30Shrikant Sharat Kandulatag:sharats.me,2011-02-25:/posts/installing-crunchbang-linux-on-my-old-lappy/<p>I managed to install Crunchbang Linux, the recently released Statler, after
reading quite a positive review (I don’t remember where). I am really liking
it, especially the Openbox window manager. Also, coming from a lot of
experience with Ubuntu, I find Crunchbang’s bare-bones simplicity, combined with
how customizable it is, very refreshing. I will write down my experience
installing it and my initial thoughts, before I forget them :).</p>
<p>Now, my laptop’s got a defective and unreliable disk drive, so I chose to
install Crunchbang from USB with the help of unetbootin. After downloading the
<code>#!</code> (Crunchbang) ISO file, I fired up unetbootin on my Windows Vista install (on the
same laptop) and set up my 1GB pen drive to be bootable. After that, I had to
create a couple of symlinks (using Cygwin) on the USB drive, as follows</p>
<div class="hl"><pre class=content><code><span>ln<span class="w"> </span>-s<span class="w"> </span>live/vmlinuz1<span class="w"> </span>vmlinuz
</span><span>ln<span class="w"> </span>-s<span class="w"> </span>live/initrd1.img<span class="w"> </span>initrd.img
</span></code></pre></div>
<p>After that, the boot was pretty smooth, though I had to choose the graphical
installer because the text-based installer wouldn’t load, for reasons I can’t figure out.</p>
<p>Another interesting thing happened at the end of the
installation: <code>#!</code> asked me whether I wanted to install the GRUB boot loader, noting that
it had detected Windows as another OS on the machine. However, the GRUB it installed
did not list Windows in the boot menu. I asked a question about this on
<a href="https://unix.stackexchange.com" rel="noopener noreferrer" target="_blank">unix.stackexchange.com</a> and learned that a
simple <code>sudo update-grub</code> added the Windows item to my boot menu. Not a major
setback, but still.</p>
<p>After that, using the OS has been nothing but pure pleasure. It feels amazingly
snappy and super productive. The conky-based hotkey reference on the desktop is
a killer feature. Oh, and installing Dropbox is easier than on my
Ubuntu box, if you use Dropbox, that is. Chrome, my browser of choice, is the
default browser; what more can I ask for? Awesome distribution. I am looking
forward to exploring even more with my shiny new #!, and I seriously recommend
you give it a try :)</p>A tasty vim configuration setup with Vimpire and Pathogen2010-12-14T00:00:00+05:302010-12-14T00:00:00+05:30Shrikant Sharat Kandulatag:sharats.me,2010-12-14:/posts/a-tasty-vim-configuration-setup-with-vimpire-and-pathogen/<p>Managing vim plugins has always been a hassle. Until pathogen came along. If
you are using vim with quite a few plugins, you should be using
pathogen; if you are not, you are seriously depriving yourself of sanity. No,
seriously. You should.</p>
<p>I’ll also assume you are versioning your <code>.vim</code> directory, say on GitHub or
BitBucket with git or mercurial respectively. If you are not, then you should.
You really, really should.</p>
<p>If your answer was no to both of the above, you’d better get the hell out of here
before I get my lawn mowers.</p>
<p>Okay, if you have tried to version your <code>.vim</code> directory while the plugin directories
inside pathogen’s bundle directory are repositories themselves, you won’t have been
very happy. You either have to version all the .git, .hg and whatnot
directories from the plugins, or ignore them all and forgo version control for
individual plugins. If you choose the latter, versioning your <code>.vim</code> is easy,
but updating your plugins is a serious pain.</p>
<p>Recently, <a href="http://vim-scripts.org" rel="noopener noreferrer" target="_blank">http://vim-scripts.org</a> came up, and so did scripts like vundle and
vim-update-bundles, as listed on the tools page of <a href="http://vim-scripts.org" rel="noopener noreferrer" target="_blank">http://vim-scripts.org</a>.
These let you list the plugins you use in your vimrc file, and they take care of
keeping them up to date. The advantage is that you can version your <code>.vim</code>
directory, and wherever you clone it, you can just run the script and all
your plugins are set up, at their latest versions, just like that. Awesome!</p>
<p>Vimpire isn’t much different from those tools. In fact, it is very similar to
vim-update-bundles in functionality, but there are 2 main differences. First
off, it is written in Python. I won’t spell out the implications of that, except
that it is Ruby-less. Second, it supports hg. Yay! So you can get plugins not just
from git repositories, but also from hg ones.</p>
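<p>Vimpire’s own configuration format and code aren’t shown here, so the following is only a hypothetical Python sketch of the core job such a tool does: for each listed plugin, either clone it into pathogen’s bundle directory or update the existing checkout, using <code>git</code> or <code>hg</code> as appropriate. The function name, the plugin mapping and the example.com URLs are assumptions, and the sketch only builds the commands; running them (for instance with <code>subprocess.run</code>) is left out so it has no side effects.</p>

```python
import os


def bundle_commands(plugins, bundle_dir):
    """Build the clone/update command for each plugin (hypothetical sketch).

    `plugins` maps a bundle directory name to a (vcs, url) pair, where vcs
    is "git" or "hg".
    """
    commands = []
    for name, (vcs, url) in sorted(plugins.items()):
        target = os.path.join(bundle_dir, name)
        if vcs not in ("git", "hg"):
            raise ValueError(f"unsupported VCS {vcs!r} for {name}")
        if not os.path.isdir(target):
            # Not installed yet: clone it into the bundle directory.
            commands.append([vcs, "clone", url, target])
        elif vcs == "git":
            commands.append(["git", "-C", target, "pull"])
        else:
            # `hg pull -u` pulls and updates the working copy.
            commands.append(["hg", "-R", target, "pull", "-u"])
    return commands


# Placeholder plugins; the URLs are not real repositories.
plugins = {
    "some-git-plugin": ("git", "https://example.com/some-git-plugin.git"),
    "some-hg-plugin": ("hg", "https://example.com/some-hg-plugin"),
}
for cmd in bundle_commands(plugins, os.path.expanduser("~/.vim/bundle")):
    print(" ".join(cmd))
```

<p>Keeping the VCS choice per plugin is what lets one list mix git and hg sources, which is the point of the hg support mentioned above.</p>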
<p>Setup and usage instructions are in the README file on the BitBucket page.</p>
<p>Hosted at <a href="http://bitbucket.org/sharat87/vimpire/src" rel="noopener noreferrer" target="_blank">http://bitbucket.org/sharat87/vimpire/src</a></p>
<p>Please note that this is still beta, and has only been tested on Windows 7. I am waiting to get
back to Ubuntu; until then, I have no idea whether it works on Unix-like machines.</p>
<p>Update: The latest version works perfectly with Ubuntu too!</p>