The Sharat's

A Tale of Two Forwarded Headers

2023-08-20T00:00:00+05:30

This is the story of how I troubleshot an OAuth2 redirect URL in Appsmith that contained localhost as the host, instead of the actual domain name, when hosted on Google Cloud Run. It’s also a story of how the Forwarded and X-Forwarded-* headers propagate through multiple reverse proxies, and how they can get confused along the way.

The Problem¶

Appsmith is an internal tool builder that has a React-based frontend and a Java+Spring based backend server. This backend uses the spring-security module’s support for OAuth2 authentication, which enables logging in to Appsmith with Google.

Google Cloud Run is

[…] a managed compute platform that lets you run containers directly on top of Google’s scalable infrastructure.

In other words, Google Cloud Run is a serverless abstraction for running Docker containers.

When running Appsmith on Google Cloud Run and enabling Login with Google, the redirect URL used as part of the OAuth2 flow includes the host as localhost instead of the actual domain name. This causes the OAuth2 flow to fail due to a mismatch in the redirect URL.

Primary Behaviour¶

Let’s start an Appsmith container, with Google OAuth configured, and see what redirect URL gets generated in a controlled environment.

docker run --name appsmith -p 8001:80 -v stacks:/appsmith-stacks -d \
  -e APPSMITH_OAUTH2_GOOGLE_CLIENT_ID=dummy \
  -e APPSMITH_OAUTH2_GOOGLE_CLIENT_SECRET=dummy \
  appsmith/appsmith-ce:v1.9.29

We configure Google OAuth with dummy values here, since we only care about the generated redirect URL and not the complete OAuth flow.

Let’s wait a little while for that to start and show up working on http://localhost:8001. Then, let’s initiate the OAuth2 flow and see the redirect URL.

curl -sSi http://localhost:8001/oauth2/authorization/google

This will print all the response headers. Let’s just pick the redirect_uri query parameter in the Location header (which contains the Google authorization endpoint as part of the OAuth2 flow).

curl -sSi http://localhost:8001/oauth2/authorization/google | grep -Eo 'redirect_uri=[^&]+'

We get the result as this:

redirect_uri=http://localhost/login/oauth2/code/google

Which is not entirely accurate because it’s missing the :8001 part, but that’s a problem for another day. For now, let’s just focus on the localhost part. This is the correct host here. But if we make this request with a different host:

curl -sSi http://localhost:8001/oauth2/authorization/google \
  -H 'Host: one.com' | grep -Eo 'redirect_uri=[^&]+'

Here, in the redirect_uri query parameter, we see the URL that we expect to see, with one.com as the host.

redirect_uri=http://one.com/login/oauth2/code/google

Similarly, if we try with X-Forwarded-Host header, or the more standard Forwarded header, we always see the correct host in the redirect_uri query parameter.

> curl -sSi http://localhost:8001/oauth2/authorization/google \
  -H 'X-Forwarded-Host: two.com' | grep -Eo 'redirect_uri=[^&]+'
redirect_uri=http://two.com/login/oauth2/code/google

> curl -sSi http://localhost:8001/oauth2/authorization/google \
  -H 'Forwarded: host=three.com' | grep -Eo 'redirect_uri=[^&]+'
redirect_uri=http://three.com/login/oauth2/code/google

The Appsmith backend server seems to be handling the host detection quite well, but when it’s run on Google Cloud Run, the host is always localhost.

> curl -sSi https://appsmith-abcdefghij-uc.a.run.app/oauth2/authorization/google \
  -H 'Host: four.com' | grep -Eo 'redirect_uri=[^&]+'
redirect_uri=http://localhost/login/oauth2/code/google
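
To compare just the host across these experiments, the grep can be wrapped in a small helper. This is a minimal sketch, not something from Appsmith: redirect_host is a hypothetical name, and it assumes the redirect_uri value appears unencoded in the Location header, as in the outputs above.

```shell
# Reads raw HTTP response headers on stdin and prints the host part of the
# redirect_uri query parameter found in the Location header.
# Assumption: the redirect_uri value is not percent-encoded.
redirect_host() {
    grep -i '^location:' \
        | grep -Eo 'redirect_uri=[^&]+' \
        | sed -E 's#^redirect_uri=[a-z]+://([^/:]+).*#\1#'
}
```

With that, piping any of the curl -sSi commands above into redirect_host (instead of the grep) should print only the host, such as one.com or localhost.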

Cloud Run, the Reverse Proxy¶

We’ve established that if the host is shared correctly with Appsmith, it produces the correct redirect_uri. So something about the way Google Cloud Run is forwarding the host is not working as expected. We want to find out just what Cloud Run is sending across.

To get this information, let’s run an instance of httpbun on Cloud Run, which can respond with all the headers it receives.

Here’s a sample configuration of how we can run httpbun on Cloud Run.

Once this is deployed, we get a URL like https://httpbun-abcdefghij-uc.a.run.app. Let’s make a request to this and see what headers it reports as being part of the request.

> curl -sSi https://httpbun-abcdefghij-uc.a.run.app/headers
{
  "Accept": "*/*",
  "Forwarded": "for=\"1.2.3.4\";proto=https",
  "Host": "httpbun-abcdefghij-uc.a.run.app",
  "Traceparent": "00-abcdefghijklmnopqrstuvwxyzabcdef-ghijklmnopqrstuv-01",
  "User-Agent": "curl/7.88.1",
  "X-Cloud-Trace-Context": "abcdefghijklmnopqrstuvwxyzabcdef/ghijklmnopqrstuvwxy;o=1",
  "X-Forwarded-For": "1.2.3.4",
  "X-Forwarded-Proto": "https"
}

Fantastic! We see that Cloud Run sends the actual host in the Host header, instead of X-Forwarded-Host, despite sending in X-Forwarded-For and X-Forwarded-Proto. This is only slightly odd, but not groundbreaking. As we’ve seen earlier, Appsmith handles this just fine.

But in addition to that, notice that we have a Forwarded header too. This contains the same information as X-Forwarded-For and X-Forwarded-Proto, and doesn’t contain a host field.

Detour: The Forwarded header is a more standard header that holds the same (and some more) information as the X-Forwarded-* suite of headers, which are a little less formally defined. What’s peculiar here is that Cloud Run appears to be sending both Forwarded and X-Forwarded-* headers.
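
To make the structure of a Forwarded value concrete, here’s a rough sketch of pulling a single parameter out of it in bash. The function name forwarded_param is made up for this example, and the parsing is deliberately naive: it splits on commas and semicolons, so it would mishandle quoted values that contain those characters.

```shell
# Prints the first value of parameter NAME (lowercase, e.g. host, proto, for)
# found in a Forwarded header value. Prints nothing if it is absent.
# Naive by design: quoted values containing "," or ";" are not handled.
forwarded_param() {
    local header="$1" name="$2"
    printf '%s\n' "$header" \
        | tr ',;' '\n\n' \
        | sed -E 's/^[[:space:]]+//' \
        | awk -F= -v name="$name" '
            tolower($1) == name { value = $2; gsub(/"/, "", value); print value; exit }
        '
}
```

Run against the value Cloud Run sent, forwarded_param 'for="1.2.3.4";proto=https' host prints nothing, since there’s no host field to find.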

We didn’t test this case with our local Appsmith. That is, we didn’t send the actual host in the Host header while also including a Forwarded header with information about the origin protocol (and IP address). Let’s do that now.

> curl -sSi http://localhost:8001/oauth2/authorization/google \
  -H 'Host: abc.com' -H 'Forwarded: for="1.2.3.4";proto=https' | grep -Eo 'redirect_uri=[^&]+'
redirect_uri=https://localhost/login/oauth2/code/google

Boom! There it is. Although we’re sending the host in Host header, Appsmith responds with localhost in the host part of the redirect_uri. This is the same behavior we see on Cloud Run.

The Reverse Proxy Inside Appsmith Container¶

Inside the Appsmith container, we have an NGINX process that handles all incoming requests. If the request points to a static file, it is served immediately. If it points to a backend API call, NGINX will proxy the request over to the Appsmith backend server. This NGINX configuration file is generated by this script, and you can peek into the actual configuration used by running docker exec appsmith cat /etc/nginx/sites-enabled/default. For the URL we’ve been curl-ing so far, the route that matches is this:

  location /oauth2 {
    proxy_pass http://localhost:8080;
  }

Since this location block doesn’t have any proxy_set_header directives, the ones in the parent context will apply. We can see these as:

  proxy_set_header X-Forwarded-Proto $origin_scheme;
  proxy_set_header X-Forwarded-Host  $origin_host;

The $origin_scheme and $origin_host are defined at the top of the configuration file, like this:

map $http_x_forwarded_proto $origin_scheme {
  default $http_x_forwarded_proto;
  '' $scheme;
}

map $http_x_forwarded_host $origin_host {
  default $http_x_forwarded_host;
  '' $host;
}

What this is essentially doing is setting up so that if the incoming request has an X-Forwarded-Proto header, the $origin_scheme is set to that header’s value. If that header is not present in the request, $origin_scheme is set to $scheme. This is an NGINX variable set to the current request’s protocol. Similarly, $origin_host either takes the value of X-Forwarded-Host header if present, or the current request’s host (which is usually the Host header of the request).

This means that once the request goes from this NGINX to Appsmith backend server, Host becomes localhost:8080, X-Forwarded-Host is set to appsmith-abcdefghij-uc.a.run.app, and the others, X-Forwarded-Proto, X-Forwarded-For and even the Forwarded header, are passed along as is.

This is the problem.

Since the Forwarded header is the more modern standard, its value usually takes precedence. The fact that the request has a Forwarded header unfortunately means that all the X-Forwarded-* headers will be ignored by the Appsmith server.

This means the X-Forwarded-Host header is completely ignored. The server instead looks for a host= field in the Forwarded header, and that field is missing. So it concludes that the host it received in the Host header, localhost:8080, is the actual host, and uses that to construct the redirect_uri.
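
The precedence described above can be modelled in a few lines of bash. This is only a sketch of the behaviour as observed in these experiments, not Spring’s actual implementation; effective_host and its arguments are hypothetical, and port handling is ignored.

```shell
# Models which host wins, given the three header values (empty string = absent).
# Hypothetical helper that mirrors the observed behaviour only.
effective_host() {
    local forwarded="$1" x_forwarded_host="$2" host_header="$3"
    local host_re='[Hh][Oo][Ss][Tt]="?([^;,"]+)'
    local candidate=""

    if [[ -n "$forwarded" ]]; then
        # Forwarded is present, so X-Forwarded-Host is never consulted.
        if [[ "$forwarded" =~ $host_re ]]; then
            candidate="${BASH_REMATCH[1]}"
        fi
    elif [[ -n "$x_forwarded_host" ]]; then
        candidate="$x_forwarded_host"
    fi

    # With no usable candidate, fall back to the Host header.
    printf '%s\n' "${candidate:-$host_header}"
}
```

Calling effective_host 'for="1.2.3.4";proto=https' abc.com localhost:8080 prints localhost:8080, which mirrors the failing case: abc.com is right there in X-Forwarded-Host, but it never gets a look.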

We can test this theory by sending a request to the Appsmith backend server directly, instead of going through the NGINX proxy. We can do this by using the docker exec command, like this:

> docker exec appsmith curl -sSi localhost:8080/oauth2/authorization/google \
  -H 'Forwarded: for="1.2.3.4";proto=https' \
  -H 'X-Forwarded-Host: abc.com' \
  | grep -Eo 'redirect_uri=[^&]+'
redirect_uri=https://localhost/login/oauth2/code/google

This produces localhost in the redirect_uri, just like we saw earlier, instead of abc.com. If we remove the Forwarded header, or add host= field in it, it works just fine.

> docker exec appsmith curl -sSi localhost:8080/oauth2/authorization/google \
  -H 'X-Forwarded-Host: abc.com' \
  | grep -Eo 'redirect_uri=[^&]+'
redirect_uri=https://abc.com/login/oauth2/code/google

> docker exec appsmith curl -sSi localhost:8080/oauth2/authorization/google \
  -H 'Forwarded: for="1.2.3.4";proto=https, host=abc.com' \
  -H 'X-Forwarded-Host: abc.com' \
  | grep -Eo 'redirect_uri=[^&]+'
redirect_uri=https://abc.com/login/oauth2/code/google

The Solution¶

In NGINX, we always set the X-Forwarded-Host header, which is the right thing to do. But if the incoming request has a Forwarded header, it takes precedence and the X-Forwarded-Host header is ignored. This is the problem.

So we get NGINX to also add the host= field, if a Forwarded header exists. We do this in this PR.

Essentially, define a $final_forwarded, like this:

map $http_forwarded $final_forwarded {
  default '$http_forwarded, host=$host;proto=$scheme';
  '' '';
}

In the http block, we set the Forwarded header as follows:

  proxy_set_header Forwarded $final_forwarded;

This way, if there’s no incoming Forwarded header, we don’t send it to the backend server either. But if it exists, we add the host= field (and a proto= field for good measure) to it, and send it to the backend server.
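
For readers less familiar with NGINX’s map directive, the same decision can be written as a bash function. It’s an illustrative equivalent only, with final_forwarded and its three arguments standing in for the NGINX variables $http_forwarded, $host and $scheme.

```shell
# Bash equivalent of the map above (illustration only).
# $1: incoming Forwarded value, $2: request host, $3: request scheme.
final_forwarded() {
    local incoming="$1" host="$2" scheme="$3"
    if [[ -z "$incoming" ]]; then
        # No incoming Forwarded header: send none to the backend either.
        return 0
    fi
    # Otherwise append a new element carrying the host and protocol.
    printf '%s, host=%s;proto=%s\n' "$incoming" "$host" "$scheme"
}
```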

Conclusion¶

The confusion between the Forwarded and X-Forwarded-* suites of headers, and which takes precedence when both are set, turned out to be the underlying problem. The NGINX we use inside Appsmith was only ever tuned to work with the X-Forwarded-* suite of headers. Additionally, since Google Cloud Run is so opaque, in the sense that we can’t even get shell access into the running container, tools like httpbun can be very helpful in figuring out what the request actually contains.

Running Docker containers in network isolation with proxied traffic

2023-04-16T00:00:00+05:30

Several network configurations, especially in large companies and universities, have a proxy configured for all outgoing traffic. Any network traffic that tries to go out bypassing this proxy will be blocked. For a self-hosted web application, the server will also need to make any and all outgoing connections via this proxy.

Now, several applications, web application servers included, support the HTTP_PROXY and HTTPS_PROXY environment variables to configure such a proxy. But if we don’t have a network that blocks non-proxy traffic, how do we test this? How can we ensure that, when a proxy is configured, all outgoing requests are only ever made through the proxy?

This article is my attempt at answering this.

Table of Contents

Docker Networks
Sandbox
Proxying HTTPS Requests
DNS Resolution
Connecting from Host
Testing Appsmith
Further Explorations
Conclusion

Docker Networks¶

We’ll be using Docker’s networking features. It provides a simple set of primitives to solve what we need here.

By default, Docker sets up a bridge network for us that allows connectivity to external endpoints. With explicit configuration, we can also have an internal network, where connections are only allowed to other containers that are also connected to this internal network.

Docker’s official documentation on Networking in Docker Compose covers this in more detail.

Sandbox¶

We need a sandbox environment where there’s a proxy and a subject application. We want to ensure that outgoing requests made from the subject application always fail unless they go via the proxy.

Let’s start with two containers, in a docker-compose.yml configuration.

The subject container, which is expected to make all outgoing requests via the proxy only.
The proxy container, which runs an HTTP proxy.

For the subject container, we’ll use an ordinary, friendly, memorable, vanilla Ubuntu container, with the command set to sleep infinity. This makes the container stay running so that we can get in and play around. Without this, the container would start, do nothing, and just exit. Not very useful.

For the proxy container, we’ll use mitmproxy. Specifically, the web interface version called mitmweb. This is an excellent proxy application, best used for intercepting requests during development. If you haven’t been spoilt by it, I encourage you to check it out.

So, this is our initial version of the sandbox:

version: "3"


services:
  subject:
    image: ubuntu
    command: sleep infinity

  proxy:
    image: mitmproxy/mitmproxy
    command: mitmweb --web-host 0.0.0.0
    ports:
      - "8081:8081"

Save this as a docker-compose.yml and do a docker-compose up -d. Once the two containers are running, open localhost:8081. This is where we’ll see all the HTTP requests flowing through our proxy.

Let’s get inside the subject container and make some requests. Start a shell with docker-compose exec subject bash. This will start a shell session running inside the subject container. Use the following commands to install curl and make a first request:

apt update
apt install --yes curl
curl httpbun.com/get

This will make an external request, and print the response in the Terminal, but this request won’t show up in mitmproxy’s UI. For that, let’s do:

http_proxy=http://proxy:8080 curl httpbun.com/get

This will show the response in the Terminal, as well as in mitmproxy’s UI.

Let’s step this up. We’ll now block direct Internet access to the subject container and only allow connecting via the proxy. Consider the following docker-compose.yml file:

version: "3"


services:
  subject:
    image: ubuntu
    command: sleep infinity
    networks:
      intnet: {}

  proxy:
    image: mitmproxy/mitmproxy
    command: mitmweb --web-host 0.0.0.0
    ports:
      - "8081:8081"
    networks:
      intnet: {}
      extnet: {}


networks:
  intnet:
    internal: true
  extnet: {}

This is the same as the previous one, except for the networks configuration. We define two networks, an internal network named intnet and an external network named extnet. The subject container is only connected to intnet, so it can only connect to other containers that are also connected to intnet. The proxy container is connected to both intnet and extnet, so it can both access other containers in the intnet as well as access the wider Internet.

With this setup, we expect direct network connections from subject to the Internet to fail, unless they go via the proxy container.

Let’s do a docker-compose up -d with this file, open a shell with docker-compose exec subject bash, and try to install curl again. But notice that when we run apt update, it doesn’t work, since this too requires the Internet and we’ve blocked it. We’ll use this as proof that blocking Internet is working!

Instead of apt update, issue the command http_proxy=http://proxy:8080 apt update. This should make all requests via that proxy, and should even show up in mitmproxy’s UI. Make sure you refresh the page, since the mitmproxy container has been recreated. Effectively, we do:

docker-compose exec subject bash
http_proxy=http://proxy:8080 apt update
http_proxy=http://proxy:8080 apt install --yes curl

Notice that these commands will show a bunch of requests in mitmproxy’s UI made to the Ubuntu package archives. Now, we can try out our test with curl:

curl httpbun.com/get

This will eventually time out. The subject container doesn’t have access to the Internet, so this can’t run. Let’s try:

http_proxy=http://proxy:8080 curl httpbun.com/get

This should work, and the request should show up in mitmproxy’s UI.
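
The two manual checks above can be rolled into one function, which is handy when re-running the sandbox repeatedly. A sketch, assuming curl is installed in the subject container; check_isolation is a hypothetical helper, and the five-second timeout is an arbitrary choice.

```shell
# Reports whether URL is reachable only through PROXY.
# Prints "isolated" and returns 0 when the sandbox behaves as intended.
check_isolation() {
    local url="$1" proxy="$2"

    # A direct request, explicitly bypassing any proxy settings, must fail.
    if curl --silent --max-time 5 --noproxy '*' --output /dev/null "$url"; then
        echo 'leaky: direct request succeeded'
        return 1
    fi

    # The same request through the proxy must succeed.
    if ! curl --silent --max-time 5 --proxy "$proxy" --output /dev/null "$url"; then
        echo 'broken: proxied request failed'
        return 2
    fi

    echo 'isolated'
}
```

Inside the subject container, check_isolation http://httpbun.com/get http://proxy:8080 should report isolated once the internal network is in place.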

Proxying HTTPS Requests¶

The setup we have so far works with proxying HTTP requests, but not for HTTPS requests. The whole point of HTTPS over HTTP is to make man-in-the-middle interventions impossible in a request. But that’s exactly what a proxy does!

To solve this, we’ll install and set up mitmproxy’s CA in the subject container. This will ensure that even if mitmproxy intervenes in HTTPS requests, our subject container will gladly accept and mark such requests as verified. This is covered in mitmproxy’s documentation.

The first time mitmproxy starts, it generates a new random CA certificate. This certificate is what we want our subject container to trust. So we’ll use a Docker volume to share this cert with the subject container.

version: "3"


services:
  subject:
    image: ubuntu
    command: sleep infinity
    networks:
      intnet: {}
    volumes:
      - ./certs:/certs:ro

  proxy:
    image: mitmproxy/mitmproxy
    ports: ["8081:8081"]
    command: mitmweb --web-host 0.0.0.0
    networks:
      intnet: {}
      extnet: {}
    volumes:
      - ./certs:/home/mitmproxy/.mitmproxy


networks:
  intnet:
    internal: true
  extnet: {}

Here, we define a volume on each container at the host path ./certs that’ll hold the contents of the /home/mitmproxy/.mitmproxy folder inside the proxy container. This is the path where mitmproxy will save the generated CA root certificate.

We also give the subject container access to this volume, at the /certs location inside the container. Notice the :ro suffix here, which means read-only access. We don’t expect the subject container to write anything to this volume, just read the CA certificate.
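
Since the certificate is only generated when mitmproxy starts, a script that brings the sandbox up may need to wait for the file to appear before using it. Here’s one way to do that; wait_for_file is a hypothetical helper, and the 30-second default is an arbitrary assumption.

```shell
# Waits until PATH exists and is non-empty, checking once per second.
# Returns 1 if TIMEOUT seconds (default 30) pass without the file showing up.
wait_for_file() {
    local path="$1" timeout="${2:-30}" waited=0
    until [[ -s "$path" ]]; do
        if (( waited >= timeout )); then
            echo "Timed out waiting for $path" >&2
            return 1
        fi
        sleep 1
        waited=$(( waited + 1 ))
    done
}
```

For example, run wait_for_file ./certs/mitmproxy-ca.pem on the host, right after docker-compose up -d.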

Let’s start the containers again with a docker-compose up -d and then run our tests again:

http_proxy=http://proxy:8080 apt update
http_proxy=http://proxy:8080 apt install --yes curl
http_proxy=http://proxy:8080 curl httpbun.com/get
https_proxy=http://proxy:8080 curl https://httpbun.com/get

But notice that the last command hitting the HTTPS API fails. This is because the subject container doesn’t trust the mitmproxy’s CA certificate. We’ll see something like this in the output:

curl: (60) SSL certificate problem: unable to get local issuer certificate
More details here: https://curl.se/docs/sslcerts.html

curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.

Now the issue is just that the SSL verification has failed. Since the verification failed, curl refuses to continue with the request. We can tell curl to ignore the verification failure by using the --insecure flag like this:

https_proxy=http://proxy:8080 curl --insecure https://httpbun.com/get

But that’s not what we want. We want to tell curl to trust the mitmproxy’s CA certificate. Like this:

https_proxy=http://proxy:8080 curl --cacert /certs/mitmproxy-ca.pem https://httpbun.com/get

This should show up as an HTTPS request in mitmproxy with the ability to view full details of the request and response. Try out the same/similar curl commands without the proxy, and notice that those requests fail.

DNS Resolution¶

When an HTTP proxy is configured, DNS resolution is done by the proxy. This is because, to make the request, it is the proxy that connects to the endpoint server, so it needs to know the IP address of the host. As long as the subject container is only making HTTP(S) requests, this is fine. But if we need it to make an explicit DNS query, we see that it fails:

http_proxy=http://proxy:8080 apt install --yes dnsutils
nslookup httpbun.com

This will fail because direct DNS resolution (as opposed to with a proxy, or with DNS-over-HTTPS) requires access to the external network, which the subject container doesn’t have. We can solve this the same way we solved it for HTTP requests, with a proxy.

Let’s add the following DNS proxy service to our docker-compose.yml:

  dns:
    image: mitmproxy/mitmproxy
    command: mitmdump --mode dns
    networks:
      intnet: {}
      extnet: {}

This is again an mitmproxy container, that runs in DNS mode. And it highlights how awesome mitmproxy is! This brings us a DNS proxy, that we can use to resolve DNS queries.

Now we’ll instruct the subject container to use this dns container, for DNS queries. This is handled by the resolv.conf inside the subject container. Let’s inspect its contents:

docker-compose exec subject cat /etc/resolv.conf

We should see something like this:

nameserver 127.0.0.11
options ndots:0

The IP Address next to nameserver is what will be used for DNS resolutions. We need this to be the IP Address of the dns container, as on the intnet network. The docker inspect command can help us find this IP Address. In the output of docker inspect $(docker-compose ps -q dns), under NetworkSettings.Networks, you’ll find the IP Address of the dns container, on the intnet network. We want this IP Address to be added to the resolv.conf of the subject container.

We can use the below commands to do this:

docker-compose up -d dns
docker-compose exec subject sh -c "echo nameserver $(
  docker inspect "$(docker-compose ps -q dns)" -f $'{{range $k, $v := .NetworkSettings.Networks}}{{$k}}:{{$v.IPAddress}}\n{{end}}' \
    | awk -F: '/_intnet:/ {print $2}'
) >> /etc/resolv.conf"
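
The awk filter in that command is the part doing the selection: the template prints one network:ip line per network, and awk keeps the one whose name ends in _intnet. Pulled out into a function, purely as an illustration (pick_network_ip is a made-up name, and the sample network names in the usage assume a Compose project called sandbox):

```shell
# Reads "network-name:ip" lines on stdin and prints the IP of the first
# network whose name ends with the given suffix.
pick_network_ip() {
    local suffix="$1"
    awk -F: -v suffix="$suffix" '$1 ~ (suffix "$") { print $2; exit }'
}
```

For instance, printf 'sandbox_extnet:172.19.0.3\nsandbox_intnet:172.18.0.4\n' | pick_network_ip _intnet prints 172.18.0.4.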

Note that we add another nameserver line with this IP Address instead of replacing the existing one. The reason for this is that the existing one is still useful to resolve internal hostnames, like proxy. Now let’s try the DNS query again:

nslookup httpbun.com

We should see the resolved IP Address show up. You can also try to resolve other hostnames, even internal ones like proxy, and see that it responds with that container’s internal IP Address.

Connecting from Host¶

So far, our subject container has only been sleeping (pun shamelessly intended). But usually, we’d want it to host a website or an app that’s available over HTTP from outside the container, and outside the intnet network. Let’s set up a small website in the subject container.

First, let’s create a nice index.html for our website:

cat <<EOF > index.html
<h1>My awesome website!</h1>
EOF

Second, let’s change the subject container to run a Python content webserver on port 80:

  subject:
    image: python:3-alpine
    command: python -m http.server -d /www 80
    ports:
      - "8090:80"
    networks:
      intnet: {}
    volumes:
      - ./certs:/certs:ro
      - .:/www

To verify that it’s working, let’s curl localhost in the subject container, and we should see the “My awesome website!” show up.

We’re also exposing this on port 8090 on the host, so if we open http://localhost:8090 in the browser on the host system, we should see this “My awesome website!” page, right?

But, no, it doesn’t work. The reason is that the subject container is only connected to the intnet network, which is inaccessible from outside the network-sandbox that Docker has created.

Remember how we used the proxy container to let subject access Internet resources? We’ll do the reverse here. We’ll define a reverse-proxy, that connects to both intnet and extnet, and will forward all incoming requests to subject. We can use mitmproxy here as well because it can act as a reverse proxy too (yes, mind blown).

  rproxy:
    image: mitmproxy/mitmproxy
    command: mitmdump --mode reverse:http://subject --listen-port 80
    ports:
      - "8091:80"
    networks:
      intnet: {}
      extnet: {}

Although, if you prefer to use a real reverse-proxy, like NGINX, this is the kind of configuration we’ll want:

worker_processes  1;
error_log /dev/stderr info;

events {
    worker_connections 1024;
}

stream {
    server {
        listen 80;
        proxy_pass subject:80;
    }
}

The point is just to listen on port 80 and forward all HTTP requests to the subject container’s webapp.

Let’s bring it up with docker-compose up -d rproxy.

Now, if we open http://localhost:8091 in the browser on the host system, we should see the response from our little piece of awesome.

Testing Appsmith¶

Appsmith is a low-code internal tool builder. It’s a webapp that lets you build internal tools, without writing code. It’s a great tool for building internal tools, but it’s also a great tool to test internal tools.

We wanted to test Appsmith and make sure it works well with a proxy. We also wanted to make sure that, when a proxy is configured, it doesn’t make any requests that try to bypass it.

To do that, we started with the following docker-compose.yml file:

version: "3"


services:
  appsmith:
    image: appsmith/appsmith-ce
    environment:
      HTTP_PROXY: http://proxy:8080
      HTTPS_PROXY: http://proxy:8080
    networks:
      intnet: {}
    volumes:
      - ./stacks:/appsmith-stacks
      - ./resolv.conf:/etc/resolv.conf:ro

  proxy:
    image: mitmproxy/mitmproxy
    command: mitmweb --web-host 0.0.0.0
    ports:
      - "8081:8081"
    networks:
      intnet: {}
      extnet: {}
    volumes:
      - ./certs:/home/mitmproxy/.mitmproxy

  rproxy:
    image: mitmproxy/mitmproxy
    command: mitmdump --mode reverse:http://appsmith --listen-port 80
    ports:
      - "8091:80"
    networks:
      intnet: {}
      extnet: {}

  dns:
    image: mitmproxy/mitmproxy
    command: mitmdump --mode dns
    networks:
      intnet: {}
      extnet: {}


networks:
  intnet:
    internal: true
  extnet: {}

A few things are happening here:

We start Appsmith in the internal network with proxy configured to use the proxy container. We don’t expose any ports for Appsmith, because we’ll be accessing it through the rproxy container.
We start the proxy container which will act as a proxy for all HTTP and HTTPS requests made by Appsmith. The proxy runs on port 8080, but the web UI runs on port 8081, which we expose to the host.
We start the rproxy container which will act as a reverse proxy for the host (i.e., us) to access Appsmith from the browser.
We start the dns container which will act as a DNS server for the internal network.
The Appsmith container uses two volumes: the stacks to hold all its data and the resolv.conf to add the dns container as another nameserver.
The proxy container has the certs volume, to store the CA certificate for mitmproxy.

Now, there are still a few missing pieces:

We need the mitmproxy’s CA cert to be installed in the Appsmith container. This can be done, as detailed in the documentation, by copying the cert into stacks/ca-certs folder.
We need the dns container’s internal IP Address added to Appsmith container’s resolv.conf file.

docker-compose up -d dns
mkdir -pv stacks/ca-certs
cp -v certs/mitmproxy-ca.pem stacks/ca-certs/mitmproxy-ca.crt
cat <<EOF > resolv.conf
nameserver 127.0.0.11
options ndots:0
nameserver $(
  docker inspect "$(docker-compose ps -q dns)" -f $'{{range $k, $v := .NetworkSettings.Networks}}{{$k}}:{{$v.IPAddress}}\n{{end}}' \
    | awk -F: '/_intnet:/ {print $2}'
)
EOF
docker-compose up -d

This will pick up the new CA cert, install it to the trust store, and also start using the new entry in resolv.conf.
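
If the dns container’s address is already known, for instance after pinning it to a static IP, the resolv.conf content can be generated without docker inspect. A small sketch, where render_resolv_conf is a hypothetical function and 172.18.0.4 in the usage is a placeholder address:

```shell
# Prints resolv.conf content that keeps Docker's embedded resolver and adds
# the dns container (given by IP) as an extra nameserver.
render_resolv_conf() {
    local dns_ip="${1-}"
    if [[ -z "$dns_ip" ]]; then
        echo 'Usage: render_resolv_conf DNS_IP' >&2
        return 2
    fi
    printf 'nameserver 127.0.0.11\noptions ndots:0\nnameserver %s\n' "$dns_ip"
}
```

Usage would be along the lines of render_resolv_conf 172.18.0.4 > resolv.conf, before bringing the Appsmith container up.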

With this setup, if the Appsmith container makes any outgoing HTTP requests with the configured proxy, it should work fine and should show up in mitmproxy’s web UI. But if it tries to make a request without the proxy, it should fail. This will highlight any features and functionality that get affected due to this.

Further Explorations¶

Configure static IP Addresses for the containers in the docker-compose.yml, especially the dns container. This should make it easier to configure the resolv.conf file.
Use NGINX stream reverse proxies to have the subject container connect to external databases.

Conclusion¶

Since requests directly to the Internet fail, we can use this setup to test that our application doesn’t leak any requests when a proxy is configured. Ideally, when I configure a proxy to be used by an application, I don’t expect it to make any request without that proxy. This sounds like an obvious thing to expect, but the best of expectations fail when it comes to software. This is why we test. This guide should help us test proxy support for applications better.

Shell Script Best Practices

2022-10-27T00:00:00+05:30

This article is about a few quick rules of thumb I use when writing shell scripts, which I’ve come to appreciate over the years. Very opinionated.

Things¶

Use bash. Using zsh or fish or any other, will make it hard for others to understand / collaborate. Among all shells, bash strikes a good balance between portability and DX.
Just make the first line be #!/usr/bin/env bash, even if you don’t give executable permission to the script file.
Use the .sh (or .bash) extension for your file. It may be fancy to not have an extension for your script, but unless your case explicitly depends on it, you’re probably just trying to do clever stuff. Clever stuff is hard to understand.
Use set -o errexit at the start of your script.
- So that when a command fails, bash exits instead of continuing with the rest of the script.
Prefer to use set -o nounset. You may have a good excuse to not do this, but in my opinion, it’s best to always set it.
- This will make the script fail, when accessing an unset variable. Saves from horrible unintended consequences, with typos in variable names.
- When you want to access a variable that may or may not have been set, use "${VARNAME-}" instead of "$VARNAME", and you’re good.
Use set -o pipefail. Again, you may have good reasons to not do this, but I’d recommend to always set it.
- This will ensure that a pipeline command is treated as failed, even if one command in the pipeline fails.
Use set -o xtrace, with a check on $TRACE env variable.
- For copy-paste: if [[ "${TRACE-0}" == "1" ]]; then set -o xtrace; fi.
- This helps in debugging your scripts, a lot. Like, really a lot.
- People can now enable debug mode, by running your script as TRACE=1 ./script.sh instead of ./script.sh.
Use [[ ]] for conditions in if / while statements, instead of [ ] or test.
- [[ ]] is a bash ~~builtin~~ keyword, and is more powerful than [ ] or test.
Always quote variable accesses with double-quotes.
- One place where it’s okay not to is on the left-hand-side of an [[ ]] condition. But even there I’d recommend quoting.
- When you need the unquoted behaviour, using bash arrays will likely serve you much better.
Use local variables in functions.
Accept multiple ways that users can ask for help and respond in kind.
- Check if the first arg is -h or --help or help or just h or even -help, and in all these cases, print help text and exit.
- Please. For the sake of your future-self.
When printing error messages, please redirect to stderr.
- Use echo 'Something unexpected happened' >&2 for this.
Use long options, where possible (like --silent instead of -s). These serve to document your commands explicitly.
- Note though, that commands shipped on some systems like macOS don’t always have long options.
If appropriate, change to the script’s directory close to the start of the script.
- And it’s usually always appropriate.
- Use cd "$(dirname "$0")", which works in most cases.
Use shellcheck. Heed its warnings.
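
A few of these rules are easier to see in action. The sketch below combines local variables, "${1-}"-style defaults, errors to stderr, and an array in place of unquoted expansion. The function names err and build_curl_args are invented for this example, and the curl options are just sample data.

```shell
# Print an error message to stderr.
err() {
    echo "$@" >&2
}

# Collect optional arguments in an array rather than an unquoted string.
# "${1-}" and "${2-}" keep this safe under `set -o nounset`.
build_curl_args() {
    local url="${1-}" proxy="${2-}"
    if [[ -z "$url" ]]; then
        err 'Usage: build_curl_args URL [PROXY]'
        return 2
    fi
    local -a args=(--silent --show-error)
    if [[ -n "$proxy" ]]; then
        args+=(--proxy "$proxy")
    fi
    args+=("$url")
    # One argument per line, so that spaces inside an argument stay visible.
    printf '%s\n' "${args[@]}"
}
```

Because each element stays a separate word, a URL with a space in it survives intact, which is exactly what an unquoted $ARGS string would get wrong.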

Template¶

#!/usr/bin/env bash

set -o errexit
set -o nounset
set -o pipefail
if [[ "${TRACE-0}" == "1" ]]; then
    set -o xtrace
fi

if [[ "${1-}" =~ ^-*h(elp)?$ ]]; then
    echo 'Usage: ./script.sh arg-one arg-two

This is an awesome bash script to make your life better.

'
    exit
fi

cd "$(dirname "$0")"

main() {
    echo do awesome stuff
}

main "$@"

Conclusion¶

I try to follow these rules in my scripts, and they’re known to have made at least my own life better. I’m still not consistent though, unfortunately, in following my own rules. So perhaps writing them down this way will help me improve there as well.

Do you have anything you think I should add to this? Please share in the comments!

Edit 1: Included fixes from HN comments at https://news.ycombinator.com/item?id=33355407 and https://news.ycombinator.com/item?id=33355077.

Edit 2: Fix from https://news.ycombinator.com/item?id=33354759.

Quick insecure TOTP

2022-09-10T00:00:00+05:30

This is about a Hammerspoon script I have that gives me a super-fast way to fill in TOTP fields in MFA logins.

NOTE: This method of doing MFA is very likely, very unsafe. If you are any bit unsure about anything here, please stay away from this document.

Hammerspoon¶

Hammerspoon is a very convenient and powerful system automation tool for macOS, that can be programmed in Lua. It’s been my replacement for AutoHotkey after moving away from Windows.

Install with:

brew install hammerspoon

TOTP Script¶

Four pieces to this.

One, open ~/.hammerspoon/init.lua, create if it doesn’t exist. Ensure you have the following line, perhaps among many others:

require("totp-generator").init()

Two, in ~/.hammerspoon/totp-generator.lua, put the following content:


local os = require("os")
local gauth = require("gauth")

local mfa_note_path = os.getenv("HOME") .. "/.hammerspoon/otp-codes.csv"
local keys = nil

function init()
    hs.hotkey.bind({"alt"}, "n", launch)
    hs.pathwatcher.new(mfa_note_path, function()
        keys = loadItems()
    end):start()
    keys = loadItems()
end

local chooser = hs.chooser.new(function(item)
    if item == nil then
        return
    end

    local hash = gauth.GenCode(item._key, math.floor(os.time() / 30))
    hs.eventtap.keyStrokes(("%06d"):format(hash))
end)

chooser:queryChangedCallback(function(query)
    if query == "" then
        chooser:choices(nil)
    end

    local choices = {}

    for _, item in pairs(filter(query, keys) or {}) do
        table.insert(choices, item)
    end

    chooser:choices(choices)
end)

function launch()
    chooser:choices(nil)
    chooser:query("")
    chooser:show()
end

function filter(query, items)
    if query == "" then
        return nil
    end
    local lowerQuery = query:lower()
    local result = {}
    for _, item in pairs(items) do
        if item.text:lower():find(lowerQuery) ~= nil then
            table.insert(result, item)
        end
    end
    return result
end

function loadItems()
    local f = io.open(mfa_note_path, "r")
    local content = f:read("*all")
    f:close()

    local entries = {}
    -- Ref: https://www.lua.org/manual/5.3/manual.html#6.4.1
    for title, key, desc in string.gmatch(content, "%s*(.-)%s*,%s*(.-)%s*,%s*(.-)%s*\n") do
        print(title, desc)
        table.insert(entries, {
            text=title,
            subText=desc,
            _key=string.lower(string.gsub(key, "%s+", "")),
        })
    end

    return entries
end

return {
    init=init,
}

Three, download the gauth.lua file, and place it in ~/.hammerspoon folder. This is what does the bulk of the work, so thanks to teunvink for this!

Four, in the file ~/.hammerspoon/otp-codes.csv, add your TOTP code data, one per line, like this:

Mail,abcd efghi jklmn opqrst, Personal Mail Account
Another,onemoretotpcodehere, Another nice account

Each line contains three entries, separated by commas. First is a title, short and easily identifiable, second is the TOTP Key, third is a description, where you can include a longer explanation for yourself.

The TOTP Key in the second column is given by the MFA provider when configuring MFA. We are usually asked to scan the QR code on our phones when setting this up, but we can also get a TOTP Key, usually hidden behind a button that reads something like Can't scan the code?. Copy that key and put an entry here.

Now start/reload Hammerspoon.

Now, while your cursor is in a TOTP field, hit Opt+n and start searching for any entry from the CSV file, and hit Enter on the entry you want to be filled in.
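For a sense of what a TOTP generator computes from such a key, here’s a minimal Python sketch of the standard algorithm (RFC 6238, built on the HOTP truncation from RFC 4226). It’s an illustration only, not a port of gauth.lua, and the function name totp and its parameters are made up for this example.

```python
import base64
import hashlib
import hmac
import struct
import time


def totp(key_base32, at=None, step=30, digits=6):
    """Return the TOTP code for a base32 key, as a zero-padded string."""
    # Keys are often shown in lowercase groups separated by spaces; normalise first.
    cleaned = "".join(key_base32.split()).upper()
    # Base32 input must be padded to a multiple of 8 characters.
    cleaned += "=" * (-len(cleaned) % 8)
    secret = base64.b32decode(cleaned)

    # Number of whole time steps since the Unix epoch.
    now = time.time() if at is None else at
    counter = int(now // step)

    # HMAC-SHA1 over the counter, encoded as an 8-byte big-endian integer.
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()

    # Dynamic truncation: the low 4 bits of the last byte pick a 4-byte window.
    offset = digest[-1] & 0x0F
    number = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(number % 10 ** digits).zfill(digits)


# Test key from the RFCs: the ASCII string "12345678901234567890".
rfc_key = base64.b32encode(b"12345678901234567890").decode()
print(totp(rfc_key, at=59))  # 287082
```

The floor division by 30 is the same time-step calculation as the math.floor(os.time() / 30) in the Lua script above.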

Demo¶

[Video: demo of the script.]

Conclusion¶

Again, this can be very convenient, but is not very secure. The way I use it on my system is quite a bit different from what I demonstrate here, but that’s only because I don’t want to show off the exact format I am using. So feel free to tweak the CSV format and use something else like JSON or some other encrypted source altogether, like the pass CLI, perhaps. But, I can’t speak for that.

Keep your keys safe. They are nothing less than passwords.

Thank you for reading.

Peeking into HTTPS Traffic with a Proxy

2022-06-17T00:00:00+05:30

This article is about configuring a web application, Appsmith in this case, to run correctly behind a firewall that does SSL decryption, as a Docker container. Instead of a firewall, we’ll use a proxy, which, for the purpose of the problem statement, should be the same.

Table of Contents

Setting up mitmproxy
Setting up
Setting proxy on the whole container
Conclusion
Bonus: Using Charles

Since the proxy needs to support HTTPS decryption, we’ll use mitmproxy, but Charles or any other proxy that supports this would also work just fine.

Setting up `mitmproxy`¶

Install with:

brew install mitmproxy

Now launch it using:

mitmweb --listen-port 9020 --web-port 9021

Let it run in a separate Terminal window in the background. This will also open the proxy’s web UI at http://localhost:9021. To get a console UI instead, use mitmproxy instead of mitmweb in the above command.

Let’s try running some requests through this proxy to see it’s working well. Start with:

curl http://httpbun.com/get

This should print a valid JSON as the response, with some details about the request itself. Let’s repeat this with the proxy.

curl --proxy localhost:9020 http://httpbun.com/get

You should again see the same response here, but this time, a new entry should appear in the mitmweb UI. Here, you can inspect the request and be able to see the path, headers and response of the request.

So we’ve confirmed that our proxy works. Let’s add HTTPS to the mix.

curl https://httpbun.com/get

Again, same thing, but with HTTPS, without a proxy. You should see the same response as before, but without an entry in the proxy. That’s to be expected since we didn’t put a --proxy here. Let’s try that now.

curl --proxy localhost:9020 https://httpbun.com/get

This will fail with a verification error, that the SSL certificate couldn’t be verified. Let’s see why.

The way an SSL proxy works is by establishing two SSL connections, one with the client (a browser, or curl), initiated by the client, and another with the server (the httpbun.com server in this case). Everything sent by the client is encrypted using the certificate of mitmproxy, and everything by and to the server is encrypted with the server’s certificate.

The first time mitmproxy is started, it creates a new root certificate, in the ~/.mitmproxy folder. We can install this root certificate on our system, and then curl, or any other client, will trust it. The mitmproxy docs talk about how to install this cert. Optionally, for curl, instead of installing the cert, we can use the --cacert flag to point to the root certificate.

Another point to note here, is that installing this root certificate on your system, doesn’t mean it’ll be trusted in any Docker containers run on your system. Docker containers are isolated systems in this context, and maintain their own list of trusted root certificates.

To illustrate this, first, let’s run the same request from inside a container, and we should see the error right away:

docker run --rm alpine/curl --proxy host.docker.internal:9020 https://httpbun.com/get

At this, you should see a certificate validation error. This is because the root certificate of mitmproxy isn’t installed inside the container’s environment, and so the curl invocation inside, won’t be able to verify mitmproxy’s certificate.

To confirm that this is indeed because of mitmproxy, run the same docker run command without the --proxy host.docker.internal:9020 argument and you won’t see this error, despite running with https.

Now we’ve reproduced the situation where a process (a web server in our case), inside a Docker container, is trying to run behind an SSL-decrypting firewall (or, an SSL-decrypting proxy in our case here). Let’s see what we can do to get this to work.

Setting up¶

For our adventure here, we’ll use the Docker image of Appsmith, located at https://hub.docker.com/repository/docker/appsmith/appsmith-ce.

Let’s start a temporary Appsmith container with:

docker run --rm -d --name ace -p 9022:80 appsmith/appsmith-ce

Once this is ready, you should be able to access your Appsmith instance at http://localhost:9022.

Let’s try to run some curl requests inside this container, and get them to go through our mitmweb proxy.

docker exec ace curl --proxy host.docker.internal:9020 http://httpbun.com/get

This should work fine, and the request should show up in the proxy UI with full details as well. Now let’s do the same thing with https.

docker exec ace curl --proxy host.docker.internal:9020 https://httpbun.com/get

This fails with a certificate validation error, just like it did with the alpine/curl container earlier. To fix that, let’s copy the root certificate into the container. For mitmproxy, the root cert is generated at first start, and is located at ~/.mitmproxy/mitmproxy-ca-cert.pem, going by the docs at https://docs.mitmproxy.org/stable/concepts-certificates/#the-mitmproxy-certificate-authority.

docker cp ~/.mitmproxy/mitmproxy-ca-cert.pem ace:/

With this command, we are copying the root certificate of mitmproxy into the container, into the root folder. Let’s run the same curl command now, providing it this root cert:

docker exec ace curl --proxy host.docker.internal:9020 --cacert /mitmproxy-ca-cert.pem https://httpbun.com/get

Now we’ll see the correct response, as well as full details of this request in the proxy UI.
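The same two ingredients, a proxy address and a root certificate to trust, show up in most HTTP clients. As a rough illustration, here’s how they might be wired up with Python’s standard library instead of curl. The function name opener_via_proxy is made up for this sketch, and the commented-out usage assumes the proxy from above is listening on port 9020.

```python
import ssl
import urllib.request


def opener_via_proxy(proxy_url, ca_file=None):
    """Build a urllib opener that sends HTTP and HTTPS traffic through proxy_url.

    If ca_file is given, only the root certificates in that PEM file are
    trusted (much like curl's --cacert). Otherwise the system defaults apply.
    """
    context = ssl.create_default_context(cafile=ca_file)
    return urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url}),
        urllib.request.HTTPSHandler(context=context),
    )


# Usage, assuming mitmproxy is running and its root cert has been copied over:
# opener = opener_via_proxy("http://localhost:9020", "/mitmproxy-ca-cert.pem")
# print(opener.open("https://httpbun.com/get").read().decode())
```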

Setting proxy on the whole container¶

We’re now at the point where it’s possible for requests inside the container to be run via the proxy, without any cert validation errors.

But this currently needs to be deliberate. Like in the example above, the curl command needs the cert to be specified explicitly. Instead, we’d like even ordinary curl commands to always go through the proxy, since, that’s how a firewall would work, and ultimately, that’s what we are trying to reproduce here.

Let’s stop the ace container and start it again with proxy configuration set.

docker stop ace
docker run --rm -d --name ace -p 9022:80 \
    -e HTTP_PROXY=http://host.docker.internal:9020 \
    -e HTTPS_PROXY=http://host.docker.internal:9020 \
    -e http_proxy=http://host.docker.internal:9020 \
    -e https_proxy=http://host.docker.internal:9020 \
    appsmith/appsmith-ce

Yep, that’s right. We need to set both http_proxy and HTTP_PROXY for all applications inside the container to take it seriously. 🤦
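How these variables are read is up to each program. As one data point, here’s a small sketch of how Python’s urllib treats them: it picks up either spelling, and prefers the lowercase one when both are set. Other tools have their own rules, so this only describes urllib. The port 9999 value is a dummy used to tell the two variables apart.

```python
import os
import urllib.request

PROXY = "http://host.docker.internal:9020"

# Start from a clean slate so the demonstration is predictable.
for name in ("http_proxy", "HTTP_PROXY"):
    os.environ.pop(name, None)

# With only the uppercase variable set, urllib still finds the proxy.
os.environ["HTTP_PROXY"] = PROXY
upper_only = urllib.request.getproxies().get("http")

# With both set to different values, the lowercase one is preferred.
os.environ["http_proxy"] = "http://localhost:9999"
both_set = urllib.request.getproxies().get("http")

print("only HTTP_PROXY set:", upper_only)
print("both set:", both_set)
```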

Let’s run a normal curl request on this container to see if the proxy settings are applied:

docker exec ace curl http://httpbun.com/get

If the proxy configuration is working, then you should see this request appear in the proxy UI. Also, for https URLs:

docker exec ace curl https://httpbun.com/get

This, as we can expect, fails due to a cert validation error, since it’s using the proxy, but the proxy’s certificate can’t be verified. We can provide the root cert of mitmproxy using the --cacert argument, but we want it to apply to all requests in the container, without such explicit configuration, so we won’t do that.

Instead, we want to install the root certificate of mitmproxy to the truststore, so that it’s available to all processes in the container for validating SSL certificates.

How this is done, depends on the operating system, but in our case, since the container is Ubuntu, all we need to do is:

Copy the certificate file to /usr/local/share/ca-certificates.
If the cert has the .pem extension, rename it to use the .crt extension. This is because Ubuntu’s update-ca-certificates command only picks files with a .crt extension.
Run update-ca-certificates.

Let’s copy the root cert into the container, and install it by running the above commands inside the container:

docker cp ~/.mitmproxy/mitmproxy-ca-cert.pem ace:/usr/local/share/ca-certificates/mitmproxy-ca-cert.crt
docker exec ace update-ca-certificates

The output should say that one certificate has been added to the truststore.

Let’s run the same https request again:

docker exec ace curl https://httpbun.com/get

This should now print the correct response, as well as show up on the proxy UI with full details for inspection. 🎉

Conclusion¶

This has culminated in creating the PR #14207. This PR contains a few QoL improvements over the solution above.

We install ca-certificates-java, so that when we run update-ca-certificates, they are also installed into the JVM truststore. This is important since, one, Java maintains its own truststore (like Firefox), and two, Appsmith’s server runs on the JVM, so we need this there as well.
We provide support for a ca-certs folder in the volume, where users can drop any root cert files which will be auto-added on container startup.
We run update-ca-certificates --fresh instead of just update-ca-certificates, so that any cert file removed from the ca-certs folder, also gets removed from the truststores.
We mirror the values of the proxy env variables, so that setting just one of http_proxy and HTTP_PROXY is enough. The same is done for https_proxy and HTTPS_PROXY.
We provide a friendly warning when there are .pem files in the ca-certs folder, since, most likely, they are there because the user forgot to rename them to use the .crt extension.
The JVM needs the -Djava.net.useSystemProxies=true system property to use the system-configured proxy. Additionally, we set the individual proxy configuration as system properties, so we can apply them when executing requests via Apache’s web client libraries. Since, well, that library doesn’t respect the system proxy configuration, although the rest of the JVM does. Go figure.
We set a NO_PROXY env variable to hosts that should not go through the proxy, like localhost and 127.0.0.1.

Of course, considering our premise, which is to be able to use Appsmith behind an SSL decrypting proxy, all a user needs to do, is to place the firewall’s root certificate in the ca-certs folder, and restart the Appsmith container.

Bonus: Using Charles¶

Notes on using Charles instead of mitmproxy.

Install with:

brew install charles

Open Charles

Go to Proxy -> SSL Proxying Settings, and under “SSL Proxying”, add the domains for which you want SSL decryption to be done. Let’s add an entry under “Include”, with host set to httpbun.com and port set to 443 (which is the default port of HTTPS).

Check with http curl, response should show up correctly, and the request should show up in Charles with full information.

Check with https curl, get an error response back, and the request should show up in Charles with incomplete information, and a red error icon.

To get the Charles’ root certificate, go to Help -> SSL Proxying -> Save Charles Root Certificate.... Provide a location to save this cert, like your home folder.

The other steps should be the same as explained above for mitmproxy.

Time is different every time

2021-12-24T00:00:00+05:30

I love automating things, with shell aliases, global hotkeys, IDE snippets etc.

There’s this question: have you spent more time automating something than the time it’s saved you?

I’ve been asked some version of it a lot of times over the years, whenever someone sees me using such a shortcut:

How long did it take for you to build and learn that automation? Was the time you saved from it worth it?

My answer to that is, of course, yes. But the question is a little more nuanced.

Was the time saved worth it? Yes.

Was the time saved more than the time you spent in building and learning? No.

So, I spent more time, in building and learning the shortcut, than I saved because of the shortcut. This was illustrated well in this XKCD comic:

This, for most people, makes learning such shortcuts a waste of time. Because, of course, the net time difference is negative. Therein lies the folly.

Not all five minutes hold the same value.

There are times when I’m working on a critical fix that needs to go out in negative time. I hope to not end up in such situations, but we do. In such situations, saving a few precious seconds can mean a lot.

Consider a hypothetical example, an internal application server is down for whatever reason. I need to SSH into the server to see what’s up. Sure, I could go into my notes, search for the long SSH command for this server, SSH into it, then run commands to check logs, and then to restart if needed etc.

But, what if this was a single shell script. Just SSH into that server, print me the logs, and ask me if I want it restarted or not. Just a Y/N answer. I’m quite sure developing such a script would take more time than I’d be saving. However, I’d be spending that time developing this script, when I’m not in a hurry.

I can afford to spend those ten minutes in such a situation, to save ten seconds in a more critical situation. This is what makes it worth it.

But there’s an ugly face to this. We should know when some shortcut is enough. It’s easy to get into the trap of trying to optimize it and make it better and better. This is well represented in this comic by XKCD:

Part of the problem is, developers, just like artists, are often never done. There’s always a small finishing touch that can be done.

The trick is to recognize, and even assume, that you’ll be the only user ever of this shortcut. If it works for you, without too many brain cycles, in a critical situation, you’re done. Move on.

So, what do I automate? I’ve written about my automations and workflows quite a bit in the past:

Automating with Vim workplace, part 2, and part 3.
The Magic of AutoHotkey, and part 2.

Today, I primarily work with macOS, and have come to love Hammerspoon, as an alternative to AutoHotkey on Windows. I intend to write about my Hammerspoon automations as well, soon.

As I always say, identify, automate, repeat.

The Python `print` function

2020-04-05T00:00:00+05:30

The print function is most likely the first function we encounter when learning Python. That encounter usually looks like print("Hello world!"). After that, we go on to learning more stuff about it like being able to pass any number of arguments or of any type etc. I’m writing this article to give an idea how deep this rabbit hole goes. Turns out, the print function is very powerful. So let’s get a coffee, put on a dusty pair of sunglasses and bask in its power!

Table of Contents

The Basics
Handling of Multiple Arguments
Handling of non-string types
Write to files
The end= keyword argument
A Note about Python 2
A Sad Imitation
The pprint Function
Conclusion

The Basics¶

The basic premise of the print function is quite, well, basic. It prints out the given arguments to the standard output.

print("Hello world!")

This prints:

Hello world!

Calling it with multiple arguments:

print("hello", "world")

This prints:

hello world

Notice that the two strings, "hello" and "world" have a space character printed between them. The print function is helpful like that. By default, it places a space between every pair of consecutive arguments to be printed.

It doesn’t have to be strings either:

print(42, "is the answer")

This prints:

42 is the answer

Let’s look at each of these features in detail and see how they work.

Handling of Multiple Arguments¶

The print function accepts an arbitrary number of arguments to be printed. These arguments can’t be keyword-arguments, because that doesn’t make much sense. That’s not to say the print function doesn’t accept any keyword arguments, it does. In fact, the space character that shows up between the arguments to be printed, can be changed by providing the sep= keyword argument.

Let’s look at the following examples:

>>> print("the", "world", "is", "a", "cruel", "place")
the world is a cruel place

>>> print("the", "world", "is", "a", "cruel", "place", sep="-")
the-world-is-a-cruel-place

>>> print("the", "world", "is", "a", "cruel", "place", sep="")
theworldisacruelplace

In the first example, we don’t explicitly give any value to the sep= keyword argument. So it takes its default value of the space character " ". In the second example, we set it to the dash character "-" and we can see in the output that the strings are printed joined by dashes.

In the third example, we set the sep= to an empty string so the output is all the words printed consecutively making it a cruel experience to read the text.

The sep= argument can be any string, it doesn’t have to be a single character and it can contain newlines and any other shenanigans too.

print("the", "birds", "in", "the", "sky", sep="\n  hammertime\n")

This prints the following mind bogglingly useful output:

the
  hammertime
birds
  hammertime
in
  hammertime
the
  hammertime
sky

Yeah, that’s a useful trick, but please, consider people’s sanity when you do such !@#$.

Handling of non-string types¶

We know that the print function can handle printing objects of any type, not just strings. But how does that work? The simple answer to this is that print will call str on non-string objects, and print the result of that call.

Let’s experiment with this. Consider the following class definition, which has just one method, the __str__. If you are unaware of this method, this is what’s called when str is applied on an instance of this class. I won’t go into details of that as that’s not the topic of this article.

class Tantrum:
    def __str__(self):
        return "awesome __str__ of object %r" % id(self)


print(Tantrum())

The output of running this would be something like (the number in the end would obviously be different if you run this script):

awesome __str__ of object 4508612624

So, what happens if our class doesn’t define an implementation for the __str__ method? Let’s try that out:

class LazySloth:
    pass


print(LazySloth())

This prints the following output (again, the number in the end would obviously be different for you):

<__main__.LazySloth object at 0x105f327d0>

Turns out that when there’s no implementation for __str__, calling str on the instance will still produce some information regarding the instance, which is what we got above.

A neat thing here is that this output is actually what calling repr on the instance would produce. So, it looks like str is falling back to returning the output of repr, when there’s no implementation for __str__ provided. Let’s confirm this by defining a __repr__ method:

class RatInFormals:
    def __repr__(self):
        return "a sad overridden __repr__ for instance %r" % id(self)


print(RatInFormals())

This prints the following output (again, the number will be different for you):

a sad overridden __repr__ for instance 4313389264

Now we get the output of the overridden __repr__.

So here’s how it works. The print function calls str on any non-string objects, which returns the result of the __str__ method, if available, or the result of calling repr on the instance, which in turn returns the result of the __repr__ method, which results in a generic output unless overridden (like in the last example above).
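To see both methods side by side, here’s a small sketch with a made-up class, Celsius, that defines __str__ as well as __repr__. It also shows a related detail worth knowing: a list displays its items using repr, even when the list itself is passed to print.

```python
class Celsius:
    def __init__(self, degrees):
        self.degrees = degrees

    def __str__(self):
        # Friendly text, meant for people reading the output.
        return f"{self.degrees} degrees Celsius"

    def __repr__(self):
        # Unambiguous text, meant for people debugging the code.
        return f"Celsius({self.degrees!r})"


reading = Celsius(21.5)

print(reading)        # goes through str, so __str__ is used
print(repr(reading))  # __repr__ is used
print([reading])      # the list shows its items using repr
```

This prints:

```
21.5 degrees Celsius
Celsius(21.5)
[Celsius(21.5)]
```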

This should make a case for spending a few seconds thinking about and writing useful __str__ methods for your custom types. Someone working with your code later on might just print an instance of your class to see what’s in it, and the generic output with the instance’s id is unlikely to be very helpful.

Write to files¶

Another keyword argument accepted by print is file=. This can be set to a file object, in which case the printing will be done to that file object instead of standard output.

Let’s try writing text to a file using the print function like this:

with open("outputs.txt", "w") as f:
    print("Stuff that doesn't show up in standard output", file=f)

Running this script obviously doesn’t print anything to the standard output. Instead, a file “outputs.txt” is created which contains the following text:

Stuff that doesn't show up in standard output

Note that since we are opening the file with mode "w", if a file named “outputs.txt” already exists in the current folder, it will be overwritten.

Using `sys.stderr`¶

The sys.stderr object in the sys module is a file-like object that represents the standard error. Writing to this file-like object directs it to the standard error stream. This is similar to the sys.stdout object which represents the standard output stream, in a similar fashion.

The file= keyword argument can be set to sys.stderr which will print to the standard error stream.

import sys

print("stuff going to standard error", file=sys.stderr)

You might not notice any difference from setting the file= argument in the above script, but if you are running a terminal emulator / shell that shows standard error in red color, then you’ll be able to see a difference.

Modifying `sys.stdout`¶

If we don’t set a value explicitly to the file= argument, the output will be sent to the standard output. There’s a small note to that point to be observed. In reality, the output will be sent to the sys.stdout file object. Usually, these two are the same. But, of course, we can set sys.stdout to something else.

Consider the following script which changes the value of sys.stdout, prints something, and then restores the value of sys.stdout to its original value.

import sys

original_stdout = sys.stdout
with open("out.txt", "w") as f:
    sys.stdout = f
    print("trololololol")

sys.stdout = original_stdout
print("restored")

If we run this script, we’ll only see restored in the output, but the file out.txt will be created with the output from line 6.

A minor point to note here is that it’s probably incorrect to say “the default value of the file argument is sys.stdout”. If that were the case, changing the value of sys.stdout should not affect the print function. Instead, I believe its default value is None and in that case, print uses the current value of sys.stdout.

We can verify this by explicitly passing in None to the file= argument:

import sys

original_stdout = sys.stdout
with open("out.txt", "w") as f:
    sys.stdout = f
    print("trololololol", file=None)


sys.stdout = original_stdout
print("restored", file=None)

The above script produces the exact same output as when we didn’t provide the file= argument explicitly.

Collecting with `io.StringIO`¶

The io.StringIO class can be used to create a file object that collects all that is written to it, and then get it all out as a string. This is useful when calling a function that prints information using the print function, but instead, we want that output as a string for further processing. We can replace sys.stdout with an io.StringIO instance before calling that function, and then restore it after. Here’s what this might look like:

import io, sys

def print_product(a, b):
    print(a * b)


original_stdout = sys.stdout
string_io = io.StringIO()

sys.stdout = string_io
print_product(4, 5)
sys.stdout = original_stdout

result = string_io.getvalue()
print("Result is", result)

In this script, the print_product function prints the result of the multiplication, instead of returning it. So to get the result out of it, we replace sys.stdout with an io.StringIO instance and after calling the print_product function, we get the printed result using the .getvalue() method.

However, note that a similar operation with binary data using io.BytesIO is not possible, since the print function converts all its arguments to text before writing to the file.
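As an aside, the standard library’s contextlib.redirect_stdout can do the swap-and-restore of sys.stdout for us, and it restores the original even if the function raises. Here’s a sketch of the same print_product capture using it, followed by a quick check of the io.BytesIO point above.

```python
import contextlib
import io


def print_product(a, b):
    print(a * b)


buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    # Inside this block, sys.stdout is the StringIO buffer.
    print_product(4, 5)

# Outside the block, sys.stdout is back to what it was.
result = buffer.getvalue()
print("Result is", result.strip())

# print always writes text, so a binary file object rejects it.
try:
    print("hello", file=io.BytesIO())
    bytes_error = None
except TypeError as error:
    bytes_error = error

print("Writing to BytesIO failed:", bytes_error is not None)
```

Note that the captured result includes the trailing newline added by print, which is why it’s stripped before being shown.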

The `end=` keyword argument¶

This is like one of those things that we notice only when it’s taken away. The print function appends a newline at the end of the last argument to be printed. Check out the following example:

print("hello on day 1")
print("yeah right on day 2")
print("oh to hell with you on day 3")

The output of this script is the following:

hello on day 1
yeah right on day 2
oh to hell with you on day 3

The output from the three print calls shows up in three separate lines, nice and neat. But we never gave a "\n" in our calls to print. It comes from the default value of the end= argument of the print function. If we set the end= argument to something else, it will replace the newline in the end of the output from a print call.

Check out the following script for example:

print("Doing awesome stuff... ", end="")
# do awesome stuff here
print("done")

This script prints the following output:

Doing awesome stuff... done

The output of the two print calls shows up on the same line, since we suppressed the newline that would’ve been printed from the first call to print, by setting the end= argument to an empty string. The second call to print will continue this sentence and finish the line by adding a newline at the end.
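Here’s one more small sketch, this time printing from a loop, and combining end= with sep= in a single call. The output is collected in an io.StringIO buffer only so that the whole result can be inspected at the end.

```python
import io

buffer = io.StringIO()

# Each call ends with ", " instead of a newline, so the output stays on one line.
for number in range(1, 4):
    print(number, end=", ", file=buffer)

# This call uses the default end, a newline, which finishes the line.
print("done", file=buffer)

# sep= and end= are independent of each other, so they can be used together.
print("a", "b", "c", sep="-", end="!\n", file=buffer)

print(buffer.getvalue(), end="")
```

This prints:

```
1, 2, 3, done
a-b-c!
```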

A Note about Python 2¶

Python 2 had a print statement, which worked similarly to the print function in Python 3, but was not as feature-rich. Additionally, being a statement, it couldn’t be used in all places, for example, within a lambda expression.

However, Python 2.6 introduced a future import that brought the print function to Python 2. Adding a from __future__ import print_function line at the start of a Python 2 file would disable the print statement in that file and turn print into a function. This can be very useful when migrating to Python 3.

A Sad Imitation¶

Here’s a sad little imitation of the print function that should behave similar to the builtin in most of the features that have been discussed in this article:

import sys

def sad_print(*args, sep=" ", end="\n", file=None):
    (sys.stdout if file is None else file).write(sep.join(map(str, args)) + end)


sad_print("the answer is", 42)

In this sad_print function, what we are essentially doing is:

Pick sys.stdout if file is None.
Call str on all of the provided arguments.
Join the results of the calls to str using the value of sep.
Concatenate the value of end to the result of above step.
Call write on result of point-1, with the result of the above step.

I’m sure the print builtin does quite a bit more than just this one-liner, but doing this can give us some perspective of how all the different pieces fit in together.
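One rough way to check the imitation is to run both functions with the same arguments and compare what they write. Here’s a sketch of that. The captured helper and the list of cases are made up for this check, and it only covers the features discussed in this article.

```python
import io
import sys


def sad_print(*args, sep=" ", end="\n", file=None):
    (sys.stdout if file is None else file).write(sep.join(map(str, args)) + end)


def captured(printer, *args, **kwargs):
    """Call printer with an in-memory file and return what it wrote."""
    buffer = io.StringIO()
    printer(*args, file=buffer, **kwargs)
    return buffer.getvalue()


cases = [
    (("the answer is", 42), {}),
    (("a", "b", "c"), {"sep": "-"}),
    (("no newline",), {"end": ""}),
    ((), {}),  # no arguments at all should print just a newline
]

for args, kwargs in cases:
    assert captured(sad_print, *args, **kwargs) == captured(print, *args, **kwargs)

print("sad_print matched print on", len(cases), "cases")
```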

The `pprint` Function¶

Python’s standard library has a pprint module, with a pprint function that takes one argument, and prints it prettily.

For example, consider the following script:

from pprint import pprint

numbers = [1, 2, 3, 4, 5, 6]

print(numbers)
pprint(numbers)

planets = ["Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune", "Pluto"]

print(planets)
pprint(planets)

We are calling print and pprint on each of the two lists, one of numbers and one of strings. Let’s look at the output:

[1, 2, 3, 4, 5, 6]
[1, 2, 3, 4, 5, 6]
['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn', 'Uranus', 'Neptune', 'Pluto']
['Mercury',
 'Venus',
 'Earth',
 'Mars',
 'Jupiter',
 'Saturn',
 'Uranus',
 'Neptune',
 'Pluto']

As we can see, the output from pprint is prettified, but only if necessary. In the first case, where we were printing just six numbers, the output was fine as a single line so pprint did not cut it up into several lines. But in the second case, the line ends up too long and it may not be comfortable on small terminal screens. So, it cuts it up.

The pprint module can be useful for pretty-printing (or formatting) lists and dictionaries. Check out its official documentation for more information.

Conclusion¶

We may not use all these features of the print function all the time, but I think it’s useful to know that print is not just a function that prints the given string. It’s quite a bit more than that; and when we need it, it’s there without having to import anything. Thank you for reading!

Dependency Injection In Python

2020-03-29T00:00:00+05:30

I chose a form of dependency injection (DI) as the solution to a recent problem I needed to solve. This is a quick write-up of how I did it in Python, using the standard library modules.

Table of Contents

The Problem
The Legacy Solution
The New DI Solution
Conclusion

The Problem¶

I’ll be illustrating the problem in a slightly different context, so I don’t derail too much into the subject, which would just be a distraction. So, if the problem feels unrealistic or stupid, that’s just a result of my unimaginative thinking.

We have an application, let’s call it the task runner, that lets users choose a task to run, and runs that task. Each such task is implemented as a separate Python script file that takes no user input, but does connect to a database and a few REST endpoints.

So this is how it works. We have a bunch of wrapper classes that provide high-level abstractions for the database and the REST endpoints. Instances of these classes are given to the task scripts, which use them to perform their task.

The task scripts are also expected to return some information back to our application, with details such as whether the task was successful or the reasons if there’s an error etc. The approach for how this is done is detailed in the following Solution section(s).

The Legacy Solution¶

The current way this is working (which I took the liberty to call The Legacy Solution) is that when a user requests for a specific task to be run, the application reads up the relevant Python script file and calls eval on the contents. A pre-made dictionary holding all the instances of high-level abstractions is provided in the global scope of this call to eval.

This has been working well for several years now and, although it feels dirty in hindsight, there were probably good reasons it was done this way:

It was very simple and easy to implement. There’s little to no magic.
The task scripts can be updated on production without restarting the application and the changes would take effect immediately.
The scripts’ logic can be written as module level code. Full freedom on how the code is structured and written.

Arguing about how horrible this approach is would be a great topic for a heated debate and, fortunately, that’s not what I set out to write about here. This simple method, while it worked, didn’t scale with the team. We soon decided to move to a more sophisticated approach and so started looking.

A major reason (among several) for this decision was to have the scripts not depend on implicit globals. The use of implicit globals meant that the scripts were using variables that appear as not defined to static code analyzers. Additionally, since the script file was being read into a string and eval-ed, the stack traces from any errors were not very helpful.
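To make the implicit-globals problem concrete, here's a minimal sketch of this style of running a script. Everything in it is made up for illustration: the UsersService class, the TASK_SOURCE string standing in for a script file, and the result variable used to hand information back. I'm also using exec here, since eval on a plain source string only accepts a single expression; the exact call used by the real application isn't shown in this article.

```python
class UsersService:
    """Hypothetical stand-in for one of the high-level wrapper classes."""

    def count(self):
        return 3


# Stand-in for the contents of a task script file.
# Note that `users_service` is never defined or imported in the script itself.
TASK_SOURCE = """
total = users_service.count()
result = {"ok": True, "users": total}
"""


def run_legacy_task(source):
    # The pre-made dictionary becomes the script's global scope.
    scope = {"users_service": UsersService()}
    exec(source, scope)
    # Hypothetical convention: the script leaves its response in `result`.
    return scope["result"]


outcome = run_legacy_task(TASK_SOURCE)
print(outcome)
```

A linter looking at the script on its own has no way to know where users_service comes from, which is the complaint described above.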

The New DI Solution¶

In the new proposed way for this to work, we have made three critical changes:

The Python script files will be import-ed as Python modules, and the task_main function at the module level will be called to run the task.
Nothing is implicitly injected into the script’s global scope.
Access to the API abstractions is done through a form of dependency injection.

In the task scripts, we have a function defined like the following:

def task_main(users_service, sales_service):
    # do something with `users_service` and `sales_service`.
    ...

Here, the task_main function is defined to accept two arguments: users_service and sales_service. In our task runner application, we use the inspect module to identify the abstractions being used in task_main and pass them accordingly. Here’s how it works:

import importlib
import inspect

def run_task_script(script):
    # Strip only a trailing ".py" to get the importable module name.
    module_name = script[:-3] if script.endswith('.py') else script
    module = importlib.import_module(module_name)

    # Parameter names of `task_main` tell us which services the task needs.
    args = inspect.signature(module.task_main).parameters

    kwargs = {}

    for name in args:
        kwargs[name] = get_service_instance(name)

    response = module.task_main(**kwargs)

    record_task_response(response)

In this function, we first convert the script file name into its module name (hoping it doesn’t contain any spaces or dash characters). Then, we use the importlib module to import the module of that name. Next, we call the inspect.signature function on the module’s task_main function to get its parameter names.

Based on these argument names (in args), we then construct a dictionary with these names as keys and the instance of the API abstraction class, as the value. We then pass this as the keyword arguments to the call to module.task_main.

In this way, the scripts don’t assume any implicit globals and the task_main accepts arguments that it needs and no more. This makes the code much cleaner and easier to do static analysis on. Besides, since we import the module and call a function in it, we get nicer stack traces when there’s an exception.
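Here's a self-contained sketch of the same idea that can be run as a single file, leaving out the import step. The service classes, the SERVICES registry, the body of get_service_instance and the call_with_services helper are all invented for this example; only the use of inspect.signature to read parameter names mirrors the function above.

```python
import inspect


class UsersService:
    """Hypothetical wrapper around a users API."""

    def names(self):
        return ["asha", "ravi"]


class SalesService:
    """Hypothetical wrapper around a sales API."""

    def total_for(self, name):
        return {"asha": 120, "ravi": 80}[name]


# Registry of injectable instances, keyed by the parameter name tasks ask for.
SERVICES = {
    "users_service": UsersService(),
    "sales_service": SalesService(),
}


def get_service_instance(name):
    try:
        return SERVICES[name]
    except KeyError:
        raise LookupError(f"task asked for an unknown service: {name!r}") from None


def call_with_services(func):
    # Read the parameter names and pass exactly the services that were asked for.
    names = inspect.signature(func).parameters
    return func(**{name: get_service_instance(name) for name in names})


def task_main(users_service, sales_service):
    return sum(sales_service.total_for(name) for name in users_service.names())


def users_only(users_service):
    # A task that needs just one service declares just one parameter.
    return len(users_service.names())


total = call_with_services(task_main)
count = call_with_services(users_only)
print(total, count)
```

Failing loudly on an unknown parameter name is a choice I made for this sketch, so that a typo in a task's signature shows up as a clear error rather than a missing argument.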

Conclusion¶

I’m sure there are better and more involved ways of doing DI in Python, but what we’ve done above is enough for the target problem. Additionally, it’s just using the standard library, so, extra brownie points for that!

The Magic of AutoHotkey — Part 2

2020-03-22T00:00:00+05:30

In the previous part of The Magic of AutoHotkey, we looked at automating small pieces of routine tasks with various applications, as well as identifying things that could be done better with a quick hotkey. This is the next chapter of the story. In this article, I’ll show you how I tamed the stock file explorer as well as connecting to office applications with OLE to provide additional rich functionality.

This article is part of a series:

Part 1
Part 2 (this article).

Table of Contents

File Explorer Magic
Email Selected File(s) with Outlook
- Global Hotkey for New Mail
Conclusion

File Explorer Magic¶

The file explorer is probably my most used application during work. Yet, it doesn’t feel like it’s tuned for a power user. Maybe that’s also why there’s so many alternatives to file explorers. I’ve tried a few of them in the past, but the best has been to add exactly the few things I needed in the native file explorer, using AutoHotkey. I’ll run through those here.

As is the case in the previous part, I have a module called file-explorer-tweaks.ahk which is #Include-ed in my master script.

To start, we define a window group, which includes all file explorer windows. We later use this group to define hotkeys that we want to work only on the file explorer windows.

GroupAdd, FileListers, ahk_class CabinetWClass
GroupAdd, FileListers, ahk_class WorkerW
GroupAdd, FileListers, ahk_class #32770, ShellView

This group now matches the file explorer windows, desktop and the file open dialog windows.

Focus Location Editor¶

Almost all the web browsers today have the default hotkey ^l which focuses the location bar, and selects everything in it. But in the file explorer, this is !d. Habits rule and I constantly hit ^l in the file explorer window when I wanted to change something in the location bar. Obviously, it didn’t work, and it would drive me crazy. Until I added the following to save me from insanity:

#IfWinActive ahk_group FileListers
^l::SendInput !d

While this works fine on the face of it, if I hit Escape after focusing the location bar like this, the focus is not returned to the file list. I haven’t figured out a solution to that yet, so that one’s open.

Open Command Window¶

The file explorer has a nice lesser-known trick. If I right-click without any files selected and with the Shift key held down, I get an extra option in the context menu, called “Open command window here”. Clicking on that menu item will open a new command prompt window in the current directory. This is extremely convenient if you need the command window often (which you might, especially if you’re a software developer).

But this needed the mouse. I wanted to do this with the keyboard. Turns out it’s easier than one might think:

#IfWinActive ahk_group FileListers
^!t::SendInput !dcmd{Enter}

Here, we define the ^!t hotkey which will focus the location bar and type in cmd and hit the Enter key. This will actually open up a command window in the current directory.

Folder Shortcuts¶

Folder shortcuts is where I define a hotkey that will navigate to a specific directory, always. For example, while in a file explorer, hitting ^h should navigate to the home folder, hitting ^j should navigate to the Downloads folder (this key opens the downloads view in web browsers, see what I did there?).

#IfWinActive ahk_group FileListers
^h::Send !d%homedir%{Enter}
^j::Send !d%homedir%\Downloads{Enter}
^y::Send !dLibraries\Documents{enter}
^k::Send !dC:\work{Enter}
^t::Send !dC:\tools{Enter}
^b::Send !dC:\labs{Enter}

This snippet uses the homedir variable defined in the previous article.

On the face of it, these are very simple hotkeys. We pass !d to focus the location input and type in the location where we want to go to. Simple & effective. They serve sort of like quick access bookmarks and are probably my most used hotkeys defined with AutoHotkey overall, by a margin.

In the previous section, we dealt with navigating to absolute locations. But how about directional navigation, where we want to go back or forward or even up the directory chain?

The default hotkeys for this leverage the arrow keys, which require taking my hands off the keyboard’s home row. So, I’m using the following keys for these three operations, which are inspired by similar behavior in Vim (again!).

; Navigate with the keyboard better!
#IfWinActive ahk_group FileListers
^o::SendInput, !{Left}
^i::SendInput, !{Right}
^u::SendInput, !{Up}

To top it, I have also defined mouse “hotkeys” for these three actions. I rarely use these nowadays, but they’re still there for when I already have a hand on the mouse.

; Navigate with the mouse!
#IfWinActive ahk_group FileListers
!WheelUp::SendInput, !{Up}
^WheelUp::SendInput, !{Left}
^WheelDown::SendInput, !{Right}

Pretty self-explanatory really.

Select Files by Pattern¶

I particularly love this one. When I trigger this hotkey, a little prompt shows up where I enter a regular expression and then every file in the current folder that matches this pattern will be selected. The first time I used this on a folder with ~300 files, I practically had tears in my eyes at how easy it was to make the file selection by a pattern.

So, here’s the code for this:

; Get selected files in explorer and more:
; http://www.autohotkey.com/board/topic/60985-get-paths-of-selected-items-in-an-explorer-window/
#IfWinActive ahk_group FileListers
^s::
SelectByRegEx() {
    static selectionPattern := ""
    WinGetPos, wx, wy
    ControlGetPos, cx, cy, cw, , DirectUIHWND3
    x := wx + cx + cw/2 - 200
    y := wy + cy
    InputBox, selectionPattern, Select by regex
        , Enter regex pattern to select files that CONTAIN it (Empty to select all)
        , , 400, 150, %x%, %y%, , , %selectionPattern%
    if ErrorLevel
        Return
    for window in ComObjCreate("Shell.Application").Windows
        if WinActive("ahk_id " . window.hwnd) {
            pattern := "S)" . selectionPattern
            items := window.document.Folder.Items
            total := items.Count()
            i := 0
            showProgress := total > 160
            if (showProgress)
                Progress, b w200, , Matching...
            for item in items {
                match := RegExMatch(item.Name, pattern) ? 17 : 0
                window.document.SelectItem(item, match)
                if (showProgress) {
                    i := i + 100
                    Progress, % i / total
                }
            }
            Break
        }
    Progress, Off
}

The code is not very pretty, but oh well. It works well, and I’d rather not touch it.

Here’s a little mute video recording of this at work:

Your browser does not support HTML5 video. Here’s a link to the video instead.

Batch Rename¶

This is actually built to be invoked as a separate AutoHotkey process, not to be #Include-ed into a master script. That’s because the GUI is slightly more complex than what we’ve seen in previous sections and I didn’t bother to make it work well as a module.

batch-rename.ahk

#NoEnv
#NoTrayIcon

active_hwnd := WinActive("ahk_class CabinetWClass")
If (active_hwnd) {
    for window in ComObjCreate("Shell.Application").Windows
        If (active_hwnd == window.hwnd) {
            parent := uriDecode(StrReplace(window.LocationURL, "file:///", "", , 1))
            ShowGui()
        }
}

ShowGui() {
    global active_hwnd, parent, SourcePattern, TargetPattern, WindowListView
    Gui, Font, s10 q5, Segoe UI
    Gui, Margin, 6, 6
    Gui, +Owner%active_hwnd%
    Gui, Add, Text, , Search pattern:
    Gui, Add, Edit, r1 w300 vSourcePattern gInputChanged -WantReturn X+6 Section
    Gui, Add, Text, X+6, Full regex is supported
    Gui, Add, Text, XM, Replacement:
    Gui, Add, Edit, r1 w300 vTargetPattern gInputChanged -WantReturn XS YP
    Gui, Add, Text, X+6, Use $1, $2, ${10}, ${named}, $U1, $U{10}, $L2, $T0 etc.
    Gui, Add, Button, Default gDoRename XM w80, Apply
    Gui, Add, Button, gShowHelp X+6 w80, Help
    Gui, Add, ListView, Grid r12 w800 vWindowListView XM, Replacements|Current name|Renamed to

    imList := IL_Create(2)
    LV_SetImageList(imList)
    IL_Add(imList, "check.png", 0xFFFFFF, 1)
    IL_Add(imList, "error.png", 0xFFFFFF, 1)
    ; IL_Add(imList, "shell32.dll", 145)
    ; IL_Add(imList, "shell32.dll", 234)

    Gui, Show, , Rename with Regex: %parent%
}

InputChanged() {
    global parent, SourcePattern, TargetPattern
    GuiControlGet, SourcePattern
    GuiControlGet, TargetPattern
    LV_Delete()
    Loop, Files, %parent%\*, FD
    {
        toName := RegExReplace(A_LoopFileName, SourcePattern, TargetPattern, count)
        icon := 1
        If (A_LoopFileName == toName)
            icon := 3
        Else if (FileExist(parent . "/" . toName))
            icon := 2
        LV_Add("Icon" . icon, count, A_LoopFileName, toName)
    }
    LV_ModifyCol()
}

DoRename() {
    global parent, SourcePattern, TargetPattern
    Gui, Submit

    If (SourcePattern != "")
        Loop %parent%\* {
            toName := RegExReplace(A_LoopFileName, SourcePattern, TargetPattern)
            FileMove, %parent%\%A_LoopFileName%, %parent%\%toName%
        }

    GuiClose()
}

GuiEscape() {
    GuiClose()
}

GuiClose() {
    ExitApp
}

uriDecode(str) {
    Loop
        If RegExMatch(str, "i)(?<=%)[\da-f]{1,2}", hex)
            StringReplace, str, str, `%%hex%, % Chr("0x" . hex), All
        Else Break
    Return, str
}

ShowHelp() {
    help=
    (
## Pattern:

The pattern to search for, which is a Perl-compatible regular expression (PCRE). The pattern's options (if any) must be included at the beginning of the string followed by a close-parenthesis. For example, the pattern "i)abc.*123" would turn on the case-insensitive option and search for "abc", followed by zero or more occurrences of any character, followed by "123". If there are no options, the ")" is optional; for example, ")abc" is equivalent to "abc".

## Replacement:

The string to be substituted for each match, which is plain text (not a regular expression). It may include backreferences like $1, which brings in the substring from Haystack that matched the first subpattern. The simplest backreferences are $0 through $9, where $0 is the substring that matched the entire pattern, $1 is the substring that matched the first subpattern, $2 is the second, and so on. For backreferences above 9 (and optionally those below 9), enclose the number in braces; e.g. ${10}, ${11}, and so on. For named subpatterns, enclose the name in braces; e.g. ${SubpatternName}. To specify a literal $, use $$ (this is the only character that needs such special treatment; backslashes are never needed to escape anything).

To convert the case of a subpattern, follow the $ with one of the following characters: U or u (uppercase), L or l (lowercase), T or t (title case, in which the first letter of each word is capitalized but all others are made lowercase). For example, both $U1 and $U{1} transcribe an uppercase version of the first subpattern.

Nonexistent backreferences and those that did not match anything in Haystack -- such as one of the subpatterns in "(abc)|(xyz)" -- are transcribed as empty strings.
)
    MsgBox, %help%
}

Put this script at a convenient location, probably right next to your master script, and add the following hotkey to your master script:

#IfWinActive ahk_group FileListers
^+b::Run batch-rename.ahk

Here’s a little mute video recording of some usage examples of this tool:

Your browser does not support HTML5 video. Here’s a link to the video instead.

If you’re using this, please be careful. Inspect the preview table before clicking the “Apply” button. If it ends up messing your files up, don’t hold me responsible. I’m sharing this without warranty. As with any source code block on this website, this is shared under the MIT License.

Copy Paths of Selected Files¶

This, again, is actually partly fulfilled by default Windows functionality. When we Shift+Right Click on a file, we get the option to “Copy as path”, which works fine for simple cases. But I wanted the following additional things for this feature:

A keyboard hotkey, like ^+c.
No surrounding double quotes.
Work with multiple files being selected. Copy each file’s path as one line.

For this, I defined the following ^+c hotkey on the file explorer windows.

#IfWinActive ahk_group FileListers
^+c::
    Clipboard := JoinArrayContents(Explorer_GetSelected())
    Return

This will get a list of all selected files in the current explorer window and join them into a single string. The Explorer_GetSelected function comes from this AutoHotkey forum post and the JoinArrayContents is given below:

JoinArrayContents(arr, delimiter="`n") {
    content := ""
    for index, item in arr {
        if index > 1
            content := content . delimiter
        content := content . item
    }
    return content
}

Now I can select one or more files, hit ^+c and the full paths of all the selected files will end up in my clipboard.

Copy Contents of Selected Files¶

Although this one sounds similar to the previous section, it is quite different and useful in a very different way. Where the previous section’s hotkey copies the selected files’ paths, this hotkey is intended to copy the selected files’ contents as a whole.

I have a few (several?) small text files with snippets, template messages, etc. With this, I just select one or multiple files and hit Ctrl+Shift+x and I’m ready to paste their contents.

#IfWinActive ahk_group FileListers
^+x::
    CopySelectedFileContents() {
        files := Explorer_GetSelected()
        content := ""
        for i, file in files {
            FileRead, text, %file%
            if i > 1
                content := content . "`n`n"
            content := content . text
        }
        Clipboard := content
    }

This is the same Explorer_GetSelected I referred to in the previous section. However, in the above hotkey definition, instead of setting the paths to Clipboard, we set the contents of the files.

Just like the previous hotkey, I can select multiple text files and hit ^+x and the contents of all selected files will end up in my clipboard, separated by two blank lines.

This doesn’t work with images yet though. Still have to figure that one out.

Create File with Clipboard Contents¶

This is the opposite of the previous hotkey. Here, I want whatever is in the Clipboard to be saved to a text file in the current folder.

#IfWinActive ahk_group FileListers
^+v::
    CreateFileWithClipboardContents() {
        loc := Explorer_GetPath()
        WinGetPos, wx, wy
        ControlGetPos, cx, cy, cw, , DirectUIHWND3
        x := wx + cx + cw/2 - 200
        y := wy + cy
        InputBox, filename, Clipboard File
            , Enter file name to paste clipboard contents in:, , 400, 120, %x%, %y%, ,
            , clip.txt
        if ErrorLevel
            Return
        filepath := loc . "\" . filename
        if (FileExist(filepath)) {
            MsgBox, 1, Overwrite, Overwriting existing '%filename%'!
            IfMsgBox Cancel
                Return
            FileDelete, %filepath%
        }
        Fileappend, %Clipboard%, %filepath%
    }

The Explorer_GetPath function used in the above snippet is also from the same source I mentioned in the previous sections. The way this works is when the hotkey is triggered, we are prompted to enter the name of the file to which the clipboard’s contents will be saved. Once we provide a file name and submit, the file is created.

With this, I can copy some text out of a webpage or an email in Outlook and saving it to a text file is a quick ^+v. Once I created this hotkey, it became my primary way of creating new text files. I no longer open Notepad, write (or paste) and then save the file to the desired directory. Instead, I open the folder, use this hotkey to create the file, and then open the file in Notepad. Somehow, it feels more natural.

This doesn’t work with images either. Have to figure this one out too.

Create Folder Hierarchy and Enter it¶

The file explorer has a default hotkey for creating new folders (Ctrl+Shift+n), but it doesn’t let us create a tree of folders in one go. To do that, we have to create a directory, enter it, create again etc. This quickly gets tedious if it has to be done often.

As always I tried to address it with AutoHotkey.

#IfWinActive ahk_group FileListers
^n::
CreateFolderHierarchy() {
    loc := Explorer_GetPath()
    WinGetPos, wx, wy
    ControlGetPos, cx, cy, cw, , DirectUIHWND3
    x := wx + cx + cw/2 - 200
    y := wy + cy
    InputBox, folder, Create Folder, Enter folder name/path create:, , 400, 120
        , %x%, %y%
    if ErrorLevel
        Return
    folder := StrReplace(folder, "/", "\")
    pos := RegExMatch(folder, "O)\{([^\{]+)\}", match)
    folders := []
    if (pos > 0) {
        parts := StrSplit(match.value(1), ",")
        prefix := SubStr(folder, 1, match.Pos(0) - 1)
        suffix := SubStr(folder, match.Pos(0) + match.Len(0))
        for i, part in parts {
            folders.Push(prefix . part . suffix)
        }
    } else {
        folders.Push(folder)
    }
    for i, folder in folders {
        FileCreateDir, %loc%\%folder%
    }
    Explorer_GetWindow().Navigate2(loc . "\" . folders[folders.Length()])
}

This uses the same explorer library I mentioned in the previous sections. When this hotkey is triggered, we get a prompt where we can enter a folder tree (i.e., folders separated by / or \) and they will all be created. As a bonus, we are also switched to that newly created folder, so we can start working with it right away.
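Reading the script, it also expands a single brace group such as {main,test} in the entered path into one folder per alternative, and then navigates to the last one. Here's that expansion step restated as a small Python sketch, purely to illustrate the logic. The function name expand_folder_spec is made up, and the sketch only builds the list of relative paths; it doesn't create any folders.

```python
import re


def expand_folder_spec(spec):
    """Expand one {a,b,c} group in a folder spec into a list of paths."""
    # Normalise forward slashes, like the StrReplace call in the script.
    spec = spec.replace("/", "\\")

    # Same pattern as in the script: the first brace group, if any.
    match = re.search(r"\{([^{]+)\}", spec)
    if match is None:
        return [spec]

    prefix = spec[:match.start()]
    suffix = spec[match.end():]
    return [prefix + part + suffix for part in match.group(1).split(",")]


print(expand_folder_spec("src/{main,test}/java"))
print(expand_folder_spec("2020-01/pics"))
```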

Now I can hit ^n and type in src/main/java or 2020-01/pics, and the whole nested structure is created and navigated to, which is usually followed by pasting some files.

Email Selected File(s) with Outlook¶

Outlook is a necessary tool for email at most corporate workplaces. So it’s important to look at how we use it, and what parts of it we can automate / improve.

It’s also quite common to have to send files over email as attachments. Yet, considering how often we tend to do that, it’s still a tedious process. Go to Outlook, start a new mail, drag-drop the file into this window, fill up the mail, send. It gets a bit better if you copy the file to the clipboard and then, instead of starting a new mail with Ctrl+n, just hit Ctrl+v in the Outlook Mails view; a new mail will open up with the file in the clipboard as an attachment. But I’d say it’s still not good enough.

The solution I currently use is the Ctrl+m hotkey for file explorers. The workflow is that I select some files in my file explorer, hit Ctrl+m and a new mail window opens up with the selected files as attachments, the message body containing the list of files for me to edit and subject containing the list of files.

#IfWinActive ahk_group FileListers
^m::OutlookNewMail(Explorer_GetSelected())

The Explorer_GetSelected function is from the same library I mentioned in an earlier section. The following is the definition of the OutlookNewMail function:

OutlookNewMail(attachments=0) {
    outlook := ComObjActive("Outlook.Application")
    mail := outlook.CreateItem(0)

    if (attachments != 0) {
        msg := ""
        sub := "Files: "
        for index, file in attachments {
            mail.Attachments.Add(file)
            SplitPath, file, basename
            msg := msg . "<p class=MsoNormal>&nbsp;&nbsp;&nbsp; "
                    . basename . "<o:p></o:p></p>"
            if (attachments.Length() == 1)
                sub := "File: " . basename
            else if (index == attachments._MaxIndex())
                sub := sub . " & " . basename
            else if (index == attachments._MinIndex())
                sub := sub . basename
            else
                sub := sub . ", " . basename
        }

        FileRead, emailTpl, email.tpl.txt
        mail.HTMLBody := StrReplace(emailTpl, "$$MESSAGE$$", msg . "</ul>")
        mail.Subject := sub
    }

    mail.Display
}

AutoHotkey supports connecting to OLE objects, which means we can create hotkeys that provide rich interactions with Office applications like Outlook. We leverage this in the above function.

All I have to do now, is fill up the “To:” field and hit Ctrl+Enter. I’ve been loving this ever since.

Note, of course, that since this connects to the Outlook OLE object, Outlook needs to be running for this to work.

Global Hotkey for New Mail¶

As you may have noticed, the above function’s attachments argument has a default value. If this argument is not provided, we just get a blank email window. This is convenient on its own. So I have it as a global hotkey:

#c::OutlookNewMail()

This works really well since the new mail window opens up with my signature already filled up and the focus is set to the “To:” field perfectly to quickly start working on my email.

Conclusion¶

AutoHotkey is a powerful tool for automating all sorts of workflows on Windows. If you can get past the quirks in the language itself, the underlying engine is very powerful. I know that over the few years I’ve used it, I’ve only made use of a small portion of its potential. In addition, the help file that is shipped with AutoHotkey (right-click on the tray icon and click on “Help”) is very good. It’s exhaustive, very detailed and contains lots of examples. I encourage going over it occasionally to find interesting things to add to your workflow. Good luck!

Automating the Vim workplace — Chapter Ⅲ

2020-03-15T00:00:00+05:30

This is the third installment of my Automate the Vim workplace article series. As always, feel free to grab the ideas in this article or, better yet, take inspiration and inspect your workflow to identify such opportunities.

This article is part of a series:

Chapter Ⅰ.
Chapter Ⅱ.
Chapter Ⅲ (this article).

Table of Contents

Copy file full path
Squeeze / Expand contiguous blank lines
Duplicate Text in Motion
Transpose
Using vartabstop to Line Up
Strip Trailing Spaces
Append character over motion
Conclusion

Please note that all that I share below is what I’m using with Vim (more specifically, GVim on Windows). I don’t use Neovim (yet) and I can’t speak for any of the below for Neovim.

Copy file full path¶

I work with CSV files quite a bit. I spend a lot of time grooming them, fixing them etc. in Vim and then once they’re ready, I need to upload them to an internal tool. For that, the following command has proven to be super useful.

" Command to copy the current file's full absolute path.
command CopyFilePath let @+ = expand(has('win32') ? '%:p:gs?/?\\?' : '%:p')

This is one of those commands that feel super-simple and super-obvious once we add it to our workflow. Running this command places the full path of the current buffer’s file into the system clipboard. Then, I just go to my browser, click on the upload button and paste the file location. This is much quicker than having to navigate to the folder and selecting the file. It also helps avoid selecting the wrong file (which happened more than once to me).

Squeeze / Expand contiguous blank lines¶

When building or editing large CSV files, I often end up with several (read: hundreds) of blank lines. This is usually because I select those lines in visual block mode, cut them, and then paste as a new column to some existing rows. Solving that problem is for another day I suppose.

Nonetheless, I needed a quick way to condense several blank lines into a single blank line. The following is the result of that:

nnoremap <silent> dc :<C-u>call <SID>CleanupBlanks()<CR>
fun s:CleanupBlanks() abort
    if !empty(getline('.'))
        return
    endif
    let l:curr = line('.')

    let l:start = l:curr
    while l:start > 1 && empty(getline(l:start - 1))
        let l:start -= 1
    endwhile

    let l:end = l:curr
    let l:last_line_num = line('$')
    while l:end < l:last_line_num && empty(getline(l:end + 1))
        let l:end += 1
    endwhile

    if l:end >= l:start + v:count1
        exe l:start . '+' . v:count1 . ',' . l:end . 'd_'
    else
        call append(l:end, repeat([''], v:count1 - (l:end - l:start) - 1))
    endif
    call cursor(l:start, 1)
endfun

This defines the dc mapping, which will condense multiple blank lines under the cursor into a single one.

Then, on a weekend when I was feeling particularly silly, I extended this to accept a number in front of dc which specifies the number of newlines to end up with. So now, this mapping can both condense, and expand vertical blank space to any size I want! Yay, silly weekends!
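For anyone who finds the Vimscript hard to follow, here's the same squeeze / expand logic restated as a Python sketch over a list of lines. The function name squeeze_blanks and the 0-based cursor index are my own choices for this illustration; the mapping itself works on the buffer and uses Vim's 1-based line numbers.

```python
def squeeze_blanks(lines, cursor, count=1):
    """Return a copy of lines where the blank run at cursor is exactly count long."""
    if lines[cursor] != "":
        # Like the mapping, do nothing unless the cursor is on a blank line.
        return list(lines)

    # Walk up to the first blank line of the run.
    start = cursor
    while start > 0 and lines[start - 1] == "":
        start -= 1

    # Walk down to the last blank line of the run.
    end = cursor
    while end < len(lines) - 1 and lines[end + 1] == "":
        end += 1

    # Replace the whole run with the requested number of blank lines.
    return lines[:start] + [""] * count + lines[end + 1:]


text = ["a", "", "", "", "b"]
print(squeeze_blanks(text, 2))           # condense the run into one blank line
print(squeeze_blanks(text, 2, count=5))  # expand the run to five blank lines
```

The Vimscript version gets to the same result by either deleting the extra lines or appending the missing ones, whereas this sketch just rebuilds the list.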

Duplicate Text in Motion¶

Copy-pasta is a legitimate writing and coding technique. But I do it so mindlessly and often, I started to think of duplicating as a distinct operation, and not as a combination of yanking and then pasting. But if that is so, duplicating some text should not mess with my registers. This was messing with the nice semantic pool my thoughts were swimming in (!).

So I built a mapping that would let me duplicate the text over any motion (like text objects), without touching the registers. Following is how it’s built:

" Duplicate text, selected or over motion.
nnoremap <silent> <Leader>uu :t.\|silent! call repeat#set('duu', v:count)<CR>
nnoremap <silent> <Leader>u :set opfunc=DuplicateText<CR>g@
vnoremap <silent> <Leader>u :<C-u>call DuplicateText('vis')<CR>
fun DuplicateText(type) abort
    let marks = a:type ==? 'vis' ? '<>' : '[]'
    let [_, l1, c1, _] = getpos("'" . marks[0])
    let [_, l2, c2, _] = getpos("'" . marks[1])

    if l1 == l2
        let text = getline(l1)
        call setline(l1, text[:c2 - 1] . text[c1 - 1:c2] . text[c2 + 1:])
        call cursor(l2, c2 + 1)
        if a:type ==? 'vis'
            exe 'normal! v' . (c2 - c1) . 'l'
        endif

    else
        call append(l2, getline(l1, l2))
        call cursor(l2 + 1, c1)
        if a:type ==? 'vis'
            exe 'normal! V' . (l2 - l1) . 'j'
        endif

    endif
endfun

Now, what used to be yap}p has become ,uap. That’s just one key fewer, but a reduction in keys is not what I’m aiming at here. It’s the lower cognitive load of “duplicate this text” compared to “copy this text, go to end of text, paste text”. This works in visual mode as well, though I don’t use it as often.

Additionally, if triggered in visual mode, the duplicated text is selected again in visual mode. This quickly highlights the newly inserted text, so I can continue with operating on the duplicated text.

Now, if you’re aware of the :t (or :copy) command, then what I’m doing above may seem pointlessly elaborate. To an extent, I agree. In fact, I’m using the :t command for the ,uu mapping which is for duplicating a single line. The difference is that where :t only works line-wise, my implementation above can work character wise as well as line wise. For example, ,uaw (or just ,uw) will duplicate a single word, just like ,uap will duplicate a paragraph.
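The character-wise, single-line branch of DuplicateText is essentially string slicing. As a rough illustration only (in Python rather than Vimscript, and with `duplicate_span` being a hypothetical name that is not part of the mapping), here is the same idea using 1-based, inclusive columns like the ones getpos() reports. It assumes single-byte characters.

```python
def duplicate_span(text: str, c1: int, c2: int) -> str:
    """Duplicate the characters in columns c1..c2 (1-based, inclusive).

    Hypothetical helper mirroring the single-line branch of the
    Vimscript function; multi-byte characters are ignored for simplicity.
    """
    head = text[:c2]          # everything up to and including column c2
    span = text[c1 - 1:c2]    # the text covered by the motion
    tail = text[c2:]          # everything after column c2
    return head + span + tail


print(duplicate_span("hello world", 1, 6))  # prints: hello hello world
```

The duplicate lands right after the original span, which is also where the mapping leaves the cursor.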

Transpose¶

This is another mapping I created to help me with CSV files. Specifically, this one works with tab-separated files, which are even more awesome to edit in Vim, thanks to the vartabstop option. The next section describes how I use this when editing tab separated files.

This mapping, when applied over lines with tab separated values, will transpose the matrix made of lines and tabs. Check out the GIF below to get a better understanding of how this works.

" Transpose tab separated values in selection or over motion.
nnoremap <silent> gt :set opfunc=Transpose<CR>g@
vnoremap <silent> gt :<C-u>call Transpose(1)<CR>
fun Transpose(...) abort
    let vis = get(a:000, 0, 0)
    let marks = vis ? '<>' : '[]'
    let [_, l1, c1, _] = getpos("'" . marks[0])
    let [_, l2, c2, _] = getpos("'" . marks[1])
    let l:lines = map(getline(l1, l2), 'split(v:val, "\t")')
    py3 <<EOPYTHON
import vim
from itertools import zip_longest
out = list(zip_longest(*vim.eval('l:lines'), fillvalue=''))
EOPYTHON
    let out = map(py3eval('out'), 'join(v:val, "\t")')
    call append(l2, out)
    exe l1 . ',' . l2 . 'delete _'
endfun

Needs +python3.

The keys I’m hitting in the GIF are gtip. I’m transposing the lines in the inner paragraph.

Note that I’m using :py3 for this, so, +python3 would be required for this to work. I might port it to Vimscript one of these days, hopefully.
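To see what the embedded Python does on its own, here is a minimal standalone sketch of the same zip_longest trick. The function name `transpose_tsv` is made up for this illustration. One difference to keep in mind: Python’s str.split keeps leading and trailing empty cells, whereas Vim’s split() drops them unless asked otherwise, so edge cases may not match the mapping exactly.

```python
from itertools import zip_longest


def transpose_tsv(lines):
    """Transpose tab-separated rows, padding short rows with empty cells."""
    rows = [line.split("\t") for line in lines]
    # zip_longest pairs up the n-th cell of every row; rows that are too
    # short contribute the fill value instead of cutting the result off.
    columns = zip_longest(*rows, fillvalue="")
    return ["\t".join(column) for column in columns]


print(transpose_tsv(["a\tb\tc", "1\t2"]))  # prints: ['a\t1', 'b\t2', 'c\t']
```

Using zip_longest rather than plain zip is what keeps ragged rows from silently losing their extra cells.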

Using `vartabstop` to Line Up¶

The moment I learnt about the vartabstop option, I jumped on it right away, considering I worked with tab separated files a lot. I created the following command that would scan the file’s contents and set the value of this option such that all the columns would line up perfectly, almost like a spreadsheet.

The vartabstop option is not available in Neovim, which is one of the reasons I don’t use it yet. I just got too used to vartabstop.

command TabsLineUp call <SID>TabsLineUp()
fun s:TabsLineUp() abort
    py3 <<EOPYTHON
import vim
lengths = []
for parts in (l.split('\t') for l in vim.current.buffer if '\t' in l):
    lengths.append([len(c) for c in parts])
vim.current.buffer.options['vartabstop'] = ','.join(str(max(ls) + 3) for ls in zip(*lengths))
EOPYTHON
endfun

Needs +python3.

Here’s a nice GIF showing this off! Note that although it looks like we’re just adding a lot of white space to align stuff, no new space characters are inserted. The document remains unchanged. All we’re changing with vartabstop is the display width of the tab characters.
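The width calculation can be tried outside Vim too. Below is a small sketch of the same logic as a plain function; `vartabstop_value` and its `padding` parameter are names I’m assuming for the illustration, with the default of 3 taken from the command above.

```python
def vartabstop_value(lines, padding=3):
    """Build a comma-separated vartabstop value from tab-separated lines."""
    lengths = [
        [len(cell) for cell in line.split("\t")]
        for line in lines
        if "\t" in line  # lines without tabs don't take part
    ]
    # zip() stops at the shortest row, so only columns present in every
    # tabbed line contribute a width.
    return ",".join(str(max(column) + padding) for column in zip(*lengths))


print(vartabstop_value(["name\tcity", "Alexander\tRome", "no tabs here"]))
# prints: 12,7
```

Each number is the longest cell in that column plus the padding, which is what makes the columns line up.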

Finally, tab separated files are easier to deal with than comma separated files.

Also, if you’re into CSV and tab separated files, I recommend checking out the amazing csv.vim plugin. It makes similar use of the vartabstop option.

Strip Trailing Spaces¶

I know trailing whitespace doesn’t bother a lot of people much, but it does upset me. Most of the solutions I found online to remove trailing whitespace operate on the whole file. I wanted it to work with the lines over a motion, like inner paragraph etc. Of course, I could just visually select the text object and then do a :s/\s\+$//, but that’s too much effort!

" Strip all trailing spaces in the selection, or over motion.
nnoremap <silent> <Leader>x :set opfunc=StripRight<CR>g@
vnoremap <silent> <Leader>x :<C-u>call StripRight(1)<CR>
fun StripRight(...) abort
    let cp = getcurpos()
    let marks = get(a:000, 0, 0) ? '<>' : '[]'
    let [_, l1, c1, _] = getpos("'" . marks[0])
    let [_, l2, c2, _] = getpos("'" . marks[1])
    exe 'keepjumps ' . l1 . ',' . l2 . 's/\s\+$//e'
    call setpos('.', cp)
endfun

The above snippet defines a mapping, ,x, which operates on a motion and removes trailing whitespace. There are some nice additions to this: it works in visual mode as well, and the cursor doesn’t move as a result of the operation.

Removing trailing whitespace inside current paragraph is now ,xip!

Append character over motion¶

This mapping lets me add a character at the end of all lines over a motion. So, like, ga;ip would add a semicolon to every line inside the paragraph.

I use this mostly to add commas or tab characters when working with CSV (or tab-separated files).

" Append a letter to all lines in motion.
nnoremap <silent> <expr> ga <SID>AppendToLines('n')
xnoremap <silent> ga :<C-u>call <SID>AppendToLines(visualmode())<CR>

fun s:AppendToLines(mode) abort
    let c = getchar()
    while c == "\<CursorHold>" | let c = getchar() | endwhile
    let g:_append_to_lines = nr2char(c)
    if a:mode ==? 'n'
        exe 'set opfunc=' . s:SID() . 'AppendToLinesOpFunc'
        return 'g@'
    else
        call s:AppendToLinesOpFunc('v')
    endif
endfun

fun s:AppendToLinesOpFunc(type) abort
    let marks = a:type ==? 'v' ? '<>' : '[]'
    for l in range(line("'" . marks[0]), line("'" . marks[1]))
        call setline(l, getline(l) . g:_append_to_lines)
    endfor
    unlet g:_append_to_lines
endfun

This may seem pointless in that, it’s not very hard to do this with visual block mode. Sure. On that note, even A is pretty pointless, it can be done with just $a, right? No. The point here is not about having a shorter key sequence to do this, but a more semantic one. Just like A spells “append at end of line”, to me, ga;ip spells “adding semicolon to every line in the paragraph”. Personally, I think better this way.

Conclusion¶

Text objects in Vim (and motions, for the most part) have effectively solved the problem of being able to expressively select a piece of text to work on. However, in my opinion, the kind of work that can be done on such text is equally (if not more) important. Try to identify what you often do after selecting text with text objects and see if you can turn it into an operator mapping like those in this write-up.

This one is shorter than usual and that’s not because of lack of content, it’s more because of terrible planning on my part. Nevertheless, stay tuned for more in this series!

Read the previous article in this series.

The Weird `global`

2020-03-08T00:00:00+05:30

Python’s global keyword allows us to change the value of module-level variables inside functions. Sounds so simple and useful, doesn’t it? Well, yeah. I’m going to show you how it can be useful in the simple sense and situations where it can drive people nuts.

Simple Usage¶

Consider the following top.py script. We have a single module-level (aka global) variable here, and we change its value in the function mark_done.

top.py

are_we_done = False


def mark_done():
    global are_we_done
    are_we_done = True


print("Done?", are_we_done)
mark_done()
print("Done?", are_we_done)

Running this, we get the following output:

Done? False
Done? True

The reason we were able to change the value of the global variable are_we_done from inside the mark_done function is that we declared it as global on line 5. If that declaration weren’t there, we’d just be defining a new function-level variable called are_we_done inside the mark_done function, which is not what we wanted.
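To make that last point concrete, here is a small variation of the script without the global declaration. The function name `mark_done_locally` is made up for this sketch.

```python
are_we_done = False


def mark_done_locally():
    # No `global` declaration, so this assignment creates a local variable
    # that merely shares its name with the module-level one.
    are_we_done = True
    return are_we_done


print(mark_done_locally())  # prints: True (the local variable)
print(are_we_done)          # prints: False (the global is untouched)
```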

Refer Directly¶

Note that declaring variables as global is needed only when we’re modifying the value of the variable. That means if we are only accessing the variable, we don’t need to declare it as global. This is how capitalized constant variables work in most Python scripts:

CURRENT_PLANET = "Earth"


def get_moon_count():
    if CURRENT_PLANET == "Earth":
        return 1
    else:
        raise ValueError("No idea!")


print(get_moon_count())

This, of course, prints out 1. Here, we are using the CURRENT_PLANET global variable inside the function without declaring it as global. Accessing doesn’t require explicitly declaring as global.

Modifying the Referred Object¶

A small note on the terms we’ve been using here. Accessing doesn’t require global declaration, but modifying does. Now look at the following code snippet:

CALLS = []


def record_call(phone_number):
    CALLS.append(phone_number)


record_call("123-45-678")
record_call("987-65-432")
print(CALLS)

Here, since we are appending to the CALLS list, is that considered modifying the global variable? The answer is no. We are merely accessing the CALLS variable’s value, which happens to be a list, on which we call the .append method. There’s no modifying going on here so far. The .append method, however, will change the state of the list object. But for the purposes of using the CALLS variable here, we are only accessing it. So, we don’t need to declare it as global.

So what does modifying mean? Simply put, if you want to reassign a global variable, it’s considered as modifying.
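Here is a short sketch that puts the two side by side. The `reset_calls` function and the `original` name are additions for illustration and are not part of the example above.

```python
CALLS = []
original = CALLS  # a second reference to the same list object


def record_call(phone_number):
    # Mutates the list the global name refers to; no `global` needed.
    CALLS.append(phone_number)


def reset_calls():
    # Rebinds the global name to a brand-new list, so `global` is required.
    global CALLS
    CALLS = []


record_call("123-45-678")
print(CALLS is original)  # prints: True (same object, new contents)
reset_calls()
print(CALLS is original)  # prints: False (the name now points elsewhere)
```

After reset_calls, the old list still holds the recorded number; only the name CALLS has moved on to a different object.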

Assigning without Declaring¶

This behaviour of global variables causes some slightly unintuitive situations. For example, consider the following piece of code:

is_server_up = False


def mark_server_up():
    print(is_server_up)


mark_server_up()

In this script, we are using the global variable is_server_up on line 5, without declaring it as global, and it works fine. Now, we add another line to this function:

is_server_up = False


def mark_server_up():
    print(is_server_up)
    is_server_up = True


mark_server_up()

If we run this script, we get the following error:

Traceback (most recent call last):
  File "/check.py", line 9, in <module>
    mark_server_up()
  File "/check.py", line 5, in mark_server_up
    print(is_server_up)
UnboundLocalError: local variable 'is_server_up' referenced before assignment

Okay, we kind of expected an error because we are trying to modify a global variable without declaring it. But note that the error comes from line 5, not from line 6, where we are modifying the variable. The error message gives a hint on what’s happening.

local variable 'is_server_up' referenced before assignment

Since we didn’t declare is_server_up as global, and since we are setting a value to is_server_up, Python decided that we want a local variable in our function with the same name. With that understanding, it looks like we are referencing the is_server_up local variable before assigning a value to it. That’s the error we see here.
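For completeness, here is a sketch showing both the failing function and one way to fix it, by declaring the name as global before using it. The name `broken_mark_server_up` is made up to keep the two versions apart.

```python
is_server_up = False


def broken_mark_server_up():
    print(is_server_up)  # raises: the name is local to this function
    is_server_up = True


def mark_server_up():
    global is_server_up  # declare first, then read and reassign freely
    print(is_server_up)
    is_server_up = True


try:
    broken_mark_server_up()
except UnboundLocalError as error:
    print("Caught:", type(error).__name__)

mark_server_up()
print(is_server_up)
```

The exact wording of the error message varies between Python versions, but the exception type is UnboundLocalError in both cases.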

Conclusion¶

Global variables have their place, but, if it’s not for constant-like values, I’d recommend against using global variables at all. It might make sense for small one-off scripts, and when it does, keep the above small details in mind.

The Magic of AutoHotkey

2020-03-01T00:00:00+05:30

For the past several years, my primary work station has been Windows 7. After the initial swearing at how things work differently (coming from Linux), I got used to it and started to really like it, in some ways. A big part of the reason for that on Windows is AutoHotkey.

I will document my automations and experiences over the years in this two-part article series.

Part 1 (this article).
Part 2

Table of Contents

The Setup
The Common Magic
Close on Escape Key
The Caps Lock Story
Inserting Snippets
Window Watcher
Mess with Images in Clipboard
Periodic Time Display
Vim Keys for Sumatra PDF
Conclusion

AutoHotkey is an open-source programming language for Windows, that lends itself extremely well to tasks within the GUI scripting and automation domains. The hotkey functionality is particularly good, something I haven’t found in any other general purpose programming language (AutoIt most likely comes close, but I’ve never tried it so can’t speak for it).

The language itself may seem a bit flaky around the syntax and some of its constructs, but once we get used to them, we can leverage the powerful engine underneath it. That, combined with the well-written documentation, makes AutoHotkey a must-have tool for any Windows power user.

Some hotkeys I use (a few of which I can’t show off here) are so well integrated into my daily workflow that my fingers just flow on the keyboard, and things happen on screen that are hard for others to follow.

Any sufficiently advanced technology is indistinguishable from magic.

– Clarke’s Third Law, 1973

In these articles, I’ll share some of the hotkeys I use, how I came about them and how they improve my workflow. This is not a beginner’s AutoHotkey tutorial; for that, see the official documentation or the many other resources available online.

A lot of the stuff in this article is made possible by a lot of help from all over the internet, and especially the AutoHotkey forums. Due to most of it being at least a few years old, I don’t have the exact source links. So, thank you everyone!

The Setup¶

I usually have one AutoHotkey script running at all times (called master.ahk). I #Include other scripts into this so that all my hotkeys and automations aren’t just dumped into one large master.ahk. It starts off with the following:

#NoEnv
#SingleInstance force
#Warn

SendMode Input
SetWorkingDir %A_ScriptDir% ; Default in autohotkey v2.
AutoTrim, Off ; Default in autohotkey v2.
SetTitleMatchMode RegEx
SetNumlockState, AlwaysOn

EnvGet, homedir, USERPROFILE

Most of this I learned to be a best practice from the documentation and from the forums. Please look up the documentation for these individual directives, I won’t repeat them here.

The Common Magic¶

These are essentials that are general enough that I believe everyone using AutoHotkey should have. Let’s quickly run these down, so we can move on to more exciting stuff.

Reload AutoHotkey Script¶

The script master.ahk that is running in the background at all times contains some of my hotkeys and the rest are #Include-ed from other AutoHotkey scripts. I include the below snippet in this script and when I hit #+r, the changes in master.ahk and any included scripts are reloaded.

#+r::Reload

All script snippets discussed here, if and when added to your master script, would start working fine with a Reload like above. No need to quit it and start again.

It’s really sad that there’s no default hotkey to have a calendar pop open on Windows. Clicking on the time displayed at the right of the toolbar does show a handy calendar, but there’s no hotkey for it. The following solves this exact problem. We use the #b hotkey which gives focus to the system tray. Then we navigate to the time and hit the {Enter} key.

#i::Send #b{Left}{Enter}

There’s a problem with this though. Once the calendar opens up, and we close it by hitting the Escape key, the focus is not returned to the window that had focus originally. The workaround for me has been to do Alt+Tab a couple of times, and we’re back to work.

It’s still arguable how useful this solution is. The pop-up Calendar has very limited functionality. The most annoying thing is probably that I spend a few seconds selecting the month I want to look at, accidentally click on another window, and that Calendar is gone! After a lot of swearing, I attempted to solve this problem and built justacalendar.app. It’s super-quick, no-login-required, light-weight, just a calendar to look at, and mark dates to top. Do check it out! Thanks.

Hide the Show Desktop Button¶

Every time my mouse moves to the bottom right corner, all my windows go transparent, and almost reduce me to swearing again. Now, I know we can turn this behaviour off by disabling Aero or some other setting and I can even agree that this feature can be useful. But to me, firstly, I hardly keep anything on my Desktop, so its mere existence is quite useless to me. Secondly, even if I wanted to look at the desktop, it’s a quick #d away, which is much faster considering my fingers are almost always on the keyboard.

So I decided to hide the “Show Desktop” button with the following snippet:

Control, Hide, , TrayShowDesktopButtonWClass1
    , ahk_class Shell_TrayWnd ahk_exe explorer.exe

This doesn’t reclaim the space occupied by the button, but the button disappears and the above problem goes away, so, I’m not complaining.

Type Clipboard Contents¶

Remember how some websites (especially bank websites) disallow pasting values into inputs. This is extremely annoying when using a password manager or when I want to just paste something. I’ve tried several solutions to this, and the current answer I have with AutoHotkey has served me the best.

#v::SendInput, {Raw}%Clipboard%

The idea is that instead of sending a paste operation, we have AutoHotkey type out the contents of the clipboard. This has the additional benefit of stripping any formatting in the text in the clipboard, if for instance, we’ve copied something from a website or a Word document with heavy formatting.

Close on Escape Key¶

There are some windows that I’d love to close with just a tap on the Escape key, but they don’t. A few examples of where I (instinctively) expect this are the photo viewer, font viewer, the playlist in VLC etc. Then there’s another set of windows that I found myself trying to close by hitting ^w (this intuition likely comes from its behaviour in Firefox and Chrome). Either way, I needed these keys to act the way I was expecting them to.

There are two parts to the solution. First, we define the hotkeys that close the windows in window groups:

#IfWinActive ahk_group CloseOnEsc
Escape::PostMessage 0x112, 0xF060
#IfWinActive ahk_group CloseOnCW
^w::PostMessage 0x112, 0xF060
#IfWinActive

What we’re doing here is defining two hotkeys. First, for all windows in the group called CloseOnEsc, define the hotkey Escape to close the window (the PostMessage part, which we’ll get to in a bit). Second, a similar hotkey on ^w for windows in the group CloseOnCW.

Now, you might’ve noticed that we don’t use the WinClose command to close the window. The reason is that for some applications (such as Lync), the WinClose command quits the application instead of just sending it back to the tray. The PostMessage command above would behave exactly like hitting the red close button at the top right of the window.

In the second part of this exercise, we add windows to the groups:

; Windows that should just disappear on ESC, but don't already.
GroupAdd, CloseOnEsc, ahk_class Photo_Lightweight_Viewer
GroupAdd, CloseOnEsc, ahk_class ConsoleWindowClass
GroupAdd, CloseOnEsc, Skype for Business
GroupAdd, CloseOnEsc, Vivaldi Settings ahk_exe vivaldi.exe
GroupAdd, CloseOnEsc, ahk_class FontViewWClass ahk_exe fontview.exe
GroupAdd, CloseOnEsc, Playlist ahk_exe vlc.exe

; Windows that should close with C-w.
GroupAdd, CloseOnCW, ahk_class Notepad ahk_exe notepad.exe
GroupAdd, CloseOnCW, ahk_class FM ahk_exe 7zFM.exe

This should be fairly self-explanatory. We add certain windows (identified by WinTitle-style filters) to the two groups, using the GroupAdd command.

There’s one special case here. The stock Windows Calculator app. This one clears the display on hitting Escape key. But I wanted it to close on Escape if the display is already cleared.

So, instead of including Calculator in the above group(s), I use the following snippet to handle this special case.

#IfWinActive ahk_class CalcFrame
$Escape::
CloseOrClearCalculator() {
    ControlGetText, display, Static4
    if (display == "0")
        WinClose
    else
        SendInput, {Escape}
}
#IfWinActive

This will close the Calculator if the display is already "0", but passes the Escape key otherwise.

The Caps Lock Story¶

I use SharpKeys to turn my Caps Lock key into an additional Ctrl Key. This works wonders considering that the Ctrl key is used a lot more often than the Caps Lock, but the Caps Lock key is a lot easier to hit than any of the Ctrl keys.

If you’re wondering why I don’t do this with AutoHotkey, the reason is that if I did it with AutoHotkey, it would be active only when the script is running. Which means the remapping isn’t active in the lock screen (where I hit Ctrl+A often). But since SharpKeys modifies the registry to achieve what it does, the remapping works even in the lock screen.

Yet, sometimes I miss the original functionality of the Caps Lock key. So I created the following hotkey for #q which will turn on Caps Lock mode, and show an annoying always-on-top splash window alerting me to that fact. To turn it back off, it’s #q again.

#q::
ToggleCapsLock() {
    if GetKeyState("Capslock", "T") {
        SetCapsLockState, Off
        SplashTextOff
    } else {
        SetCapsLockState, On
        SplashTextOn, 300, , << CAPS LOCK ON >> (Win+q to turn off)
        WinSet, Transparent, 200, << CAPS LOCK ON >>
    }
}

This actually works surprisingly well. I use it more often than I like to admit. It feels better than using the original Caps Lock key, because I get an (hard-to-ignore) overlay that alerts me that Caps Lock is turned on.

Inserting Snippets¶

Inserting snippets is an idea where a long and often used string is inserted by a rather short sequence of keys. In AutoHotkey, this is usually done using hotstrings. Hotstrings work okay for this actually, but they don’t work on every application. For me particularly, I needed them to be working with GVim (which is where I write most of my prose), which they weren’t. So, with a lot of help from the Internet, I came up with a solution.

Instead of hotstrings, I’ll use a hotkey that summons an OSD (on-screen-display) with a list of keys and their expansions. When this window is focused, I can hit one of those keys and the window is immediately closed and the corresponding expansion is typed out. This has been working unchanged for over four years for me and has never failed me.

snippets.ahk

SnippetsInit() {
    Gui, Snips: Default
    Gui, Font, s18 q5, Consolas
    Gui, Color, FF0000
    Gui, Margin, 6, 6
    Gui, +AlwaysOnTop +Owner +ToolWindow -Caption +HwndSnippetsHwnd
    Gui, Add, ListView, r8 w900, Hotkey|Text

    IniRead, configText, snippets.ini, master
    Loop, Parse, configText, `n, `r
    {
        parts := StrSplit(A_LoopField, "=", " `t")
        LV_Add("", parts[1], parts[2])
    }
}

SnippetsShow() {
    global SnippetsMap
    Gui, Snips: Show, NoActivate
    Input, key, L1 T3
    Gui, Snips: Hide
    if (ErrorLevel != "Timeout") {
        IniRead, value, snippets.ini, master, %key%, __SNIPPETS_KEY_NOT_FOUND__
        if (value != "__SNIPPETS_KEY_NOT_FOUND__")
            SendInput, %value%
        else
            MsgBox, No snippet found for %key%.
    }
}

I have the above in a module called snippets.ahk, which I include in my master script. To use, first, I need a snippets.ini file in the same directory with expansions. I have things like the following:

[master]
u = sharat87
m = yeahhereismyaddress@gmail.com
i = {+}91 AND MY PHONE NUMBER
s = https://sharats.me/

There’s more snippets on my system, this is just a preview, of course, duh!

The next step is to include this module in our master script:

#Include snippets.ahk
SnippetsInit()

Finally, we define a hotkey to summon the snippets window. I use ^;.

^;::SnippetsShow()

That’s it! Here it is in action:

Window Watcher¶

My window watcher module (written as a window-watcher.ahk) lets me define actions to be taken when new windows with certain properties show up.

For example, I want all command line windows to always be moved to the top right corner of the screen. As another example, there are some windows that open up with a window size equal to the whole screen, but are not maximized. This one is particularly annoying since I have a habit of throwing my mouse to the top right corner and clicking to close the window. But since this window is not maximized, I end up accidentally closing the window behind. So, I want such windows to be maximized as soon as they open.

To address this, I have a window-watcher.ahk module that defines the logic of constantly polling the visible windows and detecting if anything is opened or closed. This module defines the function WindowWatcherInit (among others), which needs to be called once to initialize the polling timer.

window-watcher.ahk

WindowWatcherInit() {
    static initDone := false

    if (initDone)
        return
    initDone := true

    SetTimer, WindowWatcherPollForNewWindows
}

WindowWatcherTrigger(wParam, hwnd) {
    if (wParam == "Created") {
        OnWindowCreated(hwnd)
    ; } else if (wParam == "Destroyed") {
    }
}

WindowWatcherPollForNewWindows() {
    static windows := ""
    WinGet, wins, List, , , ,
    newWindows := Object()

    Loop, %wins%
    {
        this_id := wins%A_Index%
        newWindows[this_id] := 1
        if (windows && !windows[this_id])
            WindowWatcherTrigger("Created", this_id)
    }

    for wid, p in windows {
        if (!newWindows[wid])
            WindowWatcherTrigger("Destroyed", wid)
    }

    windows := newWindows
}

From then on, any time a new window is detected, the OnWindowCreated function is called with the new window’s hwnd passed as the only argument. In this function, I match this window ID with various types of windows and take the action I need. Here’s a short preview of that function (in reality, the function is 81 lines long in my master script).

OnWindowCreated(hwnd) {
    global homedir

    ; Close "Illegal IP Address" alerts.
    if (WinExist("Application Error ahk_exe jweblauncher.exe ahk_id " . hwnd)) {
        PostMessage, 0x112, 0xF060, , ahk_id %hwnd%

    ; Close "Keyboard History Utility" alerts.
    } else if (WinExist("Keyboard History Utility ahk_exe WerFault.exe ahk_id " . hwnd)) {
        ControlClick, Close the program, ahk_id %hwnd%

    ; When a command window opens, move it to top-right.
    } else if (WinExist("ahk_class ConsoleWindowClass ahk_id " . hwnd)) {
        WinGetPos, , , w, , ahk_id %hwnd%
        x := A_ScreenWidth - w
        WinMove, ahk_id %hwnd%, , %x%, 0

    ; Maximize windows that open unmaximized but occupy almost-entire screen.
    } else if (WinExist("ahk_id " . hwnd . " ahk_group MaximizeOnOpen")) {
        WinMaximize, ahk_id %hwnd%

    } else {
        WinGetPos, , , width, height, ahk_id %hwnd%
        if (width >= A_ScreenWidth && height > .9 * A_ScreenHeight)
            WinMaximize, ahk_id %hwnd%

    }

}

There are other ways to watch windows without polling, such as RegisterShellHookWindow, and I encourage you to try them out if you’re not comfortable with this polling system. In my experience, such solutions seemed to miss some windows and caught only a small subset of the windows that were opening. So I went with polling, which is less efficient, but has been more reliable for me.
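The heart of the polling approach is comparing the previous snapshot of window IDs with the current one. Here is a language-neutral sketch of that comparison, written in Python only to keep it short and runnable. `diff_windows` is a hypothetical name, and the window IDs are plain integers standing in for real hwnd values.

```python
def diff_windows(previous, current):
    """Return (created, destroyed) window ids between two polls.

    `previous` is None on the very first poll, in which case nothing is
    reported, much like the `windows && ...` guard in the AutoHotkey script.
    """
    current = set(current)
    if previous is None:
        return set(), set()
    previous = set(previous)
    return current - previous, previous - current


seen = None
for snapshot in ([1, 2], [1, 2, 3], [2, 3]):
    created, destroyed = diff_windows(seen, snapshot)
    print(sorted(created), sorted(destroyed))
    seen = snapshot
```

The first poll reports nothing, the second reports window 3 as created, and the third reports window 1 as destroyed. Skipping the first poll avoids firing the "created" actions for every window that was already open when the script started.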

Mess with Images in Clipboard¶

This is a little trick that’s powered by ImageMagick. I add a menu item in the tray icon’s context menu called "Add border to image in clipboard", which is quite self-explanatory!

Menu, Tray, Add, Add border to image in clipboard, AddBorderToImageInCb

The callback for this menu item invokes the following function. Here, we just run the appropriate ImageMagick command and show a little dialog when it’s done, so we can go ahead and paste the bordered image.

AddBorderToImageInCb() {
    RunWait, C:\tools\ImageMagick\magick.exe convert clipboard:myimage -bordercolor "#0099FF" -border 6x6 clipboard:, , Hide
    MsgBox, Added border to image in clipboard.
}

I use this a lot with screenshot snips (taken with the snipping tool or copied from paint), before pasting into an email. Having a border around images in emails makes them stand out and have a distinct visual.

Periodic Time Display¶

As an alternative to the popular Pomodoro Technique, I have a small non-intrusive OSD show up with the current time at the bottom of my screen every 20 minutes. That is, I get a small blue OSD at :00 times, a small green OSD at :20 times and a small orange OSD at :40 times. Here’s a preview of how this looks:

Again, for this, I have a separate module called time-osd.ahk which I #Include in the master script and call its init function. (This init-function-in-a-separate-module is something I came up with that was working well enough, I have no idea if it’s a best practice).

time-osd.ahk

TimeOSDInit() {
    global TimeOSDLabel
    SetTimer, TimeOSDPulse, 1000
    Gui, TimeOSD:Default
    Gui, +LastFound +AlwaysOnTop +ToolWindow -Caption
    Gui, Font, s18, Calibri
    Gui, Margin, 0, 0
    Gui, Add, Text, cWhite vTimeOSDLabel gTimeOSDClose w250 h36 Center
}

TimeOSDPulse() {
    static lastTime := ""

    if (IsFunc("IsWindowFullScreen") && IsWindowFullScreen("A"))
        Return

    FormatTime, currTime, , h:mm tt

    if (lastTime == currTime || A_TimeIdlePhysical > 600000)
        Return

    if (RegExMatch(currTime, ":00"))
        TimeOSDShow(currTime, "268BD2")
    else if (RegExMatch(currTime, ":20"))
        TimeOSDShow(currTime, "859900")
    else if (RegExMatch(currTime, ":40"))
        TimeOSDShow(currTime, "CB4B16")

    lastTime := currTime
}

TimeOSDShow(timeText, bg) {
    Gui, TimeOSD:Default
    Gui, Color, %bg%
    GuiControl, Text, TimeOSDLabel, It's %timeText% already!
    y := A_ScreenHeight - 120
    Gui, Show, xCenter y%y% NoActivate
    SetTimer, TimeOSDClose, -10000
}

TimeOSDClose() {
    Gui, TimeOSD:Cancel
}

With this, clicking on the OSDs will close them, or, they’ll disappear in 10 seconds.
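The decision logic in TimeOSDPulse is small enough to sketch outside AutoHotkey. The following is a Python approximation, not a port: `osd_for` is a made-up name, the label uses a 24-hour format for simplicity, and the idle-time and full-screen checks are left out.

```python
from datetime import datetime

# Background colours copied from the script above, keyed by minute of the hour.
OSD_COLORS = {0: "268BD2", 20: "859900", 40: "CB4B16"}


def osd_for(now, last_label):
    """Return (colour, label); colour is None when no OSD should be shown."""
    label = now.strftime("%H:%M")
    if label == last_label:
        # This minute was already handled on an earlier one-second tick.
        return None, last_label
    return OSD_COLORS.get(now.minute), label


colour, label = osd_for(datetime(2020, 3, 1, 10, 20, 5), last_label="10:19")
print(colour, label)  # prints: 859900 10:20
```

Remembering the last handled label is what lets the timer run every second while the OSD shows up only once per matching minute.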

To use this, I just include the following in my master script.

#Include time-osd.ahk
TimeOSDInit()

Vim Keys for Sumatra PDF¶

This one probably only makes sense if your fingers are used to hitting Vim’s hotkeys. I wanted some of Vim’s simple hotkeys for navigating the document in Sumatra PDF (my PDF reader of choice on Windows). The following snippet, which I currently use, gets me d (like Vim’s <C-d>), e (like Vim’s <C-u>), n, +n (like Vim’s N), x (to close a tab), g and +g (like Vim’s g & G).

#IfWinActive ahk_exe SumatraPDF.exe ahk_class SUMATRA_PDF_FRAME
$d::
$e::
    SumatraKeys := {d: "j", e: "k"}
    ControlGetFocus, ctrl
    if (ctrl == "Edit1" or ctrl == "Edit2") {
        Send %A_ThisHotkey%
    } else {
        k := SumatraKeys[StrReplace(A_ThisHotkey, "$", "")]
        Send {%k% 22}
    }
    Return

$n::
    ControlGetFocus, ctrl
    if (ctrl == "Edit1" or ctrl == "Edit2")
        Send, n
    else
        Send, {F3}
    Return

$+n::
    ControlGetFocus, ctrl
    if (ctrl == "Edit1" or ctrl == "Edit2")
        Send, N
    else
        Send, +{F3}
    Return

$x::
    ControlGetFocus, ctrl
    if (ctrl == "Edit1" or ctrl == "Edit2")
        Send, x
    else
        Send, ^w
    Return

+g::Send, {End 2}

#IfWinActive Go to page ahk_exe SumatraPDF.exe ahk_class #32770
g::Send, {Escape}{Home}

#IfWinActive

This looks like a sad, long, hairy piece of code (probably because it is), but it works, so I let it be. This sentiment shows up a lot when dealing with AutoHotkey code. But it works, and it works really well.

Conclusion¶

AutoHotkey’s language may have its quirks, but it’s a very powerful tool when it comes to hotkeys. I have come to the point that working on Windows is practically hair-wrecking for me without AutoHotkey (and my scripts, of course). I encourage you to check it out and explore the possibilities.

You can read Part 2 of this now!

Guide to Comprehensions in Python

2020-02-23T00:00:00+05:30

Comprehensions are a syntax construct used for applying some form of transformations and filtering over streams of data. The problems comprehensions solve can be done without them, using plain old for-loops, but where possible, comprehensions can improve readability and show the intent very well.

This article assumes some familiarity with Python (and comprehensions as well). I will go over the basics of comprehensions quickly and jump into the meat of the article. Most of this article applies for Python 3, unless otherwise specified.

If you’re here for the live converter or comprehension ⇔ for-loop code, it’s further down in the page.

Table of Contents

Basic Syntax
Different Collectors
Multiple Looping Constructs
- Zipping instead of Cross Product
Rewriting Comprehensions with map & filter Builtins
Reducing with Assignment Expressions
Set Operations with Comprehensions
Generator Expressions
The key Argument for sorted
No Side Effects Please
Looking Inside
Live Code Converter
Conclusion

Basic Syntax¶

Let’s go over the basic syntax for starters. It can be divided into three parts. The result expression, the looping construct(s) and the filter expression. Of these, the filter expression is optional, but the other two are required. Let’s look at a simple example to get an idea:

>>> [n ** 2 for n in range(4)]
[0, 1, 4, 9]

This is a list comprehension with no filtering (i.e., no if clause). Here, the n ** 2 part is the result expression and the for n in range(4) is the looping construct. This comprehension expression is the same as the following piece of code, written without comprehensions:

>>> squares = []
>>> for n in range(4):
...     squares.append(n ** 2)
...
>>> squares
[0, 1, 4, 9]

Comprehensions also support conditions on the looping variables. For instance, in the example above, if we only wanted squares of even numbers, we could do:

>>> [n ** 2 for n in range(4) if n % 2 == 0]
[0, 4]

In this case, the result expression is not evaluated when the n % 2 == 0 turns out to be False.

The keen Pythonista might note that this can be accomplished more simply by using the step argument of the range builtin, but please excuse me for lacking in creativity for the examples!
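For completeness, here’s a small sketch of that range-based alternative. It only relies on the step argument of the range builtin, and it happens to give the same result for this particular example; the variable names are just for illustration.

```python
# Squares of the even numbers below 4, using a filter clause.
filtered = [n ** 2 for n in range(4) if n % 2 == 0]

# The same values, stepping over the even numbers directly.
stepped = [n ** 2 for n in range(0, 4, 2)]

print(filtered == stepped)  # True
```

The filter clause is the more general tool though, since not every condition can be expressed as a step.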

Different Collectors¶

In addition to list comprehensions, Python supports set and dict comprehensions as well. Where list comprehensions collect the result values in a list, the latter two collect them in sets and dicts respectively.

The syntax is almost exactly the same as that of the list comprehensions. The only difference is that we use braces for set and dict comprehensions, where we use square brackets for list comprehensions. The looping and filtering constructs behave the same way. The result expression behaves the same way for set comprehensions, but for dict comprehensions, we have to provide two expressions, the key and the value, separated by a colon. Let’s look at some examples:

>>> [color.lower() for color in ['Blue', 'Red', 'blue', 'yellow']]
['blue', 'red', 'blue', 'yellow']

>>> {color.lower() for color in ['Blue', 'Red', 'blue', 'yellow']}
{'blue', 'red', 'yellow'}

The first expression in the above REPL session is a list comprehension and the second is a set comprehension. Notice that the only difference between the two expressions is the surrounding bracket type.

>>> {color.lower(): len(color) for color in ['Blue', 'Red', 'blue', 'yellow']}
{'blue': 4, 'red': 3, 'yellow': 6}

This is a dictionary comprehension. Notice here, the result expression is a key-value pair of expressions, as opposed to a single expression for list and set comprehensions.

Note that these two forms of comprehensions were introduced in Python 2.7 and 3.0. In earlier versions, we could replicate them by calling the set and dict builtins over list comprehensions. Here’s an example:

>>> set([color.lower() for color in ['Blue', 'Red', 'blue', 'yellow']])
{'blue', 'red', 'yellow'}

>>> dict([(color.lower(), len(color)) for color in ['Blue', 'Red', 'blue', 'yellow']])
{'blue': 4, 'red': 3, 'yellow': 6}

For dictionaries, we create a list of 2-tuples (key-value pairs) and pass that to dict.

Multiple Looping Constructs¶

In the previous examples, we’ve only used one looping construct. However, it is possible to use more than one looping construct. This works very similar to a nested for-loop. Let’s look at an example:

>>> [(i, j) for i in range(0, 3) for j in range(10, 13)]
[(0, 10), (0, 11), (0, 12), (1, 10), (1, 11), (1, 12), (2, 10), (2, 11), (2, 12)]

This output is easy to visualize if you see the two for-loops nested. The following is a reproduction of the above, without comprehensions:

>>> result = []
>>> for i in range(0, 3):
...     for j in range(10, 13):
...         result.append((i, j))
...
>>> result
[(0, 10), (0, 11), (0, 12), (1, 10), (1, 11), (1, 12), (2, 10), (2, 11), (2, 12)]

This can go to further levels of nesting, although if you have comprehensions with more than three levels of nesting, you should probably rethink your data structures or the way you’re working with them.

Multiple looping constructs work just fine for set and dict comprehensions as well. Here are some examples with set comprehensions, one of which uses a condition expression as well:

>>> {(i, j) for i in range(0, 3) for j in range(10, 13)}
{(1, 12), (2, 11), (0, 12), (2, 10), (0, 11), (0, 10), (2, 12), (1, 10), (1, 11)}

>>> {(i, j) for i in range(0, 3) for j in range(10, 13) if j - i > 10}
{(0, 11), (1, 12), (0, 12)}

A subtle point here that’s not easy to notice in the comprehensions is that the range(10, 13) call in the above examples is evaluated three times, whereas range(0, 3) is evaluated once. This becomes obvious if you visualize this as the nested for-loop illustrated above. This is important when using generators or iterators that work single-pass, like map objects, or file objects (which we’d have to rewind with .seek). Check out the following example to see what I mean:

>>> range_for_i = map(str, range(0, 3))
>>> range_for_j = map(str, range(10, 13))

>>> [(i, j) for i in range_for_i for j in range_for_j]
[('0', '10'), ('0', '11'), ('0', '12')]

In this example, the map objects are exhausted once they have yielded all their results. That is why range_for_j produced its three numbers only once, which was enough to pair with just '0', and there was nothing left to pair with '1' and '2'.

You’re not likely to encounter this in real-world code, but it’s good to know lest we end up facing it.
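If we do run into it, one way out (a minimal sketch, not the only option) is to turn the inner single-pass iterable into a list first, so every pass of the outer loop can walk over it again:

```python
range_for_i = map(str, range(0, 3))
# Materialize the inner iterable so it can be iterated once per outer item.
range_for_j = list(map(str, range(10, 13)))

pairs = [(i, j) for i in range_for_i for j in range_for_j]
print(len(pairs))  # 9
```

The outer iterable is only walked once, so it can stay a map object.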

Zipping instead of Cross Product¶

Using multiple for loops like above creates a sort-of cross-product. This is by nature of the nested loop structure. But what if we’re looking for a sort-of dot-product like result? Python provides the zip builtin for this purpose. It is so specific to this problem, that using a comprehension looks like unnecessary ceremony:

>>> [(i, j) for i, j in zip(range(0, 3), range(10, 13))]
[(0, 10), (1, 11), (2, 12)]

>>> list(zip(range(0, 3), range(10, 13)))
[(0, 10), (1, 11), (2, 12)]

Of course, if we’re doing some operation with i and j instead of just creating tuples, the comprehension would still be very useful.

>>> [i * j for i, j in zip(range(0, 3), range(10, 13))]
[0, 11, 24]

Rewriting Comprehensions with `map` & `filter` Builtins¶

Comprehensions can usually be a more-readable alternative to code written using map and/or filter functions.

I’ve discussed the map builtin in more detail in a previous article. Not all features of a comprehension can be translated with just the map function. In particular, there’s no way to apply a condition like we can in comprehensions, when using the map function alone. It can be done if we also make use of the filter builtin. Here’s an example of how such a comprehension can be rewritten with map and filter.

>>> [n ** 2 for n in range(10) if n % 2 == 0]
[0, 4, 16, 36, 64]

>>> list(map(lambda n: n ** 2, filter(lambda n: n % 2 == 0, range(10))))
[0, 4, 16, 36, 64]

Obviously, the comprehension reads much better, but I’d urge you to not just throw away the map and filter builtins. They have their place and sometimes, code using them can read much better than comprehensions. Check out my article on map function for such examples and other rationales.

Reducing with Assignment Expressions¶

I’ve actually stumbled on a version of this idea on Reddit. Unfortunately I don’t have the source, so, wherever you are, thank you!

The functools module from the standard library provides the reduce callable which can be used to systematically aggregate values in collections. I won’t go into details of how this can be used, but I will show how such an effect can be reproduced with comprehensions.

Let’s look at an example of using the functools.reduce:

>>> import functools
>>> functools.reduce(lambda acc, item: acc * item, range(1, 5), 1)
24

A simple implementation of the reduce function is provided in the official documentation and it’s a better explanation than I can provide here. Instead, we’ll try and reproduce this with comprehensions.

For this, we have to first familiarize ourselves with the walrus operator. This is a new feature in Python 3.8, that lets us do assignments in expressions. This means we’ll now be able to do assignment operations in places where only expressions (and not statements) are allowed, like the result expression spot in comprehensions.

By the power of the gray walrus, we can reproduce functools.reduce:

>>> acc = 1
>>> [acc := acc * item for item in range(1, 5)]
[1, 2, 6, 24]
>>> _[-1]
24

Although that works, and is quite nice, I’m not sure how readable that is. But I can attribute my discomfort to the fact that this uses a new language feature and, like anything in life, needs some getting used to. Also, since it’s new in version 3.8, it’s probably best to stay away from it in production code for a little while.
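If the list of intermediate values is what you’re after, the standard library already has a name for it. As a side-by-side sketch (this is not from the Reddit post), itertools.accumulate yields the same running products as the walrus comprehension above, and its last item matches what functools.reduce returns:

```python
import functools
import itertools
import operator

# Running products of 1, 2, 3, 4: the same list the walrus comprehension builds.
running = list(itertools.accumulate(range(1, 5), operator.mul))
print(running)  # [1, 2, 6, 24]

# The final item is the fully reduced value.
total = functools.reduce(operator.mul, range(1, 5), 1)
print(running[-1] == total)  # True
```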

Set Operations with Comprehensions¶

Comprehensions lend themselves quite well to set operations like intersection and difference. They’ll probably be less performant (and even less obvious to readers of such code), but nonetheless, it’s a nice example to play with:

>>> rgb_colors = {"red", "green", "blue"}
>>> ryb_colors = {"red", "yellow", "blue"}

>>> intersection = {c for c in rgb_colors if c in ryb_colors}
>>> intersection
{'red', 'blue'}

>>> difference = {c for c in rgb_colors if c not in ryb_colors}
>>> difference
{'green'}

These are the same results we’d get if we used the standard set operators / methods:

>>> rgb_colors & ryb_colors
{'red', 'blue'}

>>> rgb_colors - ryb_colors
{'green'}

Again, use the standard set functionalities for this, not the comprehension based methods I illustrated above. If you do use the comprehension method of doing this in production, don’t point to me or this article as inspiration.

Generator Expressions¶

When comprehensions are wrapped in square brackets or braces, the result is a fully realized collection, like a list or a set. However, when wrapped in just parentheses, the result is a generator expression, with none of the result items realized. The result items are realized as needed, for example, when the generator is used in a for-loop.

Consider the following example session:

>>> [n ** 2 for n in range(4)]
[0, 1, 4, 9]

>>> (n ** 2 for n in range(4))
<generator object <genexpr> at 0x0000000005768DC8>

We can use this generator object in a for-loop or, perhaps more typically, in an aggregation function, like sum or max etc.

>>> squares = (n ** 2 for n in range(4))
>>> sum(squares)
14

Of course since this is a generator expression, it can be iterated over only once. If you want to iterate over it multiple times, just turn it into a list.
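Here’s a quick sketch of what “only once” means in practice. The second sum sees an already exhausted generator and quietly returns 0, whereas a list can be summed as many times as we like:

```python
squares = (n ** 2 for n in range(4))
first_total = sum(squares)
second_total = sum(squares)  # the generator is already exhausted here
print(first_total, second_total)  # 14 0

squares_list = [n ** 2 for n in range(4)]
print(sum(squares_list), sum(squares_list))  # 14 14
```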

Generator expressions were introduced in PEP-289, which contains a lot of examples. I recommend reviewing it for some cool use cases, which I won’t reproduce here.

One small note on passing generator expressions as arguments to functions: make it a practice to always wrap them in parentheses. The reason is that when a generator expression is not the only argument to the function, Python raises an error saying that the generator expression is not parenthesized. Check out the following example if that doesn’t make sense:

In the following call to sorted, we pass in a generator expression as the sole argument, and we get the expected result.

>>> sorted(word.lower() for word in "We are from planet Earth, what's up?".split())
['are', 'earth,', 'from', 'planet', 'up?', 'we', "what's"]

Now to the same call, we add the key argument hoping to sort by the string lengths. Instead, we get a SyntaxError because our generator expression is not parenthesized.

>>> sorted(word.lower() for word in "We are from planet Earth, what's up?".split(), key=len)
  File "<stdin>", line 1
SyntaxError: Generator expression must be parenthesized

So, if we add parentheses to the generator, it works fine and we get the expected result.

>>> sorted((word.lower() for word in "We are from planet Earth, what's up?".split()), key=len)
['we', 'are', 'up?', 'from', 'planet', 'earth,', "what's"]

The `key` Argument for `sorted`¶

The sorted builtin provides the key argument that can be set to a function. This function is applied to each item in the given list and the list items are sorted according to the sorting order of the results of these function calls. This is a very convenient feature of sorted.

While this is probably a horrible thing to do, we could use comprehensions to recreate this effect without using the key argument. The idea is that we first create a sequence of 2-tuples, where the first items are the results of the key function and the second items are the original list items. We then sort this sequence of tuples, extract the second items in each tuple and return that. Here’s an example implementation doing just that:

def sad_sorted_with_key(items, key_fn):
    return [item for _, item in sorted((key_fn(item), item) for item in items)]


print(sad_sorted_with_key(
    (word.lower() for word in "We are from planet Earth, what's up?".split()),
    len,
))

This script would produce the following output:

['we', 'are', 'up?', 'from', 'earth,', 'planet', "what's"]

As usual, don’t do this in production. This is just a sad experiment.
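One difference worth noticing: the output above has 'earth,' before 'planet', whereas sorted with key=len earlier kept them in their original order. That’s because the tuples tie on the length and then fall back to comparing the words themselves. A sketch of one way to restore the original tie-breaking is to also put each item’s position into the tuple (the function name here is made up for illustration):

```python
def stable_sorted_with_key(items, key_fn):
    # Decorate each item with its key and its original position, so that
    # ties on the key are broken by position rather than by the item itself.
    decorated = sorted(
        (key_fn(item), index, item) for index, item in enumerate(items)
    )
    return [item for _, _, item in decorated]


words = [word.lower() for word in "We are from planet Earth, what's up?".split()]
print(stable_sorted_with_key(words, len) == sorted(words, key=len))  # True
```

Since the positions are unique, the items themselves never get compared, so this also works for items that don’t support ordering. Still a sad experiment, of course.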

No Side Effects Please¶

As best practice, please strive to have no side effects in your comprehension result expressions. Check out the following example to see what I mean:

>>> [print(n ** 2) for n in range(4)]
0
1
4
9
[None, None, None, None]

While this serves the purpose of printing the squares one per line, it also builds a list of Nones. It’s also counter-intuitive when we treat comprehensions as applying a transformation over each item in a collection. Calling print is not a transformation, it’s a side effect.

For use cases like this, it’s best to use a traditional for-loop:

>>> for n in range(4):
...     print(n ** 2)
0
1
4
9

The intent here is clearer, which is to print each square, not to make a list of some results.

Looking Inside¶

As another likely-pointless exercise, let’s look at these comprehensions as Python bytecode, and compare it with the same solution written using a traditional for-loop.

First, let’s define two functions that solve the same problem, but one uses comprehensions, and the other doesn’t.

def loop_squares():
    result = []
    for n in range(4):
        result.append(n ** 2)
    return result


def comp_squares():
    return [n ** 2 for n in range(4)]

Let’s make sure they produce the same output:

>>> loop_squares()
[0, 1, 4, 9]
>>> comp_squares()
[0, 1, 4, 9]

Now let’s get the dis module and disassemble both of these functions:

>>> import dis
>>> dis.dis(loop_squares)
  2           0 BUILD_LIST               0
              2 STORE_FAST               0 (result)

  3           4 SETUP_LOOP              30 (to 36)
              6 LOAD_GLOBAL              0 (range)
              8 LOAD_CONST               1 (4)
             10 CALL_FUNCTION            1
             12 GET_ITER
        >>   14 FOR_ITER                18 (to 34)
             16 STORE_FAST               1 (n)

  4          18 LOAD_FAST                0 (result)
             20 LOAD_METHOD              1 (append)
             22 LOAD_FAST                1 (n)
             24 LOAD_CONST               2 (2)
             26 BINARY_POWER
             28 CALL_METHOD              1
             30 POP_TOP
             32 JUMP_ABSOLUTE           14
        >>   34 POP_BLOCK

  5     >>   36 LOAD_FAST                0 (result)
             38 RETURN_VALUE

>>> dis.dis(comp_squares)
  2           0 LOAD_CONST               1 (<code object <listcomp> at 0x7f3958a76c00, file "<stdin>", line 2>)
              2 LOAD_CONST               2 ('comp_squares.<locals>.<listcomp>')
              4 MAKE_FUNCTION            0
              6 LOAD_GLOBAL              0 (range)
              8 LOAD_CONST               3 (4)
             10 CALL_FUNCTION            1
             12 GET_ITER
             14 CALL_FUNCTION            1
             16 RETURN_VALUE

Disassembly of <code object <listcomp> at 0x7f3958a76c00, file "<stdin>", line 2>:
  2           0 BUILD_LIST               0
              2 LOAD_FAST                0 (.0)
        >>    4 FOR_ITER                12 (to 18)
              6 STORE_FAST               1 (n)
              8 LOAD_FAST                1 (n)
             10 LOAD_CONST               0 (2)
             12 BINARY_POWER
             14 LIST_APPEND              2
             16 JUMP_ABSOLUTE            4
        >>   18 RETURN_VALUE

I won’t discuss each instruction in the above outputs; check out the official documentation of the dis module for that. But just skimming over the above, we can see one striking difference. The comprehension function seems to have created a code object, which is doing the work of the comprehension and passing (returning) the result to our comp_squares function. That sounds like the comp_squares function is pushing an extra frame onto the call stack. We can confirm this by changing the functions to the following:

import traceback

def loop_squares():
    traceback.print_stack()
    result = []
    for n in range(4):
        result.append(n ** 2)
    return result


def comp_squares():
    return [[traceback.print_stack() if n == 0 else None, n ** 2][1] for n in range(4)]

Let’s see the stack they print and make sure they still produce the same result:

>>> loop_squares()
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in loop_squares
[0, 1, 4, 9]
>>> comp_squares()
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in comp_squares
  File "<stdin>", line 2, in <listcomp>
[0, 1, 4, 9]

The stack shows the file as "<stdin>" because I defined the functions within a REPL session. If they were in an actual file, we’d obviously get the file name there.

As we suspected, the comprehension function adds another frame to the call stack (at least on the Python version used here), the <listcomp>, which is doing the work of the comprehension.

Live Code Converter¶

Here’s a little tool that converts your code written in the form of a list/set/dict comprehension, into one that is written using traditional for-loops.

It’s powered by an extremely light parser (doesn’t even qualify to be called that), but it can help illustrate the point. It can also be helpful for visualizing nested loops and comprehensions with multiple for statements.

Here’s some examples to try this with:

Comprehension Code
`[n ** 2 for n in range(4)]`
`[n ** 2 for n in range(4) if n % 2 == 0]`
`{n ** 2 for n in range(4) if n % 2 == 0}`
`[r"abc def" for n in range(4)]`
`[(1, 2) for n in range(4)]`
`[n * m for n in range(4) for m in range(3) if n % 2 == 0]`
`{n * m for n in range(4) for m in range(3) if n % 2 == 0}`
`{n: n ** 2 for n in range(4) if n % 2 == 0}`
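The same conversion can be sketched in Python itself with the standard ast module. To be clear, this is a rough approximation and not the code behind the converter described above: the function name comprehension_to_loop and the result variable name are made up, async comprehensions are ignored, and it relies on ast.unparse, which needs Python 3.9 or newer.

```python
import ast


def comprehension_to_loop(source, name="result"):
    """Rewrite a list/set/dict comprehension as equivalent for-loop code."""
    node = ast.parse(source.strip(), mode="eval").body
    if isinstance(node, ast.ListComp):
        init = "[]"
        collect = f"{name}.append({ast.unparse(node.elt)})"
    elif isinstance(node, ast.SetComp):
        init = "set()"
        collect = f"{name}.add({ast.unparse(node.elt)})"
    elif isinstance(node, ast.DictComp):
        init = "{}"
        collect = f"{name}[{ast.unparse(node.key)}] = {ast.unparse(node.value)}"
    else:
        raise ValueError("expected a list, set or dict comprehension")

    lines = [f"{name} = {init}"]
    depth = 0
    for generator in node.generators:
        # Each `for` clause becomes one more level of loop nesting.
        target = ast.unparse(generator.target)
        iterable = ast.unparse(generator.iter)
        lines.append("    " * depth + f"for {target} in {iterable}:")
        depth += 1
        # Each `if` clause attached to it becomes a nested if-statement.
        for condition in generator.ifs:
            lines.append("    " * depth + f"if {ast.unparse(condition)}:")
            depth += 1
    lines.append("    " * depth + collect)
    return "\n".join(lines)


print(comprehension_to_loop("[n * m for n in range(4) for m in range(3) if n % 2 == 0]"))
```

One caveat with the generated code: the loop variables leak into the surrounding scope, which doesn’t happen with the comprehension.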

Conclusion¶

Comprehensions are a powerful feature in Python that can create very readable code when used correctly. However, like everything else, they have a place and time and it’s not everywhere and all-the-time. It’s important to understand them well if you’re doing more than the trivial list comprehension.

Do check out the official documentation on List Comprehensions, which contains a lot of good examples and ideas I didn’t discuss here.

Additionally, at the expense of repeating the same thing, there’s some experiments on this page that are only intended for learning. Please do not use them in production code. Have pity on your future self.

Automating the Vim workplace — Chapter Ⅱ

2020-02-16T00:00:00+05:30

This is a follow-up to the Automate the Vim workplace article I published last month. As promised, here’s more on how I identified and addressed things in Vim that could be improved to speed me up. Feel free to grab the ideas in this article or, better yet, take inspiration and inspect your workflow to identify such opportunities.

This article is part of a series:

Chapter Ⅰ.
Chapter Ⅱ (this article).
Chapter Ⅲ.

Table of Contents

Easier Alternative to :
Repeat Key Mappings
Ruler vs Status Line
Opening & Switching Buffers
Change CWD Smartly
Jumping over Paragraphs
Vertical Line Selection
Zoom When Presenting
Copy Lines as CSV
Conclusion

Please note that all that I share below is what I’m using with Vim (more specifically, GVim on Windows). I don’t use Neovim (yet) and I can’t speak for any of the below for Neovim.

Easier Alternative to `:`¶

Entering command-line mode to type Ex commands is something I do very often, yet it requires hitting the Shift and ; keys together. All this while there’s a giant blank key right under my thumbs that has no unique & practical purpose in normal mode: the Space key.

noremap <Space> :

This is likely my oldest mapping that survives even today. It’s also the one I miss the most when working with Vim on servers.

Another popular alternative for this mapping is the ; key. However, unlike the Space key, this one has a useful default functionality, which will be lost. (Look up :h ; to find out, I won’t repeat it here).

Note that we use noremap here, not nnoremap. So this works when in visual mode as well.

Repeat Key Mappings¶

There are some mappings like dd, cc etc. that are made of a single key hit twice in a row. While they appear convenient, hitting them usually takes slightly longer than hitting two different keys in quick succession.

So, for all these types of bindings (and then some), I have a predictable alternative:

" Maps that repeat a key can instead use the `.` key.
nnoremap d. dd
nnoremap y. yy
nnoremap c. cc
nnoremap g. gg
nnoremap v. V

These bindings are a lot more convenient once our fingers get used to them and we get used to the mnemonic of the . here.

Ruler vs Status Line¶

This is another topic that gets a lot of attention when one is setting up their Vim working environment. What with all the fancy status-line plugins in the wild, it is easy to get carried away.

My recommendation (nothing unique, has been said by better people before), is that you look at your working style first. How often do you make it a point to look at the status line while working? Now compare this to the fact that the status line costs you one line of vertical space. Measure for yourself if it’s worth it.

If your question is, but what’s the alternative? Where do I see stuff like the current line number, column number, file type, the git branch, wi-fi status of the coffee shop across the street etc. etc.? My answer is the same again, firstly, see what you need, identify what you’ll miss and narrow down to a minimal list of the stuff you need. Whatever you don’t need is most likely just a want and will end up being a distraction when you’re in deep thought (the worst kind of distraction). Secondly, we have the following other options.

One alternative is to use the ruler option. This is similar to the status line, although not quite as flexible. But don’t let that discourage you, for minimal information to be shown in the corner of your Vim, it’s plenty powerful. By default, it just shows the current cursor position, but can be configured to show anything with the rulerformat option. I won’t go into detail on how to configure them (maybe in the future / others have done it better than I could).

First, turn on ruler.

set ruler

Next, I set rulerformat as a variable since it’s slightly easier this way when dealing with escape characters.

let &rulerformat = '%50(b%n %{&ff} %{&ft}' .
            \ '%( %{len(getqflist()) ? ("q" . len(getqflist())) : ""}%)' .
            \ '%( %{search("\\s$", "cnw", 0, 200) ? "∙$" : ""}%)' .
            \ '%( %{exists("b:stl_fn") ? call(b:stl_fn, []) : ""}%)' .
            \ '%= L%l,%c%V %P %*%)'

Each line in the above snippet is a little piece of information that I need to know at a glance. Here’s a run down:

Buffer number, 'fileformat' (indicates line endings), 'filetype'.
A count of items in the quickfix list.
An indicator for trailing whitespace in the current buffer.
A buffer specific function that may be called for additional input to be shown. I hardly use this currently.
Cursor position information.

The second alternative is the titlestring. This defines what shows up in the title bar of the window-manager’s window (not Vim window).

Using this is quite similar to using the ruler. Just turn it on and set a value to be shown. This is what I use currently:

set title
let &titlestring = '%t%( %m%r%)%( <%{get(g:, "cur_project", "")}>%)' .
            \ '%( (%{expand("%:~:.:h")})%)' .
            \ '%( (%{getcwd()})%)%( %a%) - %(%{v:servername}%)'

This contains the buffer’s name, indicators for modified and read-only, the value of the global variable cur_project (if set), the path of the current buffer’s directory relative to the current directory, the current working directory itself, and finally, the servername.

Note that I use titlestring with GVim. If you want it to work when working with terminal Vim as well, you might need to consult your terminal emulator’s (or multiplexer’s) documentation regarding this.

Opening & Switching Buffers¶

This is a problem that is usually solved with one of the fuzzy finder plugins. The current most popular one appears to be a plugin based on fzf. I have used Command-T, ctrlp, LeaderF and even one that I made for myself. But then something happened on my system that broke the fuzzy-finder that I was using at the time (don’t exactly remember which). Pressed for time, I chose to use the commands that come with Vim, and haven’t bothered to investigate what broke the fuzzy finder. The following has been enough to keep me happy and productive:

" Simple mappings for buffer switching.
nnoremap <Leader>d :b *
nnoremap <Leader>l :ls<CR>

" Find/edit files
nnoremap <Leader>f :find *
nnoremap <Leader>e :edit **/*

It may not seem as powerful when you put it beside the shiny screen recordings of the fuzzy finder plugins, but it just works ™ and works perfectly fine. I took inspiration from this excellent article on the topic by romainl. Thank you!

Change CWD Smartly¶

This is another very old mapping that still survives. It comes in two flavors, I use cm and cu for these. Briefly,

cm – cd to current buffer’s directory.
cu – cd to the current project’s root directory.

The first one is fairly simple to implement:

" Mapping to change pwd to the directory of the current buffer.
nnoremap cm :call chdir(expand('%:p:h')) \| pwd<CR>

For the second one, it is important to understand how a project’s root is identified. To me, it’s a directory containing the .git folder. That’s not a perfect answer, but it hasn’t failed me a lot so far. Nevertheless, my mapping below supports looking for a few other such project markers, like .hg for mercurial VCS, .project for Eclipse projects, manage.py for Django projects etc.

There’s a few plugins that do this as well, probably better than this, but I like to do these kinds of simple things myself, to have control and to have it tuned to my habits.

" Map to change pwd to the repo-root-directory of the current buffer.
nnoremap cu :call <SID>CdToRepoRoot()<CR>
let g:markers = split('.git .hg .svn .project .idea manage.py pom.xml')
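fun s:CdToRepoRoot() abort
    let start = expand('%:p:h') . ';'
    for marker in g:markers
        " finddir() only matches directories, so fall back to findfile() for
        " file markers like manage.py and pom.xml.
        let root = finddir(marker, start)
        if empty(root)
            let root = findfile(marker, start)
        endif
        if !empty(root)
            let root = fnamemodify(root, ':h')
            call chdir(root)
            echo 'cd ' . root . ' (found ' . marker . ')'
            return
        endif
    endfor
    echoerr 'No repo root found.'
endfun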

What’s happening here is that for each marker in g:markers, we navigate up from the current buffer’s directory until we find a folder that has the marker. If found, we chdir to it. Otherwise, we repeat the process for the next marker. If no marker was found, we just show an error message. Simple & effective.
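For readers more at home in Python than in Vim script, the same search can be sketched with pathlib. This is only an illustration of the algorithm described above, not a replacement for the mapping; the function name find_repo_root is made up, and the marker list is copied from the Vim snippet.

```python
from pathlib import Path

# Same marker names as in the Vim snippet.
MARKERS = ".git .hg .svn .project .idea manage.py pom.xml".split()


def find_repo_root(start, markers=MARKERS):
    """Return (root, marker) for the first marker found at or above `start`, else None."""
    start = Path(start).resolve()
    for marker in markers:
        # Walk from the starting directory up to the filesystem root.
        for directory in (start, *start.parents):
            if (directory / marker).exists():
                return directory, marker
    return None


print(find_repo_root("."))
```

Note that the order of the markers matters more than how close they are: an earlier marker found far up the tree wins over a later marker found right next to the file.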

Jumping over Paragraphs¶

This is one of the things I wanted for a long time, but couldn’t figure out a robust solution. It’s only last year (IIRC) that I finally nailed it and this version works exactly how I want it.

The idea is that the keys <C-j> and <C-k> will jump over paragraphs, and place the cursor at the start of the first line in the paragraph. I needed the following to be true:

After hitting either key, the cursor is positioned on the first line of a paragraph, never on a blank line.
When in the middle of a paragraph, <C-k> moves the cursor to the first line of the current paragraph.
Moves are not added to the jumplist.
Cursor is placed on the first non-blank character of the paragraph. Like ^, not 0.
They should work just fine in both normal & visual modes and the visual mode type should not change when hitting the keys.

Here’s how I’m doing this:

noremap <silent> <expr> <C-k> (line('.') - search('^\n.\+$', 'Wenb')) . 'kzv^'
noremap <silent> <expr> <C-j> (search('^\n.', 'Wen') - line('.')) . 'jzv^'

I needed to use the <expr> way of mapping keys here so as to satisfy the third and fifth of my requirements list above.

The default mappings that come closest to this are the { and }. But they don’t satisfy my first and third requirements, and I’m very picky. I actually still use them, when they seem appropriate, but I hit the above custom mappings a lot more often.

Vertical Line Selection¶

This is one of my recent favorites (< 2 years old). The use case: usually, when I go into visual block mode with <C-v>, I extend the selection upwards to the first line of the paragraph and also downwards to the last line of the paragraph.

The following GIF might make this easier to understand:

This seems simple enough to do manually when there’s just a few lines to deal with. But when there’s >15 lines and you notice yourself doing this a dozen times a day, you need a better way.

The following mapping is my solution to this. When I hit vm, the following happens:

Visual block selection is activated.
Selection extends as a single column downwards until we hit a line that’s shorter than the cursor column position or we hit end of buffer.
Selection extends in a similar fashion upwards.

The way this is implemented is that firstly we compute the number of lines to be travelled upwards and downwards from the current position. Then we construct a normal mode command which will start the visual-block mode and move the cursor so that the vertical line is selected. For example, in the GIF above, our function would construct the normal mode command \<C-v>2jo1k. This works quite well and doesn’t affect the jumplist.

nnoremap <expr> vm <SID>VisualVLine()
fun! s:VisualVLine() abort
    let [_, lnum, col; _] = getcurpos()
    let line = getline('.')
    let col += strdisplaywidth(line) - strwidth(line)

    let [from, to] = [lnum, lnum]
    while strdisplaywidth(getline(from - 1)) >= col
        let from -= 1
    endwhile

    while strdisplaywidth(getline(to + 1)) >= col
        let to += 1
    endwhile

    return "\<C-v>" .
                \ (to == lnum ? '' : (to - lnum . 'jo')) .
                \ (from == lnum ? '' : (lnum - from . 'k'))
endfun
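The logic of computing the two line counts and building the command can also be sketched outside Vim. The following Python function is only an approximation for illustration: the name visual_vline_command is made up, it takes the cursor column as a 1-based display column, and it uses len as a stand-in for strdisplaywidth (so no tabs or wide characters).

```python
def visual_vline_command(lines, lnum, col):
    """Build the normal mode command for a vertical line selection.

    lines: the buffer's lines; lnum: 1-based cursor line; col: 1-based cursor column.
    """
    if col < 1:
        raise ValueError("col must be 1 or more")

    def width(n):
        # Lines outside the buffer count as empty, like getline() does in Vim.
        return len(lines[n - 1]) if 1 <= n <= len(lines) else 0

    first = last = lnum
    # Extend upwards while the line above is long enough to reach the column.
    while width(first - 1) >= col:
        first -= 1
    # Extend downwards in the same fashion.
    while width(last + 1) >= col:
        last += 1

    command = "<C-v>"
    if last != lnum:
        command += f"{last - lnum}jo"
    if first != lnum:
        command += f"{lnum - first}k"
    return command


buffer_lines = ["", "alpha", "beta!", "gamma", "", "x"]
print(visual_vline_command(buffer_lines, 3, 3))  # <C-v>1jo1k
```

The o in the command switches to the other end of the selection, which is why the downward move can come first and the upward move second.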

Zoom When Presenting¶

Occasionally (read: more often than I like to admit), I end up having to present some code to a small audience which is slightly larger than my immediate team. Additionally, I also note down the proceedings of meetings in Vim and present them on screen sharing to get inputs and corrections, essentially steering the meeting.

On such occasions, I need to increase the font size so it’s visible to everyone in the audience / meeting. When presenting, I’ve heard complaints from people sitting a bit far back, and when sharing my screen, I’ve heard complaints from people connecting from their mobile devices (!).

The following two mappings are born out of this need.

" Increase / Decrease font size.
let g:font_size_pat = s:iswin ? ':h\zs\d\+' : '\d\+'
nnoremap <silent> z+ :<C-u>let &guifont = substitute(
            \ &guifont, g:font_size_pat,
            \ '\=eval(submatch(0) + ' . v:count1 . ')', '')
            \ \|simalt ~x<CR>
nnoremap <silent> z- :<C-u>let &guifont = substitute(
            \ &guifont, g:font_size_pat,
            \ '\=eval(submatch(0) - ' . v:count1 . ')', '')
            \ \|simalt ~x<CR>
nmap z<kPlus> z+
nmap z<kMinus> z-

This snippet defines two mappings in normal mode, z+ and z-, that work with the keypad as well (which is what the last two lines are for).

This works by calling substitute on the guifont option with a pattern tailored for how the font size is specified on the current platform. The replacement for this pattern contains a sub-replace-expression that spits out the new font size number.

However, there was a quirk. Once the font size is changed, the Vim window is restored (not maximized anymore). This was annoying to me since I almost always keep my Vim maximized (especially when presenting). So, the simalt ~x at the end of each mapping maximizes the window again.

Another small additional feature in these mappings is that they accept a count. For example, hitting z+ will increase the font size by 1 point, hitting 3z+ will increase it by 3 points.

Copy Lines as CSV¶

I write my notes, both for work and study, in Vim, as plain text, loosely Markdown (I’ll write about that in a future article). Among these notes, there are occasionally lists of domain specific stuff for the applications or projects I’m working with. I usually need these as reference for objects that I often look up in databases. For example, I have a note like the following:

| Object  | Database ID |
| ------- | ----------- |
| Mercury | 4           |
| Venus   | 32          |
| Earth   | 42          |
| Moon    | 44          |

From this I want to do a visual-block selection of all the ID numbers and paste it into an SQL SELECT query that looks something like:

SELECT * FROM celestial_objects WHERE id IN (4, 32, 42, 44);

Essentially, what I needed was to copy the visually selected lines as a comma separated string. It might seem an overkill solution for the example I’m demonstrating here, but when there’s ID numbers in the millions and Markdown tables with over a dozen rows as reference in my notes, it quickly adds up to being extremely annoying.

So, I came up with the following:

" Copy selected lines as CSV
xnoremap <silent> <Leader>y :<C-u>call <SID>CopyLinesAsCSV()<CR>
fun s:CopyLinesAsCSV() abort
    let [_, l1, c1, _] = getpos("'<")
    let [_, l2, c2, _] = getpos("'>")
    let lines = map(getline(l1, l2), {i, l -> trim(l[c1-1:c2-1])})
    call setreg(v:register, join(lines, ', '), 'l')
endfun

This defines a mapping in visual mode, <Leader>y (which can take a register, just like the default y) that takes the selected lines (or selected block), joins them with ', ' and puts that in the register.

Here’s a preview of this in action:

Combined with the vm mapping explained in a previous section, this makes it really quick to take a column of values as a comma separated string.

Conclusion¶

This is a continuous process of identifying and honing the habits at work. Considering how programmable Vim can be when it comes to editing text, it’s both fun and productive to introspect. Although I won’t discourage you from it, I recommend not to just blindly copy everything here into your own vimrc. Take only if you need, take only what you need, and do take everything you need.

I plan to write the next chapter in this series next month, so stay tuned and remember to check back.

Identify, optimize, repeat.

Read the previous article, or the next article in this series.

Python's `itertools.groupby` callable

2020-02-09T00:00:00+05:30

The groupby utility from the itertools module can be used to group contiguous items in a sequence based on some property of the items.

Python has several utilities for working with lists and other sequence data types. In addition to a lot of such utilities being directly available as builtins (like map, filter, zip etc), the itertools module is dedicated to this purpose. In this article, I’ll show the groupby callable from this standard library module. I hope to write more in the future on the other awesome stuff from this module.

Table of Contents

Basic Usage
Non-contiguous Groups
Groups are Iterables
A Really Bad DIY Implementation
Usage Tips
Conclusion

Basic Usage¶

The point of itertools.groupby can be illustrated quite easily by applying it to a list of zeroes and ones, to be grouped by their values. Check out the following example:

import itertools

numbers = [1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0]

for grouping_value, group_items in itertools.groupby(numbers):
    print('By', grouping_value, '->', *group_items)

This will produce the following output:

By 1 -> 1 1 1
By 0 -> 0 0
By 1 -> 1
By 0 -> 0 0 0
By 1 -> 1
By 0 -> 0

Now let’s look at this, little by little. The groupby call takes one or, probably more often, two arguments:

iterable: An iterable (like a list or any other collection). Items in this collection will be grouped.
key (defaults to None): A function that is applied to each element from iterable, the return values of which are used to do the grouping.
returns: An iterator that yields tuples of (grouping_value, iterable_of_group_elements) for each group that was found.

In the example above, we give the numbers list to the groupby call which yields six groups (as can be seen from the six lines of output). Since we haven’t provided a value for the key argument, the grouping occurs on the elements themselves.

So now the output should make sense. The first group, where the grouping_value is 1 will contain three elements, the first three 1s in our list. The next group, where the grouping_value is 0 will contain the next two 0s in our list. This goes on until the list passed to groupby is exhausted.

It is important to note here that inside the tuples yielded by groupby, what we have are iterables that yield the group’s items. They are not lists. More specifically, the tuple contains an object of type itertools._grouper, which is just an iterable over the values in the group. This point is elaborated in a section further below.
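As a small sketch of that point, the groups from the example above can be turned into real lists by wrapping each one in list() while it is still the current group. The groups name here is only for illustration:

```python
import itertools

numbers = [1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0]

# Materialise each group right away, before groupby moves on to the next one.
groups = [(value, list(items)) for value, items in itertools.groupby(numbers)]

print(groups)
```

Each tuple now holds a plain list, which can be indexed, measured with len and iterated over more than once.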

Non-contiguous Groups¶

This often comes up as a surprise to people new to itertools.groupby (it certainly did for me). The groups created are of contiguous regions only. For example, if we are trying to group even and odd numbers from an ordered collection of numbers, just a call to groupby can produce surprising results:

import itertools

for is_even, number_group in itertools.groupby(range(10), key=lambda x: x % 2 == 0):
    print('Evens:' if is_even else 'Odds:', *number_group)

This produces the following (probably unexpected) result:

Evens: 0
Odds: 1
Evens: 2
Odds: 3
Evens: 4
Odds: 5
Evens: 6
Odds: 7
Evens: 8
Odds: 9

What we would’ve liked is something like the following:

Evens: 0 2 4 6 8
Odds: 1 3 5 7 9

If we search the ever helpful internet for a solution to this “problem”, the answer seems to be to sort the initial list with the same key function and then pass the result to groupby. This is how that would work:

import itertools

def is_even(n):
    return n % 2 == 0


for is_even_val, number_group in itertools.groupby(sorted(range(10), key=is_even), key=is_even):
    print('Evens:' if is_even_val else 'Odds:', *number_group)

This produces an output much closer to what we wanted:

Odds: 1 3 5 7 9
Evens: 0 2 4 6 8

Now, ignoring the evil of premature optimization, the fact that we are calling the key function twice per item (once while sorting and once while grouping) might cause terminally serious itches to some developers. One (possibly silly) way around this is to store the results of the key function right next to the values, as a tuple, and then unpack the values once we’re done grouping. This would look like:

import itertools

def is_even(n):
    return n % 2 == 0


numbers = range(10)
keyed_numbers = [(is_even(n), n) for n in numbers]
sorted_numbers = sorted(keyed_numbers)

for is_even_val, pair_group in itertools.groupby(sorted_numbers, key=lambda pair: pair[0]):
    print('Evens:' if is_even_val else 'Odds:', *(pair[1] for pair in pair_group))

This produces the same output as the previous example, but calls the key function (is_even in this example’s case) only once per item in our list.

Before you attempt the above apparent solution to performance issues, prove to yourself that firstly, you have a performance issue and that this piece of code is at least part of the reason for it. Otherwise you’re probably just wasting your time.

Since this is arguably more useful, let’s create an alternative groupby that will sort first and then call itertools.groupby:

import itertools

def sorted_groupby(iterable, key=None):
    yield from itertools.groupby(sorted(iterable, key=key), key=key)

We can use this function like:

for is_even_val, number_group in sorted_groupby(range(10), key=lambda x: x % 2 == 0):
    print('Evens:' if is_even_val else 'Odds:', *number_group)

This will produce the same output as before:

Odds: 1 3 5 7 9
Evens: 0 2 4 6 8

Groups are Iterables¶

I have mentioned this earlier in this article, but it’s important enough to stress again. The group collections yielded by the groupby call are not lists. They are iterables that are rendered unusable upon yielding the next group. If you need the values, make sure you collect them before going to the next group.

For example, consider the following snippet:

import itertools
from pprint import pprint

names = ['Arthur', 'Trillian', 'ford', 'zaphod', 'slartibartfast']

by_casing = dict(itertools.groupby(names, key=str.istitle))
pprint(by_casing)
pprint(list(by_casing[True]))
pprint(list(by_casing[False]))

This produces the following output:

{False: <itertools._grouper object at 0x0000000002B6D278>,
 True: <itertools._grouper object at 0x0000000002B6BF28>}
[]
[]

The seemingly strange thing to notice here, is that although groupby returned two groupings, their grouped values are empty (hinted by the two empty lists output). But of course, groupby wouldn’t return a group unless there’s at least one item in the corresponding collection. So, what’s going on?

This is the point I was getting at in the first paragraph of this section. The grouping collections (the values in the dictionary above) are de facto destroyed once we yield another group. So, if we wanted to construct a dictionary like this, we need to do something like the following:

import itertools
from collections import defaultdict
from pprint import pprint

names = ['Arthur', 'Trillian', 'ford', 'zaphod', 'slartibartfast']

by_casing = defaultdict(list)

for is_title, group_names in itertools.groupby(names, key=str.istitle):
    by_casing[is_title].extend(group_names)

pprint(dict(by_casing))
pprint(by_casing[True])
pprint(by_casing[False])

This would produce the following output:

{False: ['ford', 'zaphod', 'slartibartfast'], True: ['Arthur', 'Trillian']}
['Arthur', 'Trillian']
['ford', 'zaphod', 'slartibartfast']

Just something to keep in mind.

The above snippet of code uses collections.defaultdict. I haven’t written about this yet, but I intend to, in the near future (most likely within the 21st century).

A Really Bad DIY Implementation¶

Let’s try and create an implementation of our own version of groupby, called insane_grouper. It should have the following characteristics:

Take an iterable, and optionally a key function, interpreted like in itertools.groupby.
Group non-contiguous items as a single collection.
Return a dictionary of each group’s key value as the keys and the group’s list of items as the values.
- This is great since it goes well with our point 2 above. For computing non-contiguous groups, it is not possible to compute the groups lazily (why? is an exercise for the reader). So, might as well return a dictionary with all the groups.

This might look something like the following:

import itertools
from collections import defaultdict
from pprint import pprint

def insane_grouper(iterable, key=None):
    groups = defaultdict(list)

    for item in iterable:
        groups[item if key is None else key(item)].append(item)

    return dict(groups)


names = ['Arthur', 'ford', 'zaphod', 'Trillian', 'slartibartfast']
pprint(insane_grouper(names, str.istitle))

pprint(insane_grouper(range(10), lambda x: x % 2 == 0))

The output of this snippet is the following:

{False: ['ford', 'zaphod', 'slartibartfast'], True: ['Arthur', 'Trillian']}
{False: [1, 3, 5, 7, 9], True: [0, 2, 4, 6, 8]}

Usage Tips¶

Here’s a few tips and cases where this can be used to quickly compute distinct collections of objects:

A list of dictionaries can be grouped by the value against a particular key present in all (or some?) of the dictionaries in the list.
The key function can return a tuple. This can be useful where we need to group the items by multiple criteria, instead of just one.
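Both tips can be sketched together. The records below are made-up sample data, and the field names (species, has_towel) are only assumptions for illustration. operator.itemgetter builds the key function, and it returns a tuple when given more than one field. Note that itemgetter raises KeyError if a dictionary lacks the field, so this sketch assumes the key is present in all of them:

```python
import itertools
from operator import itemgetter

# Hypothetical sample records, invented for this example.
records = [
    {'name': 'Arthur', 'species': 'human', 'has_towel': True},
    {'name': 'Trillian', 'species': 'human', 'has_towel': True},
    {'name': 'Ford', 'species': 'betelgeusian', 'has_towel': True},
    {'name': 'Zaphod', 'species': 'betelgeusian', 'has_towel': False},
]


def names_by(key):
    """Sort by key first, since groupby only groups contiguous items."""
    ordered = sorted(records, key=key)
    return {
        value: [record['name'] for record in group]
        for value, group in itertools.groupby(ordered, key=key)
    }


# Tip 1: group dictionaries by the value of a single field.
by_species = names_by(itemgetter('species'))

# Tip 2: group by several criteria at once; the key function returns a tuple.
by_species_and_towel = names_by(itemgetter('species', 'has_towel'))

print(by_species)
print(by_species_and_towel)
```

The names inside each group are collected into a list immediately, for the reason explained in the Groups are Iterables section.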

Conclusion¶

While the default behaviour of itertools.groupby may not always be what one expects, it is still useful. The important point to note is to understand the problem you’re solving, consider the tools at your disposal and choose the right tool for the job. On that note, I’ll leave you with another link to the itertools module.

The `tar` Command Clipboard

2020-02-02T00:00:00+05:30

Recently, while doing an experiment with my blog’s rendered output with a VPS instance, I needed to transfer it to the server over SSH. While doing that, I experimented with archiving the folder a bit, so I’m putting the outcome of that experience here, should I need it again in the future.

All notes below assume GNU tar v1.26. More specifically, the output of tar --version | head -1 gives:

tar (GNU tar) 1.26

I’m only listing the arguments and use-cases that I think are most frequently used (at least by me) and the ones I’m most likely to need in the future. Please complement this with a healthy serving of man tar to keep your sanity.

Check out this neat little tool to help generate often-used tar commands: cligen.sharats.me. Thanks!

Table of Contents

Creating Archives
Inspecting Archives
- Single vs Multiple Top Levels
Extracting Archives
- Extracting to Different Directory
Transferring Archives / Directories
- Local to Remote
- Remote to Local
Conclusion

Creating Archives¶

The -c (or --create) command is used to create archives.

The - in front of the c can be omitted, but I find that ugly and prefer to include it. That way it’s consistent with most other such GNU commands.

Additional options after -c:

v – Enable verbose output. Adding this will print each file as it is being added to the archive.
z or j – Specify the compression format, if needed. Use z for a gz archive or j for a bz2 archive. Use a (or --auto-compress) instead to have tar infer the compression format from the file name, which works only if f (explained in the next point) is also given. Other compression formats, like --xz and --lzip, can also be used.
f – Use the next argument as the file name of the archive. If this argument is not provided, the archive content is written to the standard out.
--remove-files – Remove files after adding them to the archive. Be careful with this.

To illustrate the examples, I’ll clone one of my public repositories and play around with creating archives of it.

$ git clone git@github.com:sharat87/just-a-calendar.git
$ du -sh just-a-calendar
248K    just-a-calendar/

Create a `.tar.bz2` Archive¶

To create a bz2 archive of a folder:

$ tar -cjf package.tar.bz2 just-a-calendar
$ file package.tar.bz2
package.tar.bz2: bzip2 compressed data, block size = 900k
$ du -sh package.tar.bz2
76K     package.tar.bz2

Since we are specifying the file name here, which includes the .bz2 part at the end, we can tell tar to just figure out the compression we want to use. Instead of the j argument specifying the compression, we pass the a argument to indicate this.

$ tar -caf package.tar.bz2 just-a-calendar
$ file package.tar.bz2
package.tar.bz2: bzip2 compressed data, block size = 900k
$ du -sh package.tar.bz2
76K     package.tar.bz2

Exclude `.git` Directory¶

Now, the archive also contains the .git directory that was present in our clone. We probably don’t want that. The tar command provides the --exclude* family of arguments to deal with this. For example, as in our case, to ignore the folder .git, we could do:

$ tar -caf package.tar.bz2 --exclude=.git just-a-calendar
$ du -sh package.tar.bz2
12K     package.tar.bz2

This package doesn’t contain the .git folder (and consequently is much smaller). However, for this particular problem, there’s perhaps an even better solution, the --exclude-vcs argument. This argument will ignore any VCS directories automatically and it knows about .git. So our command becomes:

$ tar -caf package.tar.bz2 --exclude-vcs just-a-calendar

Another similarly useful argument is --exclude-backups, which will exclude backup and lock files. This, too, is usually what we want.

Set Initial Directory¶

The -C (or --directory) argument sets the initial working directory before creating the archive. This will influence the paths with which the files are saved inside the archive. This is normally only useful if for some reason you can’t cd or pushd to that directory yourself, which is not very often.

Inspecting Archives¶

The -t (or --list) can be used to list the contents of an archive without extracting it.

Additional options after -t:

v – Verbose listing. The effect of adding this option is like adding -l to the ls command. That is, it will show details like each file’s permissions, size and last modified time.
f – Treat the next argument as the archive file name. This argument is almost always needed with the -t command (unless the archive is being piped in to the tar -t command).

Let’s run this on our package archive created in the previous section.

$ tar -tf package.tar.bz2 | wc -l
6

Single vs Multiple Top Levels¶

There’s one thing about extracting archives that’s extremely annoying. If the archive contains multiple files at the top level, extracting it will pollute the current directory with several objects. To combat this, if we make it a habit to create a new folder and extract inside it, it might turn out that the archive itself contains a top level directory, so now we end up with one useless directory in the tree.

This situation is actually handled very well by the aunpack command from the atool script. This command takes an archive (of any of several different formats) and extracts it. If it contains a single top level entry, it is extracted to your working directory. If it contains several top level entries, a new directory is created and the extraction happens inside that new directory. This command is extremely convenient, for this and several other reasons.

To find out if an archive has a single top-level entry or multiple, the following snippet can be used:

tar -tf package.tar.bz2 | cut -d/ -f1 | sort -u

This will print out one top-level entry per line. If there’s only one line in the output, then there’s only one top-level. How this works is that first, the cut command splits the listing with / character, the file separator and only prints the first entry, which will be the top level entry. Then, the sort command will sort the top-levels and only print the unique entries (that’s what the -u is for). We could further pipe this to wc -l and check if it results in 1.

Extracting Archives¶

The -x (or --extract) command is used to extract the contents of archives.

This command takes the following arguments:

v – Verbose logging. Prints each file path as it is being extracted.
z or j – Specify the compression format, if needed. Similar in working as with the -c command.
f – Reads the next argument as the archive file name. This is almost always used with this command to specify the archive to extract. If this is not provided, the archive content is expected to be available from standard input.
k (or --keep-old-files) – Fail if any existing files will be overwritten by extracting. This is useful if you don’t want any of your existing files to be overwritten.

So, to extract our archive (in a separate location, of course):

$ mkdir spike && cd spike
$ tar -xaf ../package.tar.bz2

Extracting to Different Directory¶

The extract command also supports the -C (or --directory) argument that sets the initial working directory before extracting. This can be used to change the location where the extracted files/folder will be saved.

Transferring Archives / Directories¶

In this section, I’ll show a couple of quick examples where we need to transfer a folder tree between current local system and a remote system reachable by SSH.

Local to Remote¶

We could create a tar file of the folder (and any other files as well), transfer the file to the remote system, login to the remote system and unpack it there.

There’s a couple of problems with this approach:

Since we are creating an archive of the folder on our local disk, we need to have the necessary free space for that archive. This may be less than the size of the folder, but can still be significant if the folder is large. The same problem will also appear on the remote system.
We need write permissions on the local disk. If we want to just take a folder to a remote system, we should only need write permission on the remote disk, not on the local disk.

To avoid the above two problems, we can transfer the archive directly as a stream, without saving it to the local disk. Notice that if we don’t provide a filename for the create (-c) command, the archive will be written to standard out. Similarly, if we don’t provide a filename for the extract (-x) command, it will read the archive from standard input. Our solution below will leverage these two facts.

tar -cj just-a-calendar | ssh remote tar -xj

The first command (tar -cj just-a-calendar) creates a bzip2-compressed archive (we could’ve used z here to use gz compression instead) and writes it to the standard out. This becomes the standard input for the ssh command, which connects to the remote host, invokes the tar -xj command, and forwards its own standard input to that tar -xj command. The tar -xj command extracts the archive from its standard input, using bzip2 for decompressing, and writes the extracted contents to the remote user’s home directory.

For added measure, we could use the -C (or --directory) argument to tar -xj to set the directory where the extracted files would be saved.

This method is extremely handy since the archive is not written to the disk anywhere, not on local, not on remote. It’s only processed as a stream of bytes.

The -j argument to the tar commands is not strictly necessary. The whole thing will work even without it. But since the archive is being transferred over network, it pays to spend a little processor time into compressing it so as to minimize network usage (and consequently, speed up the operation).

We could’ve added the -v argument to one (or both!?) tar commands to show the files as they are being archived/extracted.

Remote to Local¶

This follows a similar method as in the previous section, but in the other way around. We run the archiver tar command on the remote host, and the extractor tar command on the local machine.

ssh remote tar -cj just-a-calendar | tar -xj

This will recreate the just-a-calendar folder from the remote host on the local disk. We could use the -C argument with either tar command to set its initial working directory.

Of course, if we wanted to just save the archive on the local disk, not extract it, we could just redirect the stream to a file.

ssh remote tar -cj just-a-calendar > package.tar.bz2

Conclusion¶

The tar command, in all its variations, is irreplaceable in its utility for these kinds of purposes. The handiest resource for getting help while working with it is, of course, the man page. But when we’re in the mood to just copy-pasta (yes, pasta) a command to serve the purpose, I hope this article will be helpful.

Working with Strings in Python

2020-01-26T00:00:00+05:30

This article will be a practical rundown of working with strings in Python, made up of things I constantly forget and have to look up on how to do. I hope it will serve as a super-quick reference for me as well as for anybody else who stumbles here.

This document is not intended for beginners to Python. Although you can still get something out of it, it’s best suited for intermediate Python programmers. I tried to illustrate the concepts in a crisp manner with minimum carry-over context from one section to the next.

Table of Contents

Defining Strings
Auto-concatenated Strings
Raw Strings
Concatenation
Splitting
- The .splitlines Method
Substring Check
- Prefix and Suffix Check
- Regular Expressions Check
Learning About the Contents
- Numeric Checks
Transformations
String Formatting
Docstrings
Conclusion

Defining Strings¶

Single and Double Quoted Strings¶

We’ll refer to strings delimited by the ' character as single quoted strings and those delimited by " as double quoted strings.

They are identical in all respects, except that single quote needs to be escaped in single quoted strings and double quote needs to be escaped in double quoted strings.

They cannot span multiple lines. A string’s ending quote character must appear in the same line as it begins. This can be worked around by using a \ character at the end of the line. For example:

text = 'abc\
def'
print(text)

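This will print the following on a single line, since the backslash and the newline right after it are both left out of the string:

abcdef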

But it’s best to avoid using \ to break strings into multiple lines. It’s not pretty and there are better ways to do it, especially auto-concatenated strings (discussed below).

Tripled Quoted Strings¶

Tripled quoted strings are a syntax for defining multi-line strings. There’s no practical difference between defining strings with ''' and """.

In practice, this syntax is commonly used for one of the following:

Docstrings (discussed below), for writing documentation for classes/functions.
Module level constant strings that contain long multi-line content. Can be used for small HTML templates that are stored inline or complex SQL queries, long regular expression patterns etc.
An approximation for multi-line comments. Python doesn’t have multi-line comments (like /* and */ in C-like languages). Wrapping whole code blocks with tripled quotes can turn it into a pseudo-comment. I personally discourage this, but it’s nonetheless used in real-world code.

The string created when using tripled quoted strings will contain everything between the tripled quotes. This includes any indentation present due to Python block-style formatting. For example:

def make_story():
    text = '''
    Once upon a time, there was a planet.
    Suddenly, it named itself Earth.
    And it hoped to live happily ever after.
    '''

    return text


print(repr(make_story()))

This will produce the following output:

'\n    Once upon a time, there was a planet.\n    Suddenly, it named itself Earth.\n    And it hoped to live happily ever after.\n    '

There’s three things to note in the string defined in this function:

It starts with a newline character, the one that comes right after the opening ''' on line 2.
- This particular point can be easily addressed by adding a \ right after the opening '''.
Each line, except for the first, starts with four spaces, because of the indentation of the make_story function.
- The textwrap.dedent function from standard library can help deal with this. Details in the next paragraph.
It ends with a newline character and the four spaces from the line 6.
- Calling .strip (or .rstrip) on the string removes these.

Considering the above three points, we rewrite the previous code fragment as:

import textwrap

def make_story():
    text = textwrap.dedent('''\
    Once upon a time, there was a planet.
    Suddenly, it named itself Earth.
    And it hoped to live happily ever after.
    '''.rstrip())

    return text


print(repr(make_story()))

Note that it is important to use .rstrip here, and not .strip. The reason is that .strip will remove the whitespace before Once... line and so the first line in the string won’t have any indentation. Now the documentation of textwrap.dedent says:

Remove any common leading whitespace from every line in text.

But since our first line doesn’t have the indentation anymore, there’s no common leading whitespace in text. So, this function won’t remove the indentation. Another option would be to do dedent first, and then call .strip on the result of dedent.

The output of this program would be:

'Once upon a time, there was a planet.\nSuddenly, it named itself Earth.\nAnd it hoped to live happily ever after.'
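The other option mentioned above, calling dedent first and .strip afterwards, could be sketched as follows. Since .strip runs last, the \ after the opening ''' is no longer needed; the leading newline gets stripped along with the trailing whitespace.

```python
import textwrap


def make_story():
    # dedent sees every line still indented, so the common margin is removed.
    # strip then drops the leading newline and the trailing newline.
    text = textwrap.dedent('''
    Once upon a time, there was a planet.
    Suddenly, it named itself Earth.
    And it hoped to live happily ever after.
    ''').strip()

    return text


print(repr(make_story()))
```

This should give the same string as the .rstrip version.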

Escape Characters¶

Backslash based escape characters behave exactly the same way in strings defined with any quote type.

Following is a list of commonly used escape characters. This list is not exhaustive.

| Escape sequence | Result |
| --------------- | ------ |
| `'\'` (at end of line) | String definition is continued to next line |
| `'\n'` | Newline character |
| `'\\'` | Literal backslash character |
| `'\''` | Single quote character, useful in single quoted strings, but works everywhere |
| `"\""` | Double quote character, useful in double quoted strings, but works everywhere |
| `'\xhh'` | Character by hex value given by the `hh` part |

Regarding escaping quote characters:

Single quotes don’t have to be escaped in double quote strings, but it’s not an error to do so.
Double quotes don’t have to be escaped in single quote strings, but it’s not an error to do so.
Neither kind of quote has to be escaped in tripled quote strings, but it’s not an error to do so.
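A few of these rules can be checked quickly with assertions. This is just a sketch; the sample strings are arbitrary:

```python
# Escaping a quote is optional when it differs from the delimiter.
assert 'It\'s' == "It's" == "It\'s"
assert "say \"hi\"" == 'say "hi"' == 'say \"hi\"'

# Neither kind of quote needs escaping inside tripled quotes.
assert '''It's "fine"''' == 'It\'s "fine"'

# '\x41' is the character with hex value 41, which is 'A'.
assert '\x41' == 'A'

# An escaped backslash and an escaped newline are one character each.
assert len('\\') == 1
assert len('a\nb') == 3

print('all escape checks passed')
```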

In tripled quote strings, an unescaped run of three delimiter quotes always ends the string. For example, a ''' sequence cannot appear as-is inside a string defined with ''', although it may appear freely in a string defined with " or """. To include it anyway, escape at least one of the quotes with a backslash, as in \'''.

Auto-concatenated Strings¶

Python has a nice compiler level feature to auto-concatenate literal strings that are next to each other (or more correctly, forming a single expression). Take a look at an example to illustrate the point:

query = (
    'SELECT * FROM employees'
    '  WHERE name = ?'
)

print(query)

The string query is defined as two parts, each on lines 2 and 3. These two strings will be concatenated automatically at compile-time. The output of the above program would be:

SELECT * FROM employees  WHERE name = ?

Things to note regarding this behaviour:

The strings don’t have any operator between them, like + or , or something else.
This works only with string literals, it won’t work when applied to variables.
This is a compile-time feature, and so is more performant than string concatenation using the + operator.
The multiple string literals should be part of the same expression. So, if we are writing them on multiple lines, they have to be wrapped in parentheses or we should use the \ character to tell Python to treat multiple lines as a single expression.
Works with ordinary strings, raw strings, format strings and any combination of them.

Thanks to this feature, there’s almost never a reason to define long string constants by concatenating several strings.
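As a sketch of mixing literal kinds, the following builds a regular expression from raw, format and ordinary string literals sitting next to each other. The separator variable and the pattern itself are made up for illustration:

```python
import re

separator = '-'

pattern = (
    r'(\d{4})'       # raw string: the backslash is kept as-is
    f'{separator}'   # format string: the placeholder is filled in at run time
    r'(\d{2})'       # raw string again
    '$'              # ordinary string
)

print(pattern)
print(re.match(pattern, '2020-01').groups())
```

There are no + operators or commas between the four literals; they only need to sit inside the same parenthesised expression.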

Raw Strings¶

Python’s raw strings’ syntax is a small variation that disables the escaping behaviour of the \ character. A string is treated as a raw string if the starting delimiter quote is prefixed with an r (or R) character.

The following expressions create equal (as defined by == operator) strings:

| Unadorned string | Raw string |
| ---------------- | ---------- |
| `'abc'` | `r'abc'` |
| `'abc\ndef'` | not possible |
| `'abc\\ndef'` | `r'abc\ndef'` |

In other words, the special escaping behaviour of \ character cannot be used in raw strings. This is useful when you have a lot of \\ in your unadorned string. Such a string’s definition can be much simpler if using raw strings.

Points to note regarding raw strings:

Can be used with single, double or tripled quotes.
The actual string object created is no different from the one when using unadorned string syntax. It is just a syntax-level convenience.
Delimiter quotes cannot be included in raw strings without a backslash before them, and that backslash stays in the string. In other words, a lone single quote cannot be a part of raw single quote strings. For example, r'abc\'def' gives the string "abc\\'def". That is, the string will contain one backslash and one single quote; essentially, it will be exactly as it looks in the definition.
Cannot be defined to end with a single \. The expression r'abc\' will raise a SyntaxError. The expression r'abc\\' will end with two backslash characters.

The limitations above can be worked around by using raw and ordinary strings together.
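A minimal sketch of that workaround, relying on the auto-concatenation of adjacent literals (the path and the pattern here are only illustrative):

```python
# A raw literal cannot end in a single backslash, but an adjacent ordinary
# literal can supply it.
folder = r'C:\temp\new' '\\'
print(folder)

# Same idea for a delimiter quote: the raw parts keep their backslashes
# as-is and the ordinary part in the middle contributes the quote.
pattern = r'\d+' "'" r'\w+'
print(pattern)
```

Both names end up as plain str objects, the same as if they had been written with doubled backslashes throughout.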

Most commonly useful scenarios for raw strings:

Regular expression patterns, to be used with the re module.
Windows style file paths, where the separator is the backslash character. Note that the open function works fine even with forward slashes on Windows, so this is generally not needed.
SQL queries, especially when defined with tripled quotes as module level constants.

Concatenation¶

The + operator can be used to concatenate two strings. This will create a new string object which is the result of the concatenation (str objects are immutable in Python).

If there are several strings being concatenated, using the + operator may not be the best way to do this. For example, consider the following snippet of code:

text = ''

for i in range(4):
    text += 'we have %r\n' % i

print(text)

When run, it produces the following output:

we have 0
we have 1
we have 2
we have 3

However, using the + operator here means that intermediate string objects are created at every concatenation operation. This is needless memory allocation since these intermediate string objects are never used, and are ready for garbage collection rather quickly. For situations like this, there are better options than concatenating strings using the + operator.

One option is to use a list and then pass it to ''.join method to concatenate them all in one go. Using this option in the above code snippet, we get:

fragments = []

for i in range(4):
    fragments.append('we have %r\n' % i)

text = ''.join(fragments)
print(text)

Additionally, in this case, we could’ve used '\n'.join instead and avoided the trailing newline in text (if that’s what is desired; don’t do it just because we can).

lines = []

for i in range(4):
    lines.append('we have %r' % i)

text = '\n'.join(lines)
print(text)

Another option is to use io.StringIO which is a file-like, in-memory, string buffer that you can .write string content to and then turn it into a single string object when done. Rewriting the above code snippet to use this option:

import io

buffer = io.StringIO()
for i in range(4):
    buffer.write('we have %r\n' % i)
text = buffer.getvalue()
print(text)

Both solutions are better than concatenating strings with + operator, but if you’re just concatenating two or three strings, it’s probably simpler to just use + and move on. Premature optimisation is the root of all evil.
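When the fragments come out of a simple loop like the one above, a generator expression passed straight to join is a common shorthand. A minimal sketch, producing the same text as the list-based version:

```python
# Same result as appending to a list first, without the explicit list.
text = ''.join('we have %r\n' % i for i in range(4))
print(text, end='')
```

This is mostly a readability choice; for loops with more going on, the explicit list or io.StringIO versions tend to stay clearer.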

Splitting¶

Python strings have the .split method that can be used to split strings into a list of tokens or parts. There are three things to understand about this method:

First, it takes a separator argument, which can be a string of any length other than zero.

print('a,b,c,d'.split(','))
print('a,b;c,d'.split(';'))
print('a b c d'.split(' '))
print('a,,b,,,'.split(','))

This will produce the following output:

['a', 'b', 'c', 'd']
['a,b', 'c,d']
['a', 'b', 'c', 'd']
['a', '', 'b', '', '', '']

Note that adjoining separators will produce empty strings in the returned list.
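The examples above all use one-character separators. As a short sketch with made-up inputs, longer separators are matched as a whole, and an empty separator is rejected:

```python
# A separator longer than one character is matched as a whole.
comma_parts = 'a, b, c'.split(', ')
arrow_parts = '1 -> 2 -> 3'.split(' -> ')
print(comma_parts)  # ['a', 'b', 'c']
print(arrow_parts)  # ['1', '2', '3']

# An empty separator is not allowed and raises ValueError.
try:
    'abc'.split('')
    empty_allowed = True
except ValueError:
    empty_allowed = False
print(empty_allowed)  # False
```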

Second, not passing a value for the separator (or passing None) will split the string over whitespace. Note that this is not the same as splitting with the space character (' '). Consider the following examples:

Expression	Result
`'a b c'.split()`	`['a', 'b', 'c']`
`'a  b   c'.split()`	`['a', 'b', 'c']`
`'a\tb\nc'.split()`	`['a', 'b', 'c']`
`'a b c '.split(' ')`	`['a', 'b', 'c', '']`
`'a b c '.strip().split()`	`['a', 'b', 'c']`

If you’re familiar with regular expressions, then this splitting over whitespace is similar to splitting over non-overlapping matches of the pattern \s+.
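To see where "similar" stops short of "identical", here is a small comparison with re.split (the sample strings are arbitrary). The two agree in the middle of a string and differ at its edges:

```python
import re

text = 'a \t b\n\nc'
# With no leading or trailing whitespace, both give the same list.
same = text.split() == re.split(r'\s+', text)
print(same)  # True

# With whitespace at the edges, re.split keeps empty strings there,
# while str.split drops them.
padded = '  a b  '
plain_result = padded.split()
regex_result = re.split(r'\s+', padded)
print(plain_result)  # ['a', 'b']
print(regex_result)  # ['', 'a', 'b', '']
```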

Third, there is a second argument, which is the maximum number of times the string will be cut with the given separator (or whitespace). Thus, if we give 1 in the second argument, the resulting list will contain at most two elements. Of course, not providing any second argument will mean the string will be split at all occurrences of the separator.

Expression	Result
`'a,b,c,d'.split(',', 2)`	`['a', 'b', 'c,d']`
`'a,b,c,d'.split(',', 10)`	`['a', 'b', 'c', 'd']`
`'hello'.split(',', 10)`	`['hello']`
`'a b c'.split(maxsplit=1)`	`['a', 'b c']`

The `.splitlines` Method¶

The .splitlines method splits the string into a list of lines. This method is a better version of just doing .split('\n') since it handles many of the nasty end-of-line differences. For example, if your string contains '\r\n' at the end of each line, then doing a .split('\n') will leave dangling '\r' characters at the end of each line. This is handled well by the .splitlines method. The official documentation has a list of separators this method splits by, which I won’t repeat here.

Expression	Result
`'a\nb\rc\r\nd'.splitlines()`	`['a', 'b', 'c', 'd']`
`'a b\rc\r\nd'.splitlines()`	`['a b', 'c', 'd']`
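A side-by-side sketch of the difference, on a made-up string with Windows-style line endings:

```python
text = 'first\r\nsecond\r\nthird\r\n'

by_newline = text.split('\n')
by_lines = text.splitlines()
print(by_newline)  # ['first\r', 'second\r', 'third\r', '']
print(by_lines)    # ['first', 'second', 'third']

# keepends=True retains the line endings, whatever they were.
with_ends = text.splitlines(keepends=True)
print(with_ends)   # ['first\r\n', 'second\r\n', 'third\r\n']
```

Note that .split('\n') also leaves an empty string at the end when the text finishes with a newline, which .splitlines does not.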

Substring Check¶

To check if a string is wholly contained in another string, the in operator should be used. Note that this operator is case-sensitive. If case-insensitivity is needed, the easiest option is to just call .casefold (which is especially designed for this purpose) on both the strings.

needle = 'back'
haystack = 'Going back and forth all the time.'
print(needle in haystack)

This would print True, since the string 'back' occurs in haystack. Pay attention to the intent here, though. Consider the following example:

needle = 'back'
haystack = 'Forwards is easier than backwards.'
print(needle in haystack)

This would again print True, but the intent seems to be to look for the word “back”. In that case, we’d expect False here and True in the previous example (since back is not a separate word in the second example). Here again, a simple solution is to call .split on the haystack string before the in operator check. The idea is that we’d get a list of words out of haystack and we check if needle occurs in the list.

needle = 'back'
haystack = 'Forwards is easier than backwards.'
print(needle in haystack.split())

This prints out False. This isn’t anywhere near a foolproof word searching system, but does get you a step ahead.
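Going one small step further, the sketch below combines .casefold with .split and also trims punctuation off each word. The contains_word helper is a hypothetical name made up for this example, and it is still far from a real word search:

```python
import string


def contains_word(needle, haystack):
    """Case-insensitive check for needle as a whole word in haystack."""
    words = [
        word.strip(string.punctuation)
        for word in haystack.casefold().split()
    ]
    return needle.casefold() in words


print(contains_word('back', 'Going back and forth all the time.'))  # True
print(contains_word('back', 'Forwards is easier than backwards.'))  # False
print(contains_word('Time', 'Going back and forth all the time.'))  # True
```

The last call shows why the punctuation is stripped: a plain .split would leave 'time.' in the list, and 'time' would not be found.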

Prefix and Suffix Check¶

We have the .startswith and .endswith methods on strings if we want to check if a string is not just in another string, but more specifically, if it starts/ends with it.

>>> 'the' in 'Hello there'
True
>>> 'Hello there'.startswith('he')
False
>>> 'Hello there'.endswith('ere')
True
>>> 'Hello there'.lower().startswith('he')
True

Additionally, there’s a useful twist to these two methods. Instead of a single string as argument, they can accept a tuple of strings, in which case they check if the original string starts/ends with any of the strings in the tuple. Check out the following examples:

>>> 'Hello there'.startswith(('He', 'he'))
True
>>> 'hello there'.startswith(('garbage from outer space', 'He', 'he'))
True

A less obvious fact here is that the original string may be shorter than the string being passed to .startswith/.endswith. This sounds like a no-brainer, but there’s one scenario where it’s particularly nice.

Consider a situation where we want to check if the first character of a string is, say, 'A'. One option to do this is haystack[0] == 'A'. But this runs the risk that if the haystack = '', then haystack[0] will raise an IndexError, where we just wanted False. If we did haystack.startswith('A'), we’d get False if haystack is empty.
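The scenario above as a runnable sketch (starts_with_a is just an illustrative name):

```python
def starts_with_a(haystack):
    """Safe on empty strings, unlike haystack[0] == 'A'."""
    return haystack.startswith('A')


print(starts_with_a('Apple'))  # True
print(starts_with_a(''))       # False

# Indexing an empty string raises instead of answering False.
empty = ''
try:
    empty[0] == 'A'
    index_failed = False
except IndexError:
    index_failed = True
print(index_failed)  # True
```

Slicing, as in haystack[:1] == 'A', is another way to get False on an empty string, though .startswith states the intent more directly.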

Regular Expressions Check¶

Regular expressions are a much larger topic than can be fit under a third level header (maybe a future article). So we’ll just cover the substring checking part using regular expressions (in obviously limited scope).

All regex (regular expression) operations in Python start from the re module. There’s no special syntax for defining regex patterns like there is in JavaScript. Patterns are instead written as strings and the re module knows to interpret them as regex patterns.

For our purpose of substring checking, the re module provides the .search function that takes a regex pattern, the haystack string and optionally, any flags for the pattern.

import re
print(re.search('the', 'Hello there'))
print(re.search('he', 'Hello there'))
print(re.search('he', 'Hello there', flags=re.IGNORECASE))
print(re.search('hola', 'Hello there'))

This would produce the following output:

<re.Match object; span=(6, 9), match='the'>
<re.Match object; span=(7, 9), match='he'>
<re.Match object; span=(0, 2), match='He'>
None

A minor point to note here is that the return value is not of boolean type. We get an re.Match object if there is a successful match, else we get None. This is usually a minor concern, because the match objects are truth-y and None is false-y. So, we can pretend it returns a boolean value if we need to.

When using the re.search function this way, the re.escape function might also come in handy. This function will escape any special characters in the given string. Special here means having special behaviour in the context of being a regex pattern.

For example, if the needle is user input and we want to search our haystack such that the needle is at the end of an English sentence, we’d do something like:

re.search(needle + '[.!?:]', haystack)

But this runs the risk of needle having regex special characters like .* and that would match everything, which is probably not what we want. In this case, it’s best to wrap the needle in re.escape and then concatenate the pattern with end-of-sentence markers.

re.search(re.escape(needle) + '[.!?:]', haystack)

As always, please think twice before using regular expressions to solve a problem, and if you do, if the pattern is longer than five or six characters, please make use of re.VERBOSE and add comments to your pattern. You’ll thank yourself later.
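Here is a sketch that puts the last two suggestions together. The ends_sentence helper and the WORD_AT_END pattern are made-up names for illustration, not anything from the re module itself:

```python
import re


def ends_sentence(needle, haystack):
    """True if needle, taken literally, is followed by an end-of-sentence marker."""
    return re.search(re.escape(needle) + r'[.!?:]', haystack) is not None


print(ends_sentence('time', 'All the time.'))        # True
print(ends_sentence('.*', 'Nothing special here.'))  # False

# Without re.escape, '.*' is a wildcard and matches almost anything.
unescaped = re.search('.*' + r'[.!?:]', 'Nothing special here.') is not None
print(unescaped)  # True

# With re.VERBOSE, whitespace in the pattern is ignored and comments are allowed.
WORD_AT_END = re.compile(
    r'''
    \b back \b   # "back" as a whole word
    [.!?:]       # directly followed by an end-of-sentence marker
    ''',
    re.VERBOSE,
)
print(WORD_AT_END.search('I want it back.') is not None)         # True
print(WORD_AT_END.search('Easier than backwards.') is not None)  # False
```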

Learning About the Contents¶

Python’s strings have some nice methods to quickly check some facts about their contents. Here’s a rundown of such methods:

Method	Returns `True` if	On empty string
`isalnum`	all characters are alphanumeric	`False`
`isalpha`	all characters are alphabetic	`False`
`isascii`	all characters are within ASCII range	`True`
`isdecimal`	all characters are decimal characters	`False`
`isdigit`	all characters are digits	`False`
`isidentifier`	string can be a valid Python identifier	`False`
`islower`	has at least one cased character and they are all in lower case	`False`
`isnumeric`	all characters are numeric characters	`False`
`isprintable`	all characters are printable	`True`
`isspace`	all characters are whitespace	`False`
`istitle`	string is title-cased, i.e., all words start with an upper case character	`False`
`isupper`	has at least one cased character and they are all in upper case	`False`

Please use the links to official documentation in the above table to learn more about them. I won’t be repeating those details here.

Numeric Checks¶

You might’ve noticed that we have three different methods that all sound awfully similar to each other: isdecimal, isdigit and isnumeric. The official documentation regarding the difference between these three wasn’t very helpful for me so I’ll try to explain it here.

Firstly, isdecimal returns True for any character that can be used to build a number in the base-10 (decimal) system. That means it will give True for the 0 through 9 digits. Additionally, it will also give True for characters that can be used for a similar purpose in other languages. For example, the numbers from Unicode range 3174 to 3183 are of a south Indian language called Telugu (my mother tongue). The isdecimal method returns True for these characters as well. However, note that it is not true for Roman numerals since they can’t technically be used to construct base-10 numbers.

>>> # Arabic Numbers
>>> ''.join(chr(i) for i in range(48, 58))
'0123456789'
>>> _.isdecimal()
True
>>>
>>> # Telugu Numbers
>>> ''.join(chr(i) for i in range(3174, 3184))
'౦౧౨౩౪౫౬౭౮౯'
>>> _.isdecimal()
True

Secondly, isdigit gives True for any character that looks like a digit, of any language. So, this includes any character that is True-ed by isdecimal. Additionally, this includes characters like ¹, ², ³, etc., as well as ①, ②, ③. Notice that fraction characters are not considered as digits.

Thirdly, isnumeric gives True for any character that is numeric in nature. So, this includes any character that is True-ed by isdigit. Additionally, this will give True for fraction characters such as ¼, ½, ¾ etc., as well as Roman numbers such as Ⅰ, Ⅱ, Ⅲ, Ⅳ, even Ⅹ, Ⅼ, Ⅽ, Ⅾ, Ⅿ (these are not ordinary alphabets, they are Unicode Roman number characters) etc.

From this follows a neat fact regarding the character sets True-ed by the three methods: isdecimal ⊂ isdigit ⊂ isnumeric.
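A quick way to see the three sets nest is to run all three methods over one character of each kind. The samples dictionary below is just a hand-picked illustration, written with escapes so the snippet doesn’t depend on the file’s encoding:

```python
samples = {
    'ASCII five': '5',
    'Telugu three': '\u0c69',
    'superscript two': '\u00b2',
    'circled one': '\u2460',
    'one half': '\u00bd',
    'Roman four': '\u2163',
}

# One row per character: isdecimal, isdigit, isnumeric.
results = {
    name: (char.isdecimal(), char.isdigit(), char.isnumeric())
    for name, char in samples.items()
}

for name, flags in results.items():
    print(f'{name:16}', *flags)
```

Reading each row left to right, once a column is True, every column after it is True as well.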

Transformations¶

This section is about methods that return a new string, which is the result of some transformation applied to the original string. Since strings in Python are immutable, transformations always return a new string object. The original string is, always, obviously, left untouched.

Here’s a few commonly used transformation methods (this list is intentionally non-exhaustive):

Method	Transformation
`.strip`	Strips whitespace (or characters from the string in first argument) at the start and end of the string.
`.lstrip`	Strips whitespace (or characters from the string in first argument) only at the start of the string.
`.rstrip`	Strips whitespace (or characters from the string in first argument) only at the end of the string.
`.lower`	All cased characters are converted to lower case, unless they are already in lower case.
`.upper`	All cased characters are converted to upper case, unless they are already in upper case.
`.capitalize`	The first letter is upper-cased and the rest are lower-cased.
`.title`	The first letter in each word in the string is upper-cased, and all others are converted to lower-cased.

Please use the links to official documentation in the above table to learn more about them. I won’t be repeating those details here. The official documentation refers to more methods on strings that I suggest skimming over. I happened to reinvent the wheel with transforming strings because I didn’t know Python already provided a method for what I needed.
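A few of these methods have behaviour that is easy to misread from a one-line description, so here is a short sketch with arbitrary inputs:

```python
# The argument to .strip is a set of characters, not a prefix or suffix.
stripped = 'www.example.com'.strip('cmowz.')
print(stripped)  # example

# .title treats an apostrophe as a word boundary.
titled = "they're bill's friends".title()
print(titled)  # They'Re Bill'S Friends

# .capitalize also lower-cases everything after the first character.
capitalized = 'hello WORLD'.capitalize()
print(capitalized)  # Hello world

# The original string is left untouched.
original = '  padded  '
trimmed = original.strip()
print(repr(original), repr(trimmed))
```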

String Formatting¶

String formatting in Python comes majorly in two flavors. First is the (now old) printf-style formatting that uses typed control characters prefixed with %, similar to the printf (more like sprintf) function in C. Second is the new format builtin function and the accompanying str.format method that is more suited to Python’s dynamic typing, and arguably, is much easier to use.

Python’s formatting capabilities are quite vast and powerful, warranting a whole separate article. I intend to do that some time in the coming weeks. Until then, the official documentation on printf-style formatting and the format function should serve you well.
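Just to give a taste of the two flavors side by side, here is a tiny sketch with made-up values. It barely scratches the surface of either style:

```python
name, score = 'Ada', 91.456  # made-up values, for illustration only

# printf-style formatting with the % operator.
old_style = 'Name: %s, score: %.1f' % (name, score)

# The str.format method, and the format builtin for a single value.
new_style = 'Name: {}, score: {:.1f}'.format(name, score)
single = format(score, '.1f')

print(old_style)  # Name: Ada, score: 91.5
print(new_style)  # Name: Ada, score: 91.5
print(single)     # 91.5
```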

Docstrings¶

Docstrings are strings that serve as documentation for Python’s modules, functions and classes. There’s nothing special in the syntax of these strings per se, but their uniqueness is more due to where they are positioned in a Python program.

Consider the following function with a docstring on line 2:

def triple(n):
    """Triples the given number and returns the result."""
    return n * 3


print(triple(4))

The string defined on line 2 in this program is not assigned to any variable. On the face of it, it appears pointless to create a string and just discard it. However, in this case, the fact that this string literal is the first expression in the function definition, makes it a docstring. What that means is that the contents of this string are understood to be a human readable help text regarding the usage of this function.

It also doesn’t have to be a string defined with """. It may be using single quotes, double quotes or any other crazy variation we saw above. But, don’t do that. It’s usually a best practice to write docstrings with """, and I strongly suggest (and even beg) that you stick to using """ for docstrings. Please.

It’s also not entirely true that this string is not assigned to a variable. Docstrings are saved to the .__doc__ attribute of the function (or whatever object) they are documenting. In our example above, we can get the docstring from triple.__doc__. But it’s usually more practical to call the help function to read the docstring.

For classes, the docstring should be the first expression inside the class body, positioned similarly to that of a function. For modules, the docstring should be the first expression in the module (even before any imports).
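A small sketch of the same idea for a class and its method (Counter is a made-up class for this example):

```python
class Counter:
    """Counts how many times it has been incremented."""

    def __init__(self):
        self.value = 0

    def increment(self):
        """Adds one to the counter and returns the new value."""
        self.value += 1
        return self.value


# The docstrings are available on the class and on the method.
print(Counter.__doc__)
print(Counter.increment.__doc__)

# A function or method without a docstring has __doc__ set to None.
print(Counter.__init__.__doc__)
```

Calling help(Counter) in an interactive session shows both docstrings together, along with the method signatures.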

A minor note regarding the formatting of docstring content: consider writing it in ReST (also called reStructuredText). It is not strictly required, but I suggest you do so. In the event that you choose to generate HTML help pages from your docstrings, you’ll be glad you wrote them in ReST.

Conclusion¶

It’s hard to imagine a Python program that doesn’t have something to do with strings. As such, we have been provided with a lot of utilities within the standard distribution for working with strings. Even in an article of this size, I couldn’t be exhaustive. As always, Python’s official documentation is unreal good. It pays to occasionally open a random page and skim over it.

Automating the Vim workplace

2020-01-12T00:00:00+05:30

I majorly use two tools for my coding workflow and one of them is GVim (on Windows). It has been my primary choice for editing text for ten years now and in that time, I’ve picked up several habits and tricks that made me very productive.

This article is part of a series:

Chapter Ⅰ (this article).
Chapter Ⅱ.
Chapter Ⅲ.

Table of Contents

Motivation
Switching to Normal Mode
Start GVim Maximized, in Windows
Save All Buffers
Copy to System Clipboard
Ensure Directory Exists, Before Saving
Switching to Alternate Buffer
Run Git Commands in :terminal
Non-undo-able Insert Mode Commands
Quickly Open ftplugin
Sorting over Motion
Reversing over Motion
Conclusion

Motivation¶

Most of my text editing involves working with Python, Markdown, and JavaScript source files. When I’m spending as much time as I am with Vim, it ceases to be just a tool in my mind. It becomes a state of mind where I’m able to translate my thoughts into actions much faster than it/I can do with something else (besides being an excuse to be fancy with words). It becomes my workplace.

Just like organizing one’s desk or toolbox for maximum efficiency, we can mold Vim to help us achieve something similar with it. I try to notice things that I do often, that take more than 3-4 seconds of thought and then a few more seconds of hitting hotkeys or commands. These are the ones for which I try to create a command or a mapping. In my world, this is borderline automation.

What I’m sharing here is stuff I created/scavenged through years of identifying patterns very specific to my work style. My goal is not to share nice tidbits of Vim configuration. It is to encourage you to identify your work style and work towards optimising it, before you go find a plugin and learn the plugin’s work style. As such, I don’t expect you to resonate with the tips I shared here. Your own style of working deserves the first chance, let Vim learn it.

Please note that all that I share below is what I’m using with Vim (more specifically, GVim on Windows). I don’t use Neovim (yet) and I can’t speak for any of the below for Neovim.

Switching to Normal Mode¶

Probably the action that is done most often is switching to normal & insert modes. Switching to insert mode is usually with several different keys (i, a, o etc.), but for switching to normal mode, we usually use one single key. My preference for this is <C-l>, since l is on the home row and the help pages already sort-of indicate that hitting it would go to the normal mode (if 'insertmode' is set, but well, it’s unused otherwise, See :h i_CTRL-l).

inoremap <C-l> <Esc>

This is a topic that often brings up an uncontrollable urge to be vocal about one’s own choice of keys to go to normal mode. I’ve used several of them over the years, jj, <CapsLock> as <ESC>, <C-[>, <C-c>, mapping <C-k>, xcape in the background, etc. All of them felt haphazard, and <C-l> worked the best for me. As I said, this article is about what worked best for my workflow. Go discover your own.

Of course, now we need a quick way to open our vimrc file so we can add this mapping and then get back to whatever we are doing. Well,

nnoremap cv :e $MYVIMRC<CR>

The cv is a mnemonic for change vimrc.

This mapping was originally defined as :e $USERPROFILE/vimfiles/vimrc<CR>. Thanks to the helpful community at r/vim and a comment here, I realized $MYVIMRC is a better fit here. Thank you folks!

This is what I’m talking about when I say identify things that you often do. Even if you don’t sit down to automate it right away, put it on a sticky near your desk. Spend a few minutes thinking about it. A few seconds in a time of intense focus is far more dear than a few minutes in slacking.

Note that this mapping is not without its quirks. It interferes with the line completion mapping, <C-x><C-l>. It’ll still work, but right after triggering <C-x><C-l>, if you hit <C-l>, you won’t go to normal mode. You’ll merely go to the next selected item in the completion popup. Other than this, <C-l> for going to normal mode works quite well.

Now that the mapping is set up, I can hit <C-l> in insert mode to go to normal mode. Then I noticed something else in the way I tried to use it, subconsciously. I started hitting <C-l> in visual mode, operator pending mode etc. to go into normal mode. I realized I was using <C-l> essentially as a replacement of <ESC>. But of course it failed because I only created a mapping for insert mode.

After a few iterations and shower thoughts, this is what I currently use:

" Easier way to go to normal mode. Also, alternative to <ESC>.
noremap! <silent> <C-l> <ESC>
vnoremap <silent> <C-l> <ESC>
onoremap <silent> <C-l> <ESC>

I also wanted this from the command line, but I’m still trying to get it to work. I currently have the following but it’s not very robust. Every time I hit <C-l> in the normal mode, the cursor moves ahead by two characters. Still working on getting it to work well.

" <ESC> doesn't work and even this moves the cursor by two characters.
cnoremap <silent> <C-l> <C-c>

It’s a never ending process of learning and experimenting.

Start GVim Maximized, in Windows¶

As another example, I wanted GVim to start maximized when I open it. One way to do this was to check the Maximized checkbox in the GVim shortcut’s properties. But that won’t work when I start GVim from a command line. The solution that worked even better:

" Maximize gVim window.
let s:iswin = has('win32') || has('win64')
if exists(':simalt') > 0 && s:iswin
  autocmd GUIEnter * simalt ~x
endif

Save All Buffers¶

I often use the :wa command to save all my open buffers. But it has the nasty habit of throwing an error when it’s not able to save all buffers. This is annoying because I often have scratch buffers in vertical splits where I dump random pieces of copied text and thoughts. So, I prepared the following hotkey that will execute the :wa command and, if that error comes up, shows a message instead.

nnoremap <silent> <C-m> :try\|wa\|catch /\<E141\>/\|echomsg 'Not all files saved!'\|endtry<CR>

This doesn’t look like an ideal solution, but it hasn’t failed me yet. The idea is not to create a perfect solution, but just one that works well for you.

If you’re using the above mapping, note that mapping to <C-m> is almost the same as mapping to the <Return> key on your keyboard. So hitting the return key in normal mode will also trigger the above mapping. Just something to keep in mind.

Copy to System Clipboard¶

I often have to copy stuff to system clipboard to paste into chat channels and emails. The standard way to do this would be something like "+yap in normal mode, or "+y in visual mode. This is annoying, not because it’s three keys, but more because they are hard to type in order and they are (almost) all with the same hand. So I solved it with the following keys:

xnoremap <C-c> "+y
nnoremap <silent> cp "+y
nnoremap <silent> cpp "+yy

With this, <C-c> in visual mode copies selection to clipboard and cp can be used with text objects. Much easier to hit.

Ensure Directory Exists, Before Saving¶

I often edit new files like :e css/styles.css, without realizing that I have to create the css folder before saving this. But that’s not productive, my tool should do that automatically.

" Create file's directory before saving, if it doesn't exist.
" Original: https://stackoverflow.com/a/4294176/151048
augroup BWCCreateDir
  autocmd!
  autocmd BufWritePre * :call s:MkNonExDir(expand('<afile>'), +expand('<abuf>'))
augroup END
fun! s:MkNonExDir(file, buf)
  if empty(getbufvar(a:buf, '&buftype')) && a:file !~# '\v^\w+\:\/'
    call mkdir(fnamemodify(a:file, ':h'), 'p')
  endif
endfun

Let’s see what’s going on here. Firstly, we define an autocmd for the BufWritePre event, which is fired just before a file is saved, to call the function s:MkNonExDir. In this function, we check for the buffer being a normal buffer (see :h buftype) and if it is, create its parent directory.

Simple, non-intrusive, and effective.

Switching to Alternate Buffer¶

The default key-binding for <C-^> (or <C-6>) lets us quickly switch back-and-forth between two buffers. This is extremely handy and is likely one of my most used features for switching buffers within Vim.

There’s some annoying quirks to this mapping though. For example, if there’s files in your buffer list, but no alternate buffer, we’ll get an error saying “No alternate buffer”. Which is not helpful. So a few years ago I saw a mapping to go to the next buffer if there’s no alternate buffer. It worked to an extent, but there’s more.

When I delete a buffer with :bd, I get taken to a different buffer. Now if I hit <C-6> again, the buffer I just deleted is loaded again and I’m back in it. This may be what one usually wants, but for me, I want to be taken to the next buffer that’s still loaded, not deleted ones.

" My remapping of <C-^>. If there is no alternate file, and there's no count given, then switch
" to next file. We use `bufloaded` to check for alternate buffer presence. This will ignore
" deleted buffers, as intended. To get default behaviour, use `bufexists` in it's place.
nnoremap <silent> <C-n> :<C-u>exe v:count ? v:count . 'b' : 'b' . (bufloaded(0) ? '#' : 'n')<CR>

This is the mapping I use for switching between alternate buffers. I use <C-n> as it’s easier to hit and there’s a simpler key for its default functionality anyway (j).

Additionally if you’re using the eunuch plugin, this mapping will not navigate to a buffer that’s been Delete-ed.

Run Git Commands in `:terminal`¶

Running git commands is another thing I often do, while working in Vim. Most of the time, it’s just a status or diff, so I needed something quicker than switching to a terminal and running the command.

I initially used fugitive, but it felt slow on Windows (very likely because of the required anti-virus). It works fine when I’m on Linux, but on Windows, it’s not productive for me. Besides, it does a lot of things I don’t usually need. The following is the mapping that serves most of what I need from within Vim.

nnoremap <Leader>g :ter git --no-pager<Space>

So, what does this do? Well, I hit ,g (because , is my mapleader) and the cursor is placed in the command line with the following pre-filled:

:ter git --no-pager

Then I just hit st<Enter>, which will open a new terminal within Vim which runs git st command asynchronously (which is an alias to git status).

After seeing the output I noticed that I immediately issued another ,gdiff<Enter>, which opens up another terminal split to run the git diff command. Such multiple splits quickly got annoying again. Yeah, I’m easily annoyed. I need this mapping to not open a new split if I’m already in a git output terminal window. Here’s what I’m using currently:

nnoremap <Leader>g :ter <C-r>=&buftype == 'terminal'
            \ && job_info(term_getjob('%')).cmd[0] ==? 'git' ? '++curwin ' : ''
            \ <CR>git --no-pager<Space>

We check if the current buffer is a terminal and if the command is git, if yes, we tell :ter to open the terminal in the current window instead of opening up a new split.

Non-undo-able Insert Mode Commands¶

In insert mode, <C-u> deletes everything from start of current line to cursor position (this is not exactly true, read :h i_CTRL-U for the exact behaviour, I won’t repeat it here). This is quite convenient and I use it a lot more than I like to admit. Often, when I start a statement in a new line, I have second thoughts in the middle of the line and I quickly hit <C-u> and start typing in the idea from my second thought. But then of course, I realize that what I was doing originally was the right way. Now if I try to undo what’s done by <C-u>, I can’t. Since it’s all treated as one insert operation, it’s all one undo step.

This is why I got this:

" CTRL-U in insert mode deletes a lot. Put an undo-point before it.
inoremap <C-u> <C-g>u<C-u>

I don’t recall the source of this but I found this after a bit of searching online for a solution and it works! Whoever came up with this, thank you!

Thanks to this kind person’s hint, I was able to find the source of this. It’s actually in the defaults.vim file that is shipped with Vim.

Quickly Open `ftplugin`¶

This is one that I don’t use as often as some of the above, but when I do need it, it’s extremely handy. I use the $VIMFILES/after/ftplugin directory to put in my custom settings for specific file types. These files usually don’t just contain changes in settings like indentation, but also commentstring and often some command(s) that makes editing that specific filetype a bit easier.

These commands let me open the plugin file in that directory for the filetype I’m currently working with.

" Edit my filetype/syntax plugin files for current filetype.
command -nargs=? -complete=filetype EditFileTypePlugin
            \ exe 'keepj vsplit $VIMFILES/after/ftplugin/' . (empty(<q-args>) ? &filetype : <q-args>) . '.vim'
command -nargs=? -complete=filetype Eft EditFileTypePlugin <args>

The same thing for syntax plugin:

command -nargs=? -complete=filetype EditSyntaxPlugin
            \ exe 'keepj vsplit $VIMFILES/after/syntax/' . (empty(<q-args>) ? &filetype : <q-args>) . '.vim'
command -nargs=? -complete=filetype Esy EditSyntaxPlugin <args>

Note that the :Eft and :Esy commands act like short aliases for these commands.

These commands are obviously heavily inspired by the :EditUltiSnipsFile command from the UltiSnips plugin (which is great at automation by the way).

Sorting over Motion¶

Vim comes with the :sort command that sorts the range of lines provided. So, for example, to sort the whole file, we’d do :%sort. To sort the first ten lines, something like :1,10sort should do. The range of lines given will be replaced with the sorted lines.

This is convenient, but not very handy. I’d always wanted a way to sort over a motion, like sort this paragraph or sort inside braces etc. So, after some searching online and digging through the Vim documentation, I have the following in my vimrc:

" Sort lines, selected or over motion.
xnoremap <silent> gs :sort i<CR>
nnoremap <silent> gs :set opfunc=SortLines<CR>g@
fun! SortLines(type) abort
    '[,']sort i
endfun

With this, hitting gsip would sort the lines inside the current paragraph. Similarly, gsiB would sort lines inside the braces closest to the cursor (try this one in CSS!). If you have the vim-indent-object plugin, you could also do gsii to sort lines in current indent block.

Additionally, we also have an xnoremap mapping definition which lets us use gs in visual mode to sort the highlighted lines. I don’t use this as often as the operator version above, but it’s nice to have nonetheless.

Reversing over Motion¶

This is very similar to the above. Instead of sorting, I’m reversing the lines. Unfortunately, we don’t have a :reverse command like :sort, so this one is more DIY.

" Reverse lines, selected or over motion.
nnoremap <silent> gr :set opfunc=ReverseLines<CR>g@
vnoremap <silent> gr :<C-u>call ReverseLines('vis')<CR>
fun! ReverseLines(type) abort
    let marks = a:type ==? 'vis' ? '<>' : '[]'
    let [_, l1, c1, _] = getpos("'" . marks[0])
    let [_, l2, c2, _] = getpos("'" . marks[1])
    if l1 == l2
        return
    endif
    for line in getline(l1, l2)
        call setline(l2, line)
        let l2 -= 1
    endfor
endfun

I mapped reversing to gr, which works similarly to gs from the previous section, except that the lines are reversed instead of sorted. Everything in the above snippet can be looked up with the :h command within Vim. I’ll leave understanding how it works as an exercise for the reader, if inclined.

Conclusion¶

This article looks an awful lot like a list of Vim tips, but I implore you to look further. I picked these specific things from my Vim setup (which is a lot bigger than this) to illustrate the idea of identifying and then automating. Of course, the snippets I shared above are, in my opinion, too small for a full-blown plugin, yet significant enough to be worth sharing. I intend to follow up with more ideas from my configuration, so stay tuned.

I also encourage you to go over the Vim help pages often. They contain some awesome tips and ideas that serve as great starter points to improve your workflow. So, just, you know, while that really long build is running, grab a coffee and open the Vim docs!

Identify, optimize, repeat.

Read the next article in this series.

Python's `map` builtin function

2020-01-04T00:00:00+05:30

In this article, we’ll take a look at Python’s stream processing utility function, map. This function can enable us to write powerful list/stream-processing routines that can be easy to read and understand.

Let’s go over the basics first so we have context when talking about them.

Syntax¶

Calling map:

# From official docs
map(function, iterable, ...)

Where

function: Called with each item from iterable.
iterable: Used to supply the inputs for calling function.
returns: Iterable of return values from calling function.

Working of `map` Function¶

Here’s a run-book for the map builtin function:

Accept two arguments: a function (or any callable) and a list (or any iterable) of objects.
Call the function once per object in the list, passing the object to the function, and collect the return value from each call.
Return an iterator that yields those return values, applying the above step over and over until the list from point 1 is exhausted.

Note that in Python 2, map used to return a list object. However, in Python 3, it returns a map object, which is an iterator that lazily processes each item in the list as it is needed. If you don’t want to bother with this difference for now, remember to always wrap the result of a map call in list. The official 2to3 tool handles this automatically.

Let’s look at some examples:

>>> map(str, range(5))
<map object at 0x0000000002DCD3C8>
>>> list(map(str, range(5)))
['0', '1', '2', '3', '4']

Notice how in the first call to map, we see a map object in the result. At that point, none of the items in range(5) have been processed by str. But when we wrap it in list the next time, we get the list of all processed items.
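To make the laziness visible, here is a minimal sketch. The traced_str helper and the calls list are hypothetical names introduced only for this illustration; they record when the function actually runs.

```python
calls = []


def traced_str(value):
    """Convert to str, recording each call so we can see when it happens."""
    calls.append(value)
    return str(value)


lazy = map(traced_str, range(5))
calls_before = list(calls)     # still empty: nothing has been processed yet

first = next(lazy)             # pulls exactly one item through traced_str
calls_after_one = list(calls)  # only the first item has been processed

rest = list(lazy)              # drains the remaining items
leftover = list(lazy)          # empty: a map object can be consumed only once

print(calls_before, first, calls_after_one, rest, leftover)
```

The last line is worth remembering: once a map object is exhausted, iterating over it again yields nothing.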

We can also pass in lambda functions just fine.

>>> list(map(lambda x: x**2, range(5)))
[0, 1, 4, 9, 16]

But don’t do that, that’s silly. We’ll see why later down in this article, but, put simply, comprehensions are almost always better than a map+lambda combination.

Additionally, map can also take more than one sequence in its arguments. In that case, the items produced by each of the other sequences make up additional arguments for the given function.

Consider the following call to map:

list(map(max, [1, 2, 3], [7, 8, 9], [100, 200, 300]))

This will call the given max function three times,

max(1, 7, 100)
max(2, 8, 200)
max(3, 9, 300)

It produces a result list of three items, the three return values of the above three calls to max: [100, 200, 300].
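Here is a small sketch of the multi-sequence form with functions that take two and three positional arguments. The choice of operator.add, operator.mul and pow is mine, purely for illustration.

```python
import operator

# One item is taken from each iterable per call: operator.add(1, 10), ...
sums = list(map(operator.add, [1, 2, 3], [10, 20, 30]))

# pow accepts three arguments: base, exponent and modulus.
powers = list(map(pow, [2, 3, 4], [3, 2, 2], [5, 5, 5]))

# map stops as soon as the shortest iterable is exhausted.
truncated = list(map(operator.mul, [1, 2, 3, 4], [10, 20]))

print(sums, powers, truncated)
```

Note the last call: the extra items in the longer list are silently ignored, which can hide bugs if the sequences were expected to have the same length.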

Let’s look at some useful ways we can use the map function in real world code.

Using Unbound Methods¶

If the function we want to call is a method call on each object in the given list, we could use a comprehension or do it with map+lambda like this:

>>> protocols = ['http', 'tcp', 'xmpp', 'irc']
>>> [protocol.upper() for protocol in protocols]
['HTTP', 'TCP', 'XMPP', 'IRC']
>>> list(map(lambda protocol: protocol.upper(), protocols))
['HTTP', 'TCP', 'XMPP', 'IRC']

But a much simpler way is to provide the unbound method as the first argument to map.

>>> list(map(str.upper, protocols))
['HTTP', 'TCP', 'XMPP', 'IRC']

The reason this works is that calling an unbound method with an instance as the first argument is almost the same thing as calling the bound method of that instance. In other words, str.upper('http') is more or less the same as 'http'.upper(). This is true for any regular instance method on any class.
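The same idea carries over to user-defined classes. The sketch below uses a hypothetical Host class (not from any library) to show a method accessed through its class being mapped over instances.

```python
# Accessing the method through the class means the instance is passed
# explicitly as the first argument.
same = str.upper('http') == 'http'.upper()


class Host:  # hypothetical class, for illustration only
    def __init__(self, name):
        self.name = name

    def url(self):
        return 'https://' + self.name


hosts = [Host('example.com'), Host('example.org')]

# Host.url(host) is equivalent to host.url() for each host in the list.
urls = list(map(Host.url, hosts))

print(same, urls)
```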

More Types of Sequences¶

The second argument to map can be any iterable; it doesn’t have to be a list. Here are some types that are quite useful with map:

Sets (function called with each item in set)
Dictionaries (function called with each key in the dictionary)
Files (function called with each line in the open file object)
Strings (function called with each character in the string)

We can use dictionaries as the sequence to run a function over each key in the dictionary. Additionally, we can use .items() or .values() to have map run the function over each (key, value) tuple or just the values respectively.

>>> numbers = {'one': 1, 'two': 2, 'three': 3, 'four': 4}
>>> list(map(len, numbers))
[3, 3, 5, 4]
>>> list(map(str, numbers.values()))
['1', '2', '3', '4']
>>> list(map(repr, numbers.items()))
["('one', 1)", "('two', 2)", "('three', 3)", "('four', 4)"]

We can use map to transform the lines of a file as we are reading over it. This is actually very useful to do some small preprocessing on the lines, like removing trailing white space.

with open('contents.txt') as open_file:
    for line in map(str.rstrip, open_file):
        pass

We can map a function like ord (returns the Unicode code point for a single character) over a string, to get the code points for each character in the string.

>>> list(map(ord, 'aluminium'))
[97, 108, 117, 109, 105, 110, 105, 117, 109]

Dictionaries as Transformers¶

This is another neat trick where we have a dictionary and a list of some keys. We use map to transform the list of keys to a list of values, referring to the dictionary.

>>> numbers = {'one': 1, 'two': 2, 'three': 3, 'four': 4}
>>> keys = ['three', 'four', 'two', 'five', 'four', 'two']
>>> list(map(numbers.get, keys))
[3, 4, 2, None, 4, 2]

Notice that when faced with a key like 'five' that doesn’t exist in the dictionary, we get None, which is how dict.get behaves.

Note that in this call to map, we are passing a bound method, numbers.get. This is essentially the dict.get unbound method, which has been bound to the dict instance we are calling numbers.
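If silently getting None for unknown keys is not what we want, one option (my suggestion, not something the text above prescribes) is to map the bound __getitem__ method instead, which is what numbers[key] calls and therefore raises KeyError.

```python
numbers = {'one': 1, 'two': 2, 'three': 3, 'four': 4}
keys = ['three', 'five', 'two']

# dict.get yields None for the missing key.
lenient = list(map(numbers.get, keys))

# numbers.__getitem__ behaves like numbers[key], so a missing key raises.
try:
    strict = list(map(numbers.__getitem__, keys))
except KeyError as error:
    strict = None
    missing = error.args[0]

print(lenient, strict, missing)
```

Whether the lenient or the strict variant is better depends on whether a missing key is an expected case or a bug.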

Infinite Sequences¶

Since map is lazy in Python 3, it can work with infinite sequences just fine. For our purposes, let’s create a generator that will generate positive even numbers from zero to infinity:

>>> def positive_evens():
...     n = 0
...     while True:
...         yield n
...         n += 2

Since this generator never stops by itself, calling list(positive_evens()) will never return. So, we have to put a cap on the amount of data we generate ourselves. Of course, map doesn’t care.

>>> for e in positive_evens():
...     if e > 3:
...         break
...     print(e)
...
0
2
>>> import math
>>> for e in map(math.sqrt, positive_evens()):
...     if e > 3:
...         break
...     print(e)
...
0.0
1.4142135623730951
2.0
2.449489742783178
2.8284271247461903

The map function doesn’t care that the generator we passed in is never ending. It only processes as many items as the for loop requests.

Be careful with infinite generators though, it’s very easy to end up in an infinite loop situation.
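One way to put that cap in place without a hand-written break is the standard itertools module. This is a sketch of the idea, reusing the positive_evens generator from above; islice limits by count and takewhile limits by a condition.

```python
import itertools
import math


def positive_evens():
    n = 0
    while True:
        yield n
        n += 2


# Take only the first three results from the infinite stream.
first_three = list(itertools.islice(map(math.sqrt, positive_evens()), 3))

# Keep taking results until the first one that fails the condition.
small_roots = list(
    itertools.takewhile(lambda root: root <= 3, map(math.sqrt, positive_evens()))
)

print(first_three)
print(small_roots)
```

Both helpers are lazy themselves, so nothing runs until list asks for the items.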

Side Effect Operations¶

The map function is best used as a transformation done to each item in a sequence. In this sense, the function that’s passed in is usually a pure function. Passing in functions that are purely intended for side effects (like print, log.debug etc.) is in bad taste (opinion alert!).

This is mostly because of two reasons. First, we’ll have to pass the return value of map to list to get our print calls to run. Second, we’ll then have a list of Nones that’s just a sad waste.

>>> list(map(print, protocols))
http
tcp
xmpp
irc
[None, None, None, None]

The better way to do this is to just use a for loop and make the intention clear. The intention is to do something with each item in the sequence. Not to do something to each item in the sequence and collect their return value.

>>> for protocol in protocols:
...     print(protocol)
http
tcp
xmpp
irc

Much better.

String join¶

Since we can use bound methods with map as well, we can pass in bound string methods like ':'.join:

>>> french_numbers = {'one': 'un', 'two': 'deux', 'three': 'trois'}
>>> list(map(':'.join, french_numbers.items()))
['one:un', 'two:deux', 'three:trois']

Case Against `lambda`+`map`¶

Since map accepts any callable, it can be tempting to use lambda functions inside map. This is almost always bad taste, and usually, comprehensions (along with zip) offer a more readable alternative.

Consider the following use of map with lambda:

>>> list(map(lambda x: x * 2, range(5)))
[0, 2, 4, 6, 8]

Now compare that with the same thing done with a comprehension:

>>> [x * 2 for x in range(5)]
[0, 2, 4, 6, 8]

Of course, we can use a comprehension even when there is no lambda involved, by just calling the function inside the comprehension. But in that case, map just looks prettier ;).

>>> [ord(c) for c in 'hello']
[104, 101, 108, 108, 111]
>>> list(map(ord, 'hello'))
[104, 101, 108, 108, 111]

In fact, any call to map can be translated to a comprehension:

map(function, iterable, ...)
# same as
(function(*vals) for vals in zip(iterable, ...))

But that doesn’t mean map is not useful. We just have to pick the right option depending on the need.
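To check that translation on a concrete case, here is a short sketch using the builtin divmod over two lists (the sample numbers are arbitrary).

```python
dividends = [7, 9, 14]
divisors = [2, 4, 5]

# The map form: divmod(7, 2), divmod(9, 4), divmod(14, 5).
via_map = list(map(divmod, dividends, divisors))

# The generator-expression form from the translation above.
via_genexp = list(divmod(*vals) for vals in zip(dividends, divisors))

# A list comprehension that unpacks the zipped pairs by name.
via_listcomp = [divmod(a, b) for a, b in zip(dividends, divisors)]

print(via_map == via_genexp == via_listcomp, via_map)
```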

Conclusion¶

The map function is a powerful builtin, but should be used with care. If you find yourself nesting several different calls to map, you may want to rethink that strategy since it quickly becomes unreadable.

But when it produces clear-to-understand code, map can be a very useful tool.

Thank you for reading! Do you have any clever examples of using map? Share in the comments!

The Jython Pillow Guide

2018-01-09T00:00:00+05:30

This is a document with tips and usage details about Jython that I’ve come across. I intend to document handy features of Python as well as some clever inter-op facilities provided by Jython.

I’m going to assume you’re not a complete beginner to Java and Python languages. If you find anything off or have a suggestion to add, please do write to me. Thanks!

Logging and Printing¶

When using Apache’s log4j, we can get an instance of a Logger using the API just as we would in Java:

>>> from org.apache.log4j import Logger
>>> log = Logger.getLogger('jython_script')

When getting a Logger instance for a module that is imported, a logger with a category specific to that module can be obtained using the following code:

log = Logger.getLogger(__name__)

The __name__ name is a variable containing the current module’s name as a string. Note that __name__ is set to the string '__main__' if the module is run as a script and not imported from another script. This should be kept in mind when using the above code.

The standard printing functions of Java can be imported into Python and used directly in the following way:

>>> from java.lang import System
>>> System.out.println('Hola')
Hola
>>> System.err.println('Hello there')
Hello there
>>> System.out.print('Hola\n')
Hola

However, it’s usually more convenient to use Python’s print statement to output things to standard output and error:

print 'Hello world!'

Here’s a table illustrating the print statement equivalents of the Java print* functions:

Java	Python
`System.out.println("!")`	`print '!'`
`System.out.print("!")`	`print '!',`
`System.err.println("!")`	`print >> sys.stderr, '!'`
`System.err.print("!")`	`print >> sys.stderr, '!',`

Bean Properties¶

Jython can implicitly call the .get* and .set* methods that are widely used in Java classes to get and set the values of instance attributes. Here’s an illustration of how this inter-op works:

Jython	Java equivalent
`obj.somePropertyValue`	`obj.getSomePropertyValue()`
`obj.somePropertyValue = 123`	`obj.setSomePropertyValue(123)`

Of course, when such .get* and .set* methods are not available, this falls back gracefully to trying to get/set the property values directly, just as Java would treat those statements.

Strings¶

Strings in Java (i.e., objects of type java.lang.String) are converted to unicode objects when passed in to Python world. Whereas str and unicode objects in Python are converted to java.lang.String instances when passed in to Java world. This conversion is seamless and we usually don’t have to worry about it.

However, if needed, we can explicitly create an instance of java.lang.String from a unicode object in Python:

>>> from java.lang import String
>>> greeting = String('Hello')
>>> greeting
Hello
>>> type(greeting)
<type 'java.lang.String'>

String formatting using the % operator in Python cannot be applied to Java String objects. They have to be converted to str or unicode first.

Maps as Dictionaries¶

For the purposes of the following examples, let’s work with the following Map:

java.util.Map<String, Integer> data = new java.util.HashMap<>();
data.put("a", 1);
data.put("b", 2);
data.put("c", 3);

Maps support the getitem syntax very well so it is usually convenient to think of them as python-style dictionaries. Here’s an example:

>>> print data['a']  # data.get("a")
1
>>> print data['b']  # data.get("b")
2
>>> data['d'] = 4  # data.put("d", 4)
>>> data['d']  # data.get("d")
4
>>> len(data)  # data.size()
4
>>> 'c' in data  # data.containsKey("c")
True
>>> del data['c']  # data.remove("c")
>>> 'c' in data  # data.containsKey("c")
False
>>> data
{a=1, b=2, d=4}
>>> len(data)  # data.size()
3

Although this resembles the usage of a traditional python dictionary, the methods you’d expect in a dictionary are not all available. This is a Map object after all and it has the methods of the Map class. However, it is easy to see the parallels among some of the most used methods.

`dict` method	`Map` method
`.keys`	`.keySet`
`.values`	`.values`
`.clear`	`.clear`
`.items` (gives 2-tuples)	`.entrySet` (gives `Entry` objects with `.key` and `.value`)
`.update`	`.putAll` (accepts `dict` as well as a `Map`)

The dict builtin can be called on the Map object to get a python-style dictionary, if needed. Additionally, just like a python dictionary, calling list (or set) on the Map object gives a list (or set) of the keys in the Map.

Using for loops to iterate over Maps yields the keys in the Map, which is consistent with how for loops work with python dictionaries.

for key in data:
    print key, data[key]

Prints the following:

a 1
b 2
d 4

In python, the .items method returns each entry as a tuple which lets us write the for loop like the following:

# !!! Only works if `data` is a python-style dictionary, not if it is a `Map`.
for key, value in data.items():
    print key, value

But unfortunately, since Map doesn’t have the .items method, this is not possible. However, we can use the .entrySet method to construct something slightly similar.

for entry in data.entrySet():
    print entry.key, entry.value

To iterate over the values of a Map, since the method is called .values in both dict and Map, the same piece of code would work with any object.

for value in data.values():
    print value

Empty Map objects are treated as False in boolean contexts, just as with python’s dictionaries.

Collections¶

The two main collection types in Python are list and set. The equivalents in java are the interfaces List and Set. Let’s prepare some data for our examples.

java.util.List<String> planets = new java.util.ArrayList<>();
planets.add("Mercury");
planets.add("Venus");
planets.add("Earth");

java.util.Set<String> colors = new java.util.HashSet<>();
colors.add("White");
colors.add("Black");
colors.add("Red");
colors.add("Green");
colors.add("Blue");

The getitem syntax can be used with Lists seamlessly:

>>> planets[0]
u'Mercury'
>>> planets[1]
u'Venus'

The slicing syntax returns Lists of the same type, not python-style lists.

>>> planets[:2]
[Mercury, Venus]
>>> type(_)  # `_` is a variable set to the return value of last expression.
<type 'java.util.ArrayList'>
>>> planets[::-1]
[Earth, Venus, Mercury]
>>> type(_)
<type 'java.util.ArrayList'>

However, the getitem syntax is not supported for Sets, as it doesn’t make sense there since Sets are unordered collections. But the operator support available for sets in python is available with Java Set objects as well.

>>> 'Red' in colors
True
>>> len(colors)
5

The for loop can be used on any Collection type objects to iterate over the object’s contents.

>>> for x in planets:
...     print x
...
Mercury
Venus
Earth
>>> for x in enumerate(planets):
...     print x
...
(0, u'Mercury')
(1, u'Venus')
(2, u'Earth')

Here are equivalents for some of the methods available in Java’s Collections and Python’s collection types.

Java	Jython
`Collection.add`	`list.append` / `set.add`
`Collection.addAll`	`list.extend` / `set.update` (Prefer `list + list` or `set.union`)
`Collection.contains`	`in list` or `in set`
`Collection.isEmpty`	`bool(list)` or `bool(set)` (Can be used directly in a boolean context)
`Collection.size`	`len(list)` or `len(set)`

Empty Collections are treated as False in boolean contexts, just as with python’s collections.

Java Arrays¶

Just as Java’s List is mirrored in Python with list, Java’s arrays are mirrored using the array structure available in Jython’s array module. That official documentation is quite exhaustive on this topic, so I suggest going over it to get an idea of handling arrays in Jython.

The Iteration Protocol¶

Java’s Iterator style iteration is supported by Jython’s for statements. For example, consider the following Java Iterator that’s trying to emulate a small fraction of Python’s range function:

package ssk.experiments;
import java.util.Iterator;

public class RangeIterator implements Iterator<Integer> {
    private Integer current = 0, max;
    public RangeIterator(int max) { this.max = max; }
    @Override
    public boolean hasNext() { return current < max; }
    @Override
    public Integer next() { return current++; }
}

Since classes are instantiated without a new keyword in Python, combined with the fact that Jython’s for statement supports Java’s Iterators, we can use the above in the following way:

from ssk.experiments import RangeIterator


for n in RangeIterator(5):
    print n

This gives the following output:

0
1
2
3
4

Since Jython’s for statement supports iterating over Java’s Enumeration type, the above same for loop would work with a RangeEnumeration class as defined below:

package ssk.experiments;
import java.util.Enumeration;

public class RangeEnumeration implements Enumeration<Integer> {
    private Integer current = 0, max;
    public RangeEnumeration(int max) { this.max = max; }
    @Override
    public boolean hasMoreElements() { return current < max; }
    @Override
    public Integer nextElement() { return current++; }
}

Jython seamlessly handles the getting of an instance of an Iterator from a Java Iterable. This is actually how the for statement works with the List and Set collections discussed earlier (Collection is a sub-interface of Iterable).

Patching Java Classes¶

In Python, new methods and attributes can be added to existing classes. This comes from the dynamic nature of the programming language and the runtime. The JVM is also a dynamic runtime, but the Java language doesn’t allow us to modify existing classes. This is where Jython comes in. Jython lets us add and override methods on existing Java classes. Although this is seldom needed, this can illustrate the extent of Jython’s integration with the JVM.

Here’s a Java class:

package ssk.experiments;

public class Country {
    private String name;
    public Country(String name) { this.name = name; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
}

There’s nothing fancy with the above class. It’s a regular class with one property with .get and .set methods. Now, let’s add a new method to this class.

from ssk.experiments import Country


def upcase(self):
    self.name = self.name.upper()


Country.upcase = upcase

# Create a `Country` object and call the `upcase` method.
largest_country = Country('Russia')
largest_country.upcase()
print largest_country.name

This would print RUSSIA, as expected.

Note that this is an advanced feature and should be used with caution. In almost all cases, it is probably a better idea to modify the original Java class definition directly. But when that is not an option, creating a simple Python function that works with these objects should be considered. Modifying existing classes should only be used as a last resort.

Operator Overloading¶

One nice and practical case for adding methods on existing Java classes is to leverage Python’s support for operator overloading with Java classes. One good example for this is with the BigDecimal class. Mathematical operations on objects of BigDecimal are provided as individual methods like .add, .subtract etc. We can add operator support (in Jython) for these objects by adding the appropriate methods to the BigDecimal class.

For instance, here’s how we can add support for the + operator:

from java.math import BigDecimal

BigDecimal.__add__ = lambda self, other: self.add(other)

print BigDecimal(42) + BigDecimal(10)

This would print 52, as expected. More methods can be added to support all the mathematical operators such as __sub__ for subtraction and __mul__ for multiplication etc. The full list of such method names can be found on the official data model documentation page.

Conclusion¶

This is not intended to be an exhaustive guide to what Jython can do. I hoped to give you a taste of how well Jython handles inter-op with Java and hopefully I’ve helped you write better Python - Java inter-op code. Thank you! Suggestions and feedback are very welcome.

The Python Dictionary

2017-09-29T00:00:00+05:30

The Python Dictionary is a key–value style data structure that is tightly integrated with the language syntax and semantics. Understanding them well can help us use them better and investigate subtle problems more efficiently.

This is my attempt to document this topic in more depth. Though I included a small section about the syntax and basic usage of dictionaries, it’ll be helpful if you have some beginner–intermediate level experience with Python.

This article is written for Python 3.6 installed via Anaconda on Xubuntu. Here’s the platform details:

$ python -V
Python 3.6.1 :: Anaconda custom (64-bit)
$ uname -isro
Linux 4.10.0-33-generic x86_64 GNU/Linux

Note: This is not intended as a substitute for official documentation. The official documentation is a reference and there will be some overlap. This document is intended as a supplement that covers more depth and practical nuances.

Introduction¶

Dictionaries (type dict) are a very powerful data structure, not just in Python. They are present in almost every modern high level language, sometimes called maps, hashes or associative arrays. Python’s syntax for dictionaries closely resembles the JSON serialization format, which itself derives from JavaScript’s object literals.

Dictionaries are a fundamental part of Python language and integrate tightly with the semantics and APIs of the standard library. This can be seen in the fact that we have a special syntax just to create these data structures.

Usage¶

Syntax¶

As a quick primer, here’s the syntax for defining a dictionary:

country_currencies = {
    'India': 'Rupee',
    'Russia': 'Ruble',
    'USA': 'Dollar',
    'Japan': 'Yen',
}

API¶

Again, we quickly run down the common operations on dictionaries.

# Get the value of a key.
indian_currency = country_currencies['India']

# Set the value of a key.
country_currencies['France'] = 'Euro'

# Delete a key.
del country_currencies['USA']

# Check for presence of a key.
'Russia' in country_currencies

# Get if key present, otherwise return `None`.
# (Takes a second parameter which is returned when key is missing).
country_currencies.get('USA')

# Set only if the key is not already present.
country_currencies.setdefault('France', 'Franc')
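The difference between .get and .setdefault is easy to miss, so here is a small sketch that captures the return values. The sample data is a trimmed copy of the dictionary above and the variable names are mine.

```python
country_currencies = {'India': 'Rupee', 'Russia': 'Ruble', 'Japan': 'Yen'}

# .get never modifies the dictionary.
missing = country_currencies.get('USA')              # None
fallback = country_currencies.get('USA', 'unknown')  # the second argument
still_absent = 'USA' not in country_currencies

# .setdefault inserts only when the key is absent, and always returns
# the value that ends up stored under the key.
first = country_currencies.setdefault('France', 'Euro')
second = country_currencies.setdefault('France', 'Franc')

print(missing, fallback, still_absent, first, second)
```

The second setdefault call leaves 'Euro' in place, since the key was already present by then.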

Contents¶

The contents of dictionaries are made up of two components: the keys and the values. The keys form the index using which we can retrieve the values. Each key uniquely identifies a value within the dictionary.

Key Types¶

The keys form the index of the dictionary. In most practical cases, keys tend to be strings. Tuples are often used as well. In fact, values of any immutable, hashable types can be used as keys.

So, what is a hashable type? The official documentation of the __hash__ method gives the full detail of what it is and what are considered hashable. Simply put, if passing an object to the hash builtin function doesn’t raise an exception, the object is hashable and can be used as a key in a dictionary.

However, in practice, we should avoid using mutable objects as keys (even if they are hashable). Especially, if mutation changes the hash of the object.

For example, consider the following User class.

class User:
    def __init__(self, first_name, last_name):
        self.first_name = first_name
        self.last_name = last_name

Let’s inspect the hash values of User objects.

>>> ned = User('Ned', 'Stark')
>>> hash(ned)
8784834659087
>>> ned.first_name = 'Robb'
>>> hash(ned)
8784834659087

If you try the above code, you might see a different number. That’s because the default hash of an object is derived from its identity (in CPython, its memory address), which differs from run to run.

As seen above, the hash value did not change even though the object was modified. These User objects can be used as keys for a dictionary since they meet the requirement, but it should be kept in mind that they are mutable.

>>> ned = User('Ned', 'Stark')
>>> d = {ned: 123}
>>> d[ned]
123
>>> ned.first_name = 'Robb'
>>> d[ned]
123

If that doesn’t seem confusing, try this:

>>> robb = ned
>>> ned = User('Ned', 'Stark')
>>> robb.first_name
'Robb'
>>> robb in d  # Robb isn't in our dictionary!
True
>>> ned.first_name
'Ned'
>>> ned in d  # We gave Ned Stark a value right?
False

This can quickly cause headaches and hard-to-find problems.

Suppose someone later tries to fix this by customizing the hashing of this class with the following method:

    def __hash__(self):
        return hash((self.first_name, self.last_name))

Now, the hash of the object changes when we change the first_name.

>>> ned = User('Ned', 'Stark')
>>> hash(ned)
4091961891370636651
>>> ned.first_name = 'Robb'
>>> hash(ned)
-7890115541605828979

Using these objects as keys can be confusing as well:

>>> ned = User('Ned', 'Stark')
>>> d = {ned: 123}
>>> d[ned]
123
>>> ned.first_name = 'Robb'
>>> d[ned]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: <__main__.User object at 0x7fd60e7c5828>
>>> ned.first_name = 'Ned'
>>> d[ned]
123

In essence, using mutable types as keys in a dictionary can lead to confusing results in a fairly large codebase.

So, to avoid these potential problems, it’s best to use numbers, strings or tuples (containing numbers or strings) as keys for dictionaries. If you have to use other types, keep the hashing semantics in mind and document the reasons well.
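If a class-like key is really needed, one possible approach (an assumption on my part, not something the examples above use) is to make the objects immutable. Below, User is redefined as a typing.NamedTuple, so equality and hashing follow the field values and the fields cannot be reassigned.

```python
from typing import NamedTuple


class User(NamedTuple):  # hypothetical immutable variant of the User class
    first_name: str
    last_name: str


ned = User('Ned', 'Stark')
d = {ned: 123}

# A separate object with equal fields is an equal key with an equal hash.
same_ned = User('Ned', 'Stark')
found = d[same_ned]

# Attempting to mutate the key fails instead of corrupting the lookup.
try:
    ned.first_name = 'Robb'
    mutated = True
except AttributeError:
    mutated = False

print(found, mutated)
```

This trades the identity-based behaviour seen earlier for value-based behaviour, which is usually what we expect from dictionary keys.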

Retrieving Keys¶

Dictionaries have a .keys method that returns an object of type dict_keys which is an iterable (technically, a view) of the keys of the dictionary. Note that this method used to return an ordinary list in Python 2.

>>> countries = country_currencies.keys()
>>> countries
dict_keys(['India', 'Russia', 'USA', 'Japan'])
>>> import collections.abc
>>> isinstance(countries, collections.abc.Iterable)
True

Note that the order of the keys is not retained/defined. Don’t rely on the order even if they seem predictable. It might vary across Python implementations and versions even. Use an OrderedDict when ordering is needed. More on this in a later section.

It should be noted that starting in Python 3.6, the order of keys is preserved. This is a side effect of using a more efficient dict implementation. As such, the Python 3.6 documentation explicitly states that this is an implementation detail and should not be relied upon. From Python 3.7 onwards, insertion order is a guaranteed part of the language.

So, what’s special about dict_keys, as opposed to a list? Look look!

>>> countries
dict_keys(['India', 'Russia', 'USA', 'Japan'])
>>> country_currencies['France'] = 'Euro'
>>> countries
dict_keys(['India', 'Russia', 'USA', 'Japan', 'France'])

See? The dict_keys object is a view of the keys of the original dictionary object. When the dictionary’s keys change, so does the keys view. Of course, we can make a set of currently available keys by passing it to set builtin. This set would be independent of the dictionary.

>>> set(countries)
{'Japan', 'USA', 'Russia', 'India', 'France'}

Most people would suggest and use list here, instead of set. I personally feel a set is semantically more correct since a list indicates the contents have a specific ordering and does not convey that the contents are hashable, immutable, and more importantly, unique. A set shares these features of dictionary keys.

Additionally, the dict_keys objects are themselves set-like. They implement the Set abstraction. So, we don’t need to convert them to a set in order to do set operations on them. For example, here’s an intersection operation:

>>> isinstance(countries, collections.abc.Set)
True
>>> countries & {'India', 'China'}
{'India'}
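The set-like behaviour of key views is handy for comparing two dictionaries. The following sketch uses two made-up configuration dictionaries to find which keys were added, removed or changed.

```python
old_config = {'host': 'localhost', 'port': 8080, 'debug': True}
new_config = {'host': 'example.com', 'port': 8080, 'timeout': 30}

# Set operations work directly on the key views; each result is a plain set.
added = new_config.keys() - old_config.keys()
removed = old_config.keys() - new_config.keys()
common = old_config.keys() & new_config.keys()

# Among the shared keys, pick the ones whose values differ.
changed = {key for key in common if old_config[key] != new_config[key]}

print(added, removed, changed)
```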

Using Tuples for Keys¶

Here’s a quick example of using tuples as keys in a dictionary:

>>> data = {
...     ('a', 1): 'a1',
...     ('a', 2): 'a2',
...     ('b', 1): 'b1',
...     ('b', 2): 'b2',
... }
>>> data['a', 2]
'a2'

Note that only tuples that contain hashable types (or further such tuples) can be used as keys. Lists or dictionaries, on the other hand, cannot be used since they are not hashable.
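A quick sketch of that restriction, using a made-up grid dictionary: a tuple of numbers works as a key, while a tuple that contains a list is rejected at insertion time.

```python
grid = {(0, 0): 'origin', (1, 0): 'east'}

# The parentheses are optional inside the subscript.
east = grid[1, 0]

# A tuple is only as hashable as its contents.
try:
    grid[(1, [2, 3])] = 'broken'
    message = None
except TypeError as error:
    message = str(error)

print(east, message)
```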

Retrieving Values¶

Values are what the keys index. Naturally, values don’t have to be unique, unlike keys. There are no restrictions on what types can be used as values in a dictionary.

We can get a sequence of values in a dict with the .values method. This returns a dict_values object.

>>> currencies = country_currencies.values()
>>> currencies
dict_values(['Rupee', 'Ruble', 'Dollar', 'Yen', 'Euro'])
>>> type(currencies)
<class 'dict_values'>
>>> isinstance(currencies, collections.abc.Set)
False

This is live as well!

>>> del country_currencies['France']
>>> currencies
dict_values(['Rupee', 'Ruble', 'Dollar', 'Yen'])

This can be passed to list to get a list of values. Using set here is probably not a good idea since unlike the keys, values don’t have to be unique or hashable.

Items Collection¶

Dictionaries also provide a .items method that returns all the key–value pairs as a sequence of 2-tuples.

>>> pairs = country_currencies.items()
>>> pairs
dict_items([('India', 'Rupee'), ('Russia', 'Ruble'), ('USA', 'Dollar'), ('Japan', 'Yen')])

Again, just like with .keys or .values, the sequence is live. The order of items follows insertion order, which is guaranteed from Python 3.7 onwards; in older versions the order is not defined.

The .items method is probably mostly used with the for statement to loop over the key–value pairs.

for country, currency in country_currencies.items():
    print(f"{country}'s currency is {currency}.")

The above code uses f-strings introduced in Python 3.6. In older versions of Python, the .format method or the modulo (%) operator should be used.

The dict_items object also implements the Set abstraction.

>>> isinstance(pairs, collections.abc.Set)
True

However, the abstraction’s methods only work if the dictionary’s values are hashable, not just the keys. So, for the dictionary we are working with, the pairs object can be used as a set.

>>> pairs & {('India', 'Rupee'), ('UK', 'Pound')}
{('India', 'Rupee')}

But if we try this on a dictionary whose values are not hashable, say, lists, then it fails.

>>> number_types = {
...     'even': [2, 4, 6, 8],
...     'odd': [1, 3, 5, 7, 9],
... }
>>> pairs = number_types.items()
>>> pairs
dict_items([('even', [2, 4, 6, 8]), ('odd', [1, 3, 5, 7, 9])])
>>> isinstance(pairs, collections.abc.Set)
True

Let’s try intersecting this with an empty set.

>>> pairs & set()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'

As the error says, list is not hashable. So, although the isinstance tells us that this is a Set, whether it can actually be used as such depends on its contents. This is not incorrect, actually; I feel it’s just a consequence of Python’s dynamic nature.

Typing¶

Dictionaries in Python are what I call a homogeneous data structure. What that means is that they are best used by having all the keys be of the same type and similarly for values. This is enforced in comparable data structures in statically typed languages like Java’s Map or Haskell’s HashMap. But since Python is a dynamic language, such restrictions are not placed. We can have keys / values of several different types within the same dictionary.

data = {
    'a': 1,
    42: 'yay!',
    ('a', 'b', 2): True,
}

This is still a valid dictionary, although an extremely sad and ugly one (totally my opinion :D).

If using homogeneous dictionaries, the type annotations syntax can be used to declare the type signatures. We use typing.Dict for this purpose as illustrated below.

from typing import Dict, Tuple

number_map: Dict[int, int] = {1: 10, 2: 20, 3: 30}
data_map: Dict[Tuple[str, int], str] = {('a', 1): 'a1', ('a', 2): 'a2'}

This is new in Python 3.6. Before 3.6, annotations are only supported for function arguments. Read more.

Additionally, the typing module itself is new in Python 3.5. Read more.

The general structure is Dict[<key-type>, <value-type>]. So, Dict[str, int] denotes a dictionary that maps string keys to integer values.

Note that these type annotations are not checked at runtime. They’re mere help to IDEs, static checkers and human readers. Python’s dynamic nature is not affected by these annotations.

However, if such type annotations are declared, you could use a static analyzer like mypy to perform type checks. I won’t be discussing that here.
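
Just to show what “not checked at runtime” means in practice, here’s a tiny sketch. Nothing here is specific to any tool; it only demonstrates that the interpreter happily ignores the declared types.

```python
from typing import Dict

number_map: Dict[int, int] = {1: 10, 2: 20}

# The annotation says int -> int, but nothing stops this at runtime.
# A static checker such as mypy would be expected to flag this line.
number_map['three'] = 'thirty'

print(number_map)  # {1: 10, 2: 20, 'three': 'thirty'}
```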

Creating Dictionaries¶

There are a few other ways to create dictionaries besides the {} syntax. Here are a few of them.

Calling `dict`¶

The dict callable can be used to create dictionaries from a list of tuples or by passing the keys and values as keyword arguments.

>>> dict([('Chromium', 24), ('Phosphorus', 15), ('Silver', 47)])
{'Chromium': 24, 'Phosphorus': 15, 'Silver': 47}

This is obviously more convenient than the dictionary syntax only if we already have such a list. If we have the keys and corresponding values in different lists, we can zip them up and pass the result to dict.

>>> dict(zip(
...     ['Sulfur', 'Calcium', 'Gold'],  # Keys
...     [16, 20, 79],  # Values
... ))
{'Sulfur': 16, 'Calcium': 20, 'Gold': 79}

Of course, we can pass keyword arguments directly to dict, in addition to the above even.

>>> dict(dict([('Chromium', 24), ('Phosphorus', 15)]), Sodium=11, Nitrogen=7)
{'Chromium': 24, 'Phosphorus': 15, 'Sodium': 11, 'Nitrogen': 7}
>>> dict(Sodium=11, Nitrogen=7)
{'Sodium': 11, 'Nitrogen': 7}

The second form is better written using the dictionary literal syntax. That is more natural to a potential future reader and slightly faster¹ as well.

Comprehensions¶

Python 3 (and 2.7) added support for dict comprehensions which are very similar to list comprehensions, but with a small variation in syntax.

>>> dict((i, i**2) for i in range(5))  # Using the `dict` builtin.
{0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
>>> {i: i**2 for i in range(5)}  # Using a dict comprehension.
{0: 0, 1: 1, 2: 4, 3: 9, 4: 16}

The above two examples create the same dictionary. However, as pointed out in PEP 274, the dict comprehension is more succinct and makes the intent clearer.

Public Appearance¶

Unsurprisingly, dictionaries pop up in a lot of places in Python. Here are a few.

Keyword Arguments¶

When defining a function that takes arbitrary keyword arguments, they are passed to the function as a dictionary.

>>> def construct(**counts):
...     print(counts)
...     print(len(counts), type(counts))
...
>>> construct(a=1, b=2, c=3)
{'a': 1, 'b': 2, 'c': 3}
3 <class 'dict'>

Of course, we can pass a dictionary’s data as keyword arguments to a function using similar syntax.

>>> kw_args = {'a': 1, 'b': 2, 'c': 3}
>>> construct(**kw_args)
{'a': 1, 'b': 2, 'c': 3}
3 <class 'dict'>

Namespaces¶

The globals builtin function gives a dictionary of all names and their values in the current global namespace. We can modify this dictionary to define new names or delete existing ones, although that’s probably a bad idea.

>>> len(globals())
25
>>> globals()['x'] = 123
>>> x
123
>>> del globals()['x']
>>> x
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'x' is not defined

The locals builtin returns a dictionary of names and values from the local scope, e.g., the private local scope inside of a function or method.

The vars builtin takes an object as an argument and returns the names available as attributes on this object. Specifically, it returns the value of the given object’s __dict__ attribute. When called without any arguments, it returns names and values from the local scope. In other words, vars() behaves like locals().
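
Here’s a quick sketch of vars on an ordinary object. The Point class is a made-up example, nothing more.

```python
class Point:
    """A hypothetical class used only for this illustration."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(3, 4)
print(vars(p))                # {'x': 3, 'y': 4}
print(vars(p) is p.__dict__)  # True: it's the very same dictionary

# Since it's the same dictionary, writing to it adds an attribute.
vars(p)['z'] = 5
print(p.z)  # 5
```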

Serialization¶

Dictionaries, being key–value data structures, extend naturally to be stored into key–value databases and other NoSQL data stores. However, here we’ll look at forms of serializing them into text and binary forms for transmission or for saving to disk.

JSON¶

Nowadays, the thought of serializing a Python dictionary is usually followed by using the json module to dump and load using the JSON format. No surprise, since it’s extremely convenient and there are quality parsers and writers for almost every programming language today. The syntax as well, although not too convenient to write by hand, is still very simple, lightweight and easy to read. It helps that the syntax is quite close to Python’s own syntax for dictionaries.

Here’s a quick example:

>>> import json
>>> json.dumps(country_currencies)
'{"India": "Rupee", "Russia": "Ruble", "USA": "Dollar", "Japan": "Yen"}'
>>> json.loads(json.dumps(country_currencies)) == country_currencies
True

In short, these four functions from the json module are enough to know the basic usage.

Method	Description
`.dump(obj, fp)`	Turn `obj` into JSON and write it to the `fp` file-like object.
`.dumps(obj)`	Turn `obj` into JSON and return the resulting string.
`.load(fp)`	Read valid JSON from `fp` file-like object and return the resulting object.
`.loads(data)`	Parse `data` as a valid JSON string and return the resulting object.

As convenient as this is, it is important to know the changes to data types that will result because of this. JSON only supports numbers, strings and booleans as primary data types and arrays & maps as analogues to lists and dicts. As a result of this, if there are tuples somewhere in the dictionary, then they will be turned into lists when the dict is serialized and deserialized with JSON. A similar situation occurs for dates and any other data type not directly supported by the JSON spec.
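
A short sketch of the type changes described above. The record dictionary is made up for illustration; the integer key case is an extra observation of mine, not something covered in the text above.

```python
import json

record = {'name': 'origin', 'point': (0, 0)}
restored = json.loads(json.dumps(record))

print(restored)            # {'name': 'origin', 'point': [0, 0]}
print(restored == record)  # False: the tuple came back as a list

# Keys are affected as well: JSON object keys are always strings.
print(json.loads(json.dumps({1: 'one'})))  # {'1': 'one'}
```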

Pickling¶

Unlike the above, pickling (using the pickle module) serializes objects into binary data and can handle a much wider range of data types. For this reason, pickled data can only be loaded by Python, not other languages (well, not yet at least).

The pickle module has dump, dumps, load and loads functions, just like the json module discussed above.
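
For contrast with JSON, here’s a minimal round-trip sketch. The data dictionary is made up; it deliberately uses tuple keys and a set value, both of which JSON can’t represent directly.

```python
import pickle

data = {('a', 1): 'a1', ('b', 2): {'nested', 'set'}}

blob = pickle.dumps(data)
print(type(blob))  # <class 'bytes'>

restored = pickle.loads(blob)
print(restored == data)  # True: tuples and sets survive the round trip
```

One caveat worth keeping in mind: unpickling can execute arbitrary code, so only load pickled data that comes from a source you trust.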

The Item Syntax¶

The syntax used to get an item from a dictionary, given its key, is data[key]. This is mostly equivalent to calling the __getitem__ method, like the following:

data.__getitem__(key)

But obviously, we’d prefer the square bracket syntax. But understanding that underneath the syntax, it’s just a method call, lets us implement the __getitem__ method in our own classes and get the item syntax on our objects.

Here’s a simple example:

>>> class Store:
...     def __getitem__(self, name):
...             return name.upper()
...
>>> store = Store()
>>> store['Hello there!']
'HELLO THERE!'

Similar to this is the __setitem__ which is used to set the value using the item syntax.

# The following two are equivalent.
data[key] = value
data.__setitem__(key, value)

Note that this should be used responsibly. This feature borders on operator overloading. In almost all cases (including the above example), using a normal named method on your classes should be a better option than overriding the item syntax, since a normal method would have a name which makes the intent clearer.
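
If the item syntax does seem like the right fit, here’s a sketch of a class that implements both methods together. The Registry class and its whitespace-stripping behaviour are invented for this example.

```python
class Registry:
    """A toy container that strips whitespace from keys; purely illustrative."""

    def __init__(self):
        self._items = {}

    def __getitem__(self, name):
        # Called for `registry[name]`.
        return self._items[name.strip()]

    def __setitem__(self, name, value):
        # Called for `registry[name] = value`.
        self._items[name.strip()] = value


reg = Registry()
reg['  answer '] = 42
print(reg['answer'])  # 42
```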

Flavors¶

Python’s standard library comes with a few flavors of dictionaries that provide some nice additional functionality. These data structures are all available in the collections module.

The following are subclasses of dict and have all the features of Python’s dictionaries.

The `OrderedDict`¶

The collections.OrderedDict is a dictionary that remembers the order in which keys are inserted. The order remembered is the insertion order. So, if we add a new key to the dict, it will be at the end of the key sequence. But if we change the value of an existing key, its position in the ordering is unchanged.

Create a new OrderedDict:

>>> from collections import OrderedDict
>>> planet_satellites = OrderedDict(
...     Mercury=0,
...     Venus=0,
...     Earth=1,
...     Mars=2,
...     Jupiter=69,
...     Saturn=62,
...     Uranus=27,
...     Neptune=14,
... )
>>> from pprint import pprint
>>> pprint(planet_satellites)
OrderedDict([('Mercury', 0),
             ('Venus', 0),
             ('Earth', 1),
             ('Mars', 2),
             ('Jupiter', 69),
             ('Saturn', 62),
             ('Uranus', 27),
             ('Neptune', 14)])

Note that we use the pprint function to show the OrderedDict objects in a convenient way.

They are just dictionaries under the hood.

>>> isinstance(planet_satellites, dict)
True
>>> planet_satellites['Mars']
2

These objects support being reversed as well:

>>> rev_planets = OrderedDict(reversed(planet_satellites.items()))
>>> pprint(rev_planets)
OrderedDict([('Neptune', 14),
             ('Uranus', 27),
             ('Saturn', 62),
             ('Jupiter', 69),
             ('Mars', 2),
             ('Earth', 1),
             ('Venus', 0),
             ('Mercury', 0)])

The results of .keys and .values methods also retain the ordering. Refer to the official documentation linked above for full details.

The `defaultdict`¶

The name defaultdict is unfortunate as it doesn’t adhere to any naming conventions. I’d love to see it renamed to default_dict or even DefaultDict, but it’s probably easier to just live with it.

A defaultdict can understand how to initialize new keys. Consider the following code. Here, we have a piece of text and we want a dictionary mapping each letter in the text to its count of occurrences.

text = 'lorem ipsum dolor sit amet'
counts = {}
for letter in text:
    if letter not in counts:
        counts[letter] = 0
    counts[letter] += 1
print(counts)

Of course, there are better ways to do this, but for the sake of example, let’s bear with this implementation.

Notice how we check if the letter is not already present in the dict and if so, we initialize it to zero. A defaultdict can learn this method of initialization. It takes a function as its first argument which returns the value of a new key when accessed. So, we can replace the above code to use defaultdict like:

from collections import defaultdict
text = 'lorem ipsum dolor sit amet'
counts = defaultdict(int)
for letter in text:
    counts[letter] += 1
print(counts)

When we try to get the value of a letter from counts, and that letter doesn’t already exist in counts, defaultdict will call int, with no arguments, and put the return value into counts[letter]. Precisely what we were doing in our previous example. So, what does int return when called with no arguments? You guessed it, zero!

>>> int()
0
>>> float()
0.0
>>> str()
''
>>> bool()
False
>>> list()
[]
>>> dict()
{}
>>> set()
set()

As illustrated above, calling the data type builtins with no arguments returns the falsy value of that data type. We can use this fact and pass these builtins to the defaultdict constructor depending on the need. If we wanted a different initial value, say 42, we could use a lambda function like lambda: 42 instead.
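
Here’s a sketch of two other factories in action: list for grouping, and the lambda: 42 mentioned above. The words list is made-up sample data.

```python
from collections import defaultdict

words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']

# Group words by first letter; a missing key starts out as an empty list.
by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)
print(dict(by_letter))
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}

# A custom initial value via a lambda.
answers = defaultdict(lambda: 42)
print(answers['anything'])  # 42
print(dict(answers))        # {'anything': 42}
```

Note the last line: merely reading a missing key is enough to insert it into the dictionary.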

The `ChainMap`¶

The ChainMap is an abstraction over a chain of dictionaries in order of precedence. Essentially, it holds a list of dictionaries and when a key is indexed, these dictionaries are searched in order for this key and the value of the first match is returned.

This is better illustrated with an example. Let’s create a ChainMap with dummy data:

>>> from collections import ChainMap
>>> data = ChainMap({'a': 1, 'b': 2, 'c': 3}, {'c': 30, 'd': 40, 'e': 50})
>>> data
ChainMap({'a': 1, 'b': 2, 'c': 3}, {'c': 30, 'd': 40, 'e': 50})
>>> data.maps  # A list of maps in the chain.
[{'a': 1, 'b': 2, 'c': 3}, {'c': 30, 'd': 40, 'e': 50}]

Let’s try indexing:

>>> data['a']
1
>>> data['e']
50
>>> data['c']
3

Here, the 'a' is indexed from the first dictionary, 'e' is indexed from the second dictionary and 'c' is indexed from the first dictionary.

As mentioned in the documentation, writes, updates and deletes, however, operate on the first dictionary alone.

>>> data['a'] = 91
>>> data
ChainMap({'a': 91, 'b': 2, 'c': 3}, {'c': 30, 'd': 40, 'e': 50})
>>> data['e'] = 951
>>> data
ChainMap({'a': 91, 'b': 2, 'c': 3, 'e': 951}, {'c': 30, 'd': 40, 'e': 50})
>>> data['c'] = 93
>>> data
ChainMap({'a': 91, 'b': 2, 'c': 93, 'e': 951}, {'c': 30, 'd': 40, 'e': 50})

Of course, if we explicitly want to modify the last dictionary, it can be indexed directly:

>>> data.maps[-1]['c'] = 999
>>> data
ChainMap({'a': 91, 'b': 2, 'c': 93, 'e': 951}, {'c': 999, 'd': 40, 'e': 50})

The ChainMap is useful to hold tiers of configuration parameters for an application, in a form similar to the following:

ChainMap(user_settings, default_settings)

We can have multiple tiers depending on the situation. The user can modify the dictionary as they see fit and all writes and updates will be made only on the first dictionary, user_settings. Whereas, when one tries to get the value of a configuration parameter, it automatically falls back to default_settings if it isn’t present in user_settings.
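
The settings scenario can be sketched as follows. The setting names and values are invented for illustration.

```python
from collections import ChainMap

# Hypothetical configuration tiers.
default_settings = {'theme': 'light', 'font_size': 12, 'autosave': True}
user_settings = {'theme': 'dark'}

settings = ChainMap(user_settings, default_settings)

print(settings['theme'])      # dark: found in user_settings
print(settings['font_size'])  # 12: falls back to default_settings

# Writes land in the first dictionary only.
settings['font_size'] = 14
print(user_settings)                  # {'theme': 'dark', 'font_size': 14}
print(default_settings['font_size'])  # 12

# Deleting the user's override reveals the default again.
del settings['theme']
print(settings['theme'])  # light
```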

The `Counter`¶

Counter dictionaries can be used to keep counts of any (hashable) objects. The keys are these hashable objects and the values are their counts. The official docs on this give some clever examples and uses, so I recommend you go read up on it there, instead of redoing it here.
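
Just to give a taste, here’s the letter counting example from the defaultdict section, sketched with a Counter instead.

```python
from collections import Counter

text = 'lorem ipsum dolor sit amet'
counts = Counter(text)

print(counts['o'])            # 3
print(counts['z'])            # 0: missing keys count as zero, no KeyError
print(counts.most_common(1))  # [(' ', 4)]: the space is the most frequent
```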

Custom Flavor¶

Although rarely needed in practice, we can create our own flavors of dictionary types. One way to achieve this would be to extend the dict type directly, but usually the easier way to deal with this is to use the UserDict class.

Here’s an example dictionary type that works with string keys and is case-insensitive. A good use for something like this is for HTTP headers. (The requests library does something similar.)

from collections import UserDict


class CaselessDict(UserDict):

    def __getitem__(self, name):
        return self.data[name.lower()]

    def __setitem__(self, name, value):
        self.data[name.lower()] = value

As seen above, the UserDict class provides a .data attribute that can be used as the underlying store dictionary.

Let’s try it out.

>>> data = CaselessDict(accept='application/json')
>>> data['accept']
'application/json'
>>> data['Accept']
'application/json'
>>> data['ACCEPT']
'application/json'
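
One thing to watch out for: UserDict’s own membership test and deletion go straight to .data, so with the class above, 'Accept' in data would be False. Here’s a sketch of a slightly fuller version that also overrides __contains__ and __delitem__. These two extra methods are my addition for illustration and not necessarily how the requests library does it.

```python
from collections import UserDict


class CaselessDict(UserDict):
    """A dictionary with case-insensitive string keys (illustrative sketch)."""

    def __getitem__(self, name):
        return self.data[name.lower()]

    def __setitem__(self, name, value):
        self.data[name.lower()] = value

    def __delitem__(self, name):
        del self.data[name.lower()]

    def __contains__(self, name):
        return name.lower() in self.data


headers = CaselessDict({'Content-Type': 'text/html'})
print('content-type' in headers)    # True
print(headers.get('CONTENT-TYPE'))  # text/html
del headers['Content-type']
print(len(headers))                 # 0
```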

Disassembling¶

Now, let’s disassemble a few common operations on dictionaries. I won’t be going into the details of how to interpret the disassembled instructions in this article. We use the dis function (from the aptly named dis module) for this.

Let’s try this on a very simple function.

>>> dis.dis(lambda: {'a': 1})
  1           0 LOAD_CONST               1 ('a')
              2 LOAD_CONST               2 (1)
              4 BUILD_MAP                1
              6 RETURN_VALUE

Here, we see the BUILD_MAP opcode that takes a count which is the length of the dictionary to build. From the official docs,

Pushes a new dictionary object onto the stack. Pops 2 * count items so that the dictionary holds count entries: {..., TOS3: TOS2, TOS1: TOS}.

Now let’s do this with two elements in the dict.

>>> dis.dis(lambda: {'a': 1, 'b': 2})
  1           0 LOAD_CONST               1 (1)
              2 LOAD_CONST               2 (2)
              4 LOAD_CONST               3 (('a', 'b'))
              6 BUILD_CONST_KEY_MAP      2
              8 RETURN_VALUE

Here, we see a different opcode, BUILD_CONST_KEY_MAP which also takes the length of the dict as an argument. This is also explained best from the docs,

The version of BUILD_MAP specialized for constant keys. count values are consumed from the stack. The top element on the stack contains a tuple of keys.

Conclusion¶

Dictionaries in Python (or any other language for that matter) are a very powerful multi-purpose data structure and are extremely handy and easy to use in Python. I hoped to put the things I learned about them in this article. If you see any inaccuracies or if there’s something that makes for a good addition to this article, let me know in the comments below.

Thank you for reading. Please let me know what you think. If you have any topics you’d like me to cover in a future article, put in a comment.

References¶

The official documentation, mostly. Wikipedia for data used in examples.

¹ I read the proof for this a long time ago, but I don’t remember where :).

Migrate from Pelican to Hugo

2017-08-23T00:00:00+05:30

Update: I have now moved to using a self-made Python program that compiles my markdown article documents into the website you see. I’m keeping this article as a journal of my then experience.

I recently got around to resurrecting my blog after around five years of death. As part of that, I chose to migrate my blog to Hugo, from the current Pelican builder. The first post after resurrection will be about the migration.

If you’re wondering why the long break, well, I could blame it on life and work, but it was just me being lazy. Hopefully, that won’t happen again.

Why Hugo¶

When I decided to start writing again, I couldn’t remember how I was building the site. That’s probably entirely my fault for not documenting it for myself, but I ended up being almost new to Pelican. So, instead of directly going to Pelican’s homepage, I checked out StaticGen to see the current landscape of static site generators. The most popular (measured by GitHub stars) is, obviously, Jekyll. Then came Hugo, a name I didn’t recognize. Other than Pelican, all the ones in the top-ten are built on Ruby or JavaScript (node.js). I wasn’t keen on either. Hugo was in a unique position since it is written in a compiled language, so multiplatform binaries are relatively easy to come by.

I read the documentation on a weekend and I was impressed. Hugo it is. The thing that struck me most in Hugo is that it does its primary thing only. Generating HTML files from Markdown files. It doesn’t force a blog-like website or a documentation-like website. That’s up to you. Hugo is like a bridge between your markdown files and the output HTML files. The structure of the output is a mirror image of your source files and the config.toml file (or config.yaml).

Migration¶

A new site¶

Issued the command hugo new site sharats.me.

Configuration¶

Hugo’s default configuration is of the TOML format. I read the README and wasn’t convinced. Thankfully, Hugo supports configuration in YAML.

So, this is what I came up with in my config.yaml file.

baseURL: http://sharats.me/
languageCode: en-us
title: "The Sharat's"

The current config.yaml is much longer and can be viewed on the github repo of this site.

Change metadata format¶

The article metadata in my Pelican site looks like the following:

Title: Serializing python-requests' Session objects for fun and profit
Date: 18.2.2012
Tags: python, python-requests, python-pickle
Reddit: true

There’s a lot of things in this that I wouldn’t do if I wrote that article today, but meh.

Hugo calls these frontmatter and I needed it to look like the following to make it happy.

---
title: Serializing python-requests' Session objects for fun and profit
date: 2012-02-18
tags: ['python', 'python-requests', 'python-pickle']
reddit: true
---

The following awk script did the trick:

BEGIN { FS = ":"; OFS = ":"; print "---" }

!c && /^$/ { print "---\n"; c = 1 }

c { print; next }

!c {
    $1 = tolower($1)

    if ($1 == "date") {
        $2 = gensub(/ ([^.]+)\.([^.]+)\.([^.]+)/, " \\3-\\2-\\1", 1, $2)
        $2 = gensub(/-([0-9])-/, "-0\\1-", 1, $2)
    }

    if ($1 == "tags")
        $2 = " [" gensub(/[-a-z]+/, "'\\0'", "g", substr($2, 2)) "]"

    print
}
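
Since gensub is a gawk extension, the script above won’t run on every awk. For anyone without gawk, here’s a rough Python sketch of the same transformation. This is not the script that was actually used for the migration; the to_front_matter name is made up, and it assumes the metadata block is separated from the article body by a blank line.

```python
def to_front_matter(text):
    """Convert Pelican-style metadata at the top of `text` to YAML front matter."""
    # Everything before the first blank line is treated as metadata.
    header, sep, body = text.partition('\n\n')
    out = ['---']
    for line in header.splitlines():
        key, _, value = line.partition(':')
        key, value = key.lower(), value.strip()
        if key == 'date':
            # 18.2.2012 -> 2012-02-18
            day, month, year = value.split('.')
            value = '{}-{:0>2}-{:0>2}'.format(year, month, day)
        elif key == 'tags':
            # python, python-pickle -> ['python', 'python-pickle']
            tags = [tag.strip() for tag in value.split(',')]
            value = '[' + ', '.join("'{}'".format(tag) for tag in tags) + ']'
        out.append('{}: {}'.format(key, value))
    out.append('---')
    return '\n'.join(out) + sep + body


sample = """Title: Serializing python-requests' Session objects for fun and profit
Date: 18.2.2012
Tags: python, python-requests, python-pickle
Reddit: true

Article body goes here.
"""
print(to_front_matter(sample))
```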

Change code blocks¶

All my code blocks were of the following format:

    :::python
    import this

But, I needed them like this:

```python
import this
```

So, the following little python script did the trick:

#!/usr/bin/env python3

import sys


def process(f):
    cb = False
    empties = 0
    output = []
    for line in f:
        line = line.rstrip('\n')

        if not line:
            empties += 1
            continue

        prefix = ''
        if line.startswith('    '):
            line = line[4:]
            if not cb:
                cb = True
                line = line.replace(':::', '```', 1) if line.startswith(':::') else ('```\n' + line)

        elif cb:
            cb = False
            prefix = '```\n'

        output.append(prefix + '\n' * empties + line)
        empties = 0

    return '\n'.join(output)


for file_name in sys.argv[1:]:
    with open(file_name) as f:
        output = process(f)
    print(output)

Yeah, didn’t have the patience to do it with awk this time.

The Theme¶

I tried the themes over at the Hugo themes page, but just as I thought, none of them were to my liking. I found the nofancy theme to be easy to get started and modify to what I want, so that’s what happened. Hugo’s documentation is very good. I have to say, the documentation is one of the reasons I’m loving Hugo.

Hope to be writing more articles in the coming weeks.

The ever useful and neat subprocess module

2012-04-29T00:00:00+05:30

Python’s subprocess module is one of my favourite modules in the standard library. If you have ever done some decent amount of coding in Python, you might have encountered it. This module is used for dealing with external commands, intended to be a replacement for the old os.system and the like.

The most trivial use might be to get the output of a small shell command like ls or ps. Not that this is the best way to get a list of files in a directory (think os.listdir), but you get the point.

I am going to put my notes and experiences about this module here. Please note, I wrote this with Python 2.7 in mind. Things are slightly different in other versions (even 2.6). If you find any errors or suggestions, please let me know.

Table of Contents

A simple usage
Popen class
Running via the shell
Getting the return code (aka exit status)
IO Streams
- Reading error stream
- Watching both stdout and stderr
Passing an environment
- Merge with current environment
- Unicode
Execute in a different working directory
Killing and dying
- Auto-kill on death
Launch commands in a terminal emulator
- Linux
- Windows
Conclusion

A simple usage¶

For the sake of providing context, let’s run the ls command from subprocess and get its output.

import subprocess
ls_output = subprocess.check_output(['ls'])

I’ll cover getting output from a command in detail later. To give more command line arguments,

subprocess.check_output(['ls', '-l'])

The first item in the list is the executable and rest are its command line arguments (argv equivalent). No quirky shell quoting and complex nested quote rules to digest. Just a plain python list.

However, not having shell quoting implies you don’t also have the shell niceties. Like piping for one. The following won’t work the way one would expect it to.

subprocess.check_output(['ls', '|', 'wc', '-l'])

Here, the ls command gets | as its first argument, and I have no idea what ls would do with it. Perhaps complain that no such file exists. So, instead, the simplest option is to use the shell boolean argument. More on that later in the article.
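
As an aside, a pipeline can also be built without a shell by connecting two Popen objects by hand. Here’s a sketch of that, written for Python 3 (unlike the rest of this article). The pipe function name is made up, and the demonstration uses the Python interpreter itself as both commands so that it runs anywhere.

```python
import subprocess
import sys


def pipe(first_cmd, second_cmd):
    """Run `first_cmd | second_cmd` without a shell; return the output bytes."""
    first = subprocess.Popen(first_cmd, stdout=subprocess.PIPE)
    second = subprocess.Popen(second_cmd, stdin=first.stdout,
                              stdout=subprocess.PIPE)
    # Close our copy so `first` gets SIGPIPE if `second` exits early.
    first.stdout.close()
    output, _ = second.communicate()
    first.wait()
    return output


# Stand-ins for `ls` and `wc -l`: print three lines, then count them.
producer = [sys.executable, '-c', 'print("a\\nb\\nc")']
counter = [sys.executable, '-c',
           'import sys; print(len(sys.stdin.readlines()))']
print(pipe(producer, counter))
```

On a Unix-like system, calling pipe(['ls'], ['wc', '-l']) would then mirror ls | wc -l.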

Popen class¶

If there’s just one thing in the subprocess module that you should be concerned with, it’s the Popen class. The other functions like call, check_output, and check_call use Popen internally. Here’s the signature from the docs.

class subprocess.Popen(args, bufsize=0, executable=None, stdin=None,
    stdout=None, stderr=None, preexec_fn=None, close_fds=False, shell=False,
    cwd=None, env=None, universal_newlines=False, startupinfo=None,
    creationflags=0)

I suggest you read the docs for this class. As with all Python docs, they’re really good.

Running via the shell¶

Subprocess can also run command-line instructions via a shell program. This is usually dash/bash on Linux and cmd on windows.

subprocess.call('ls | wc -l', shell=True)

Notice that in this case we pass a string, not a list. This is because we want the shell to interpret the whole of our command. You can even use shell style quoting if you like. It is up to the shell to decide how to best split the command line into executable and command line arguments.

On Windows, if you pass a list for args, it will be turned into a string using the same rules as the MS C runtime. See the doc-string for subprocess.list2cmdline for more on this. Whereas on unix-like systems, even if you pass a string, it’s turned into a list of one item :).

The behaviour of the shell argument can sometimes be confusing so I’ll try to clear it a bit here. Something I wished I had when I first encountered this module.

Firstly, let’s consider the case where shell is set to False, the default. In this case, if args is a string, it is assumed to be the name of the executable file. Even if it contains spaces. Consider the following.

subprocess.call('ls -l')

This won’t work because subprocess is looking for an executable file called ls -l, but obviously can’t find it. However, if args is a list, then the first item in this list is considered as the executable and the rest of the items in the list are passed as command line arguments to the program.

subprocess.call(['ls', '-l'])

does what you think it will.

Second case, with shell set to True, the program that actually gets executed is the OS default shell, /bin/sh on Linux and cmd.exe on windows. This can be changed with the executable argument.

When using the shell, args is usually a string, something that will be parsed by the shell program. The args string is passed as a command line argument to the shell (with a -c option on Linux) such that the shell will interpret it as a shell command sequence and process it accordingly. This means you can use all the shell builtins and goodies that your shell offers.

subprocess.call('ls -l', shell=True)

is similar to

$ /bin/sh -c 'ls -l'

In the same vein, if you pass a list as args with shell set to True, all items in the list are passed as command line arguments to the shell.

subprocess.call(['ls', '-l'], shell=True)

is similar to

$ /bin/sh -c ls -l

which is the same as

$ /bin/sh -c ls

since /bin/sh takes just the argument next to -c as the command line to execute.

Getting the return code (aka exit status)¶

If you want to run an external command and its return code is all you’re concerned with, the call and check_call functions are what you’re looking for. They both return the return code after running the command. The difference is, check_call raises a CalledProcessError if the return code is non-zero.

If you’ve read the docs for these functions, you’ll see that it’s not recommended to use stdout=PIPE or stderr=PIPE. And if you don’t, the stdout and stderr of the command are just redirected to the parent’s (Python VM in this case) streams.

If that is not what you want, you have to use the Popen class.

from subprocess import Popen

proc = Popen('ls')

The moment the Popen class is instantiated, the command starts running. You can wait for it and after it’s done, access the return code via the returncode attribute.

proc.wait()
print proc.returncode

If you are trying this out in a python REPL, you won’t see a need to call .wait() since you can just wait yourself in the REPL till the command is finished and then access the returncode. Surprise!

>>> proc = Popen('ls')
>>> file1 file2

>>> print proc.returncode
None
>>> # wat?

The command is definitely finished. Why don’t we have a return code?

>>> proc.wait()
0
>>> print proc.returncode
0

The reason for this is the returncode is not automatically set when a process ends. You have to call .wait or .poll to realize if the program is done and set the returncode attribute.
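
Here’s a sketch of using .poll to check on a process without blocking. It’s written for Python 3 (unlike the rest of this article), and uses a Python one-liner that sleeps as a stand-in for a slow external command.

```python
import subprocess
import sys
import time

# A stand-in for a slow external command.
proc = subprocess.Popen([sys.executable, '-c', 'import time; time.sleep(0.5)'])

# Nobody has asked about the process yet, so this is still None.
print(proc.returncode)  # None

# .poll() returns None while the process is running, and sets
# (and returns) the return code once it has finished.
while proc.poll() is None:
    time.sleep(0.1)  # we could do other useful work here instead

print(proc.returncode)  # 0
```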

IO Streams¶

The simplest way to get the output of a command, as seen previously, is to use the check_output function.

output = subprocess.check_output('ls')

Notice the check_ prefix in the function name? Ring any bell? That’s right, this function will raise a CalledProcessError if the return code is non-zero.

This may not always be the best solution to get the output from a command. If you do get a CalledProcessError from this function call, unless you have the contents of stderr you probably have little idea what went wrong. You’ll want to know what’s written to the command’s stderr.

Reading error stream¶

There are two ways to get the error output. First is redirecting stderr to stdout and only being concerned with stdout. This can be done by setting the stderr argument to subprocess.STDOUT.

Second is to create a Popen object with stderr set to subprocess.PIPE (optionally along with stdout argument) and read from its stderr attribute which is a readable file-like object. There is also a convenience method on Popen class, called .communicate, which optionally takes a string to be sent to the process’s stdin and returns a tuple of (stdout_content, stderr_content).
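
Both approaches can be sketched as below. This is written for Python 3 (unlike the rest of this article), and the command is a made-up stand-in: a Python one-liner that writes to both streams and exits with status 3.

```python
import subprocess
import sys

# A stand-in command that writes to both streams and then fails.
cmd = [sys.executable, '-c',
       'import sys; print("to stdout"); '
       'sys.stderr.write("to stderr\\n"); sys.exit(3)']

# 1. Merge stderr into stdout. On failure, the combined output is
#    available on the exception object.
try:
    subprocess.check_output(cmd, stderr=subprocess.STDOUT)
except subprocess.CalledProcessError as error:
    merged = error.output
    code = error.returncode
    print(code, merged)

# 2. Capture the two streams separately with .communicate().
proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = proc.communicate()
print(proc.returncode, out, err)
```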

Watching both `stdout` and `stderr`¶

However, all of these assume that the command runs for some time, prints out a couple of lines of output and exits, so you can get the output(s) in strings. This is sometimes not the case. If you want to run a network intensive command like an svn checkout, which prints each file as and when downloaded, you need something better.

The initial solution one can think of is this.

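# Pass the command as a list; a single string with spaces only works with shell=True on POSIX.
proc = Popen(['svn', 'co', 'svn+ssh://myrepo'], stdout=PIPE)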
for line in proc.stdout:
    print line

This works, for the most part. But, again, if there is an error, you’ll want to read stderr too. It would be nice to read stdout and stderr simultaneously, just like a shell seems to do. Alas, this remains a not-so-straightforward problem as of today, at least on non-Linux systems.

On Linux (and wherever it’s supported), you can use the select module to keep an eye on multiple file-like stream objects. But on Windows, select only works with sockets, not with pipes. A more platform-independent solution that I found works well is to use threads and a Queue.


from subprocess import Popen, PIPE
from threading import Thread
from Queue import Queue, Empty

io_q = Queue()

def stream_watcher(identifier, stream):

    for line in stream:
        io_q.put((identifier, line))

    if not stream.closed:
        stream.close()

proc = Popen(['svn', 'co', 'svn+ssh://myrepo'], stdout=PIPE, stderr=PIPE)

Thread(target=stream_watcher, name='stdout-watcher',
        args=('STDOUT', proc.stdout)).start()
Thread(target=stream_watcher, name='stderr-watcher',
        args=('STDERR', proc.stderr)).start()

def printer():
    while True:
        try:
            # Block for 1 second.
            item = io_q.get(True, 1)
        except Empty:
            # No output in either streams for a second. Are we done?
            if proc.poll() is not None:
                break
        else:
            identifier, line = item
            print identifier + ':', line

Thread(target=printer, name='printer').start()

Fair bit of code. This is a typical producer-consumer thing. Two threads producing lines of output (one each from stdout and stderr) and pushing them into a queue. One thread watching the queue and printing the lines until the process itself finishes.
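For comparison, here is a rough sketch of the select-style approach mentioned above, in Python 3 syntax using the standard selectors module. It is meant for POSIX systems only, and the function name watch_streams and the demo child are made up for illustration.

```python
import os
import selectors
import subprocess
import sys


def watch_streams(cmd):
    """Run cmd and return (returncode, [(label, bytes), ...]) in arrival order."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    sel = selectors.DefaultSelector()
    sel.register(proc.stdout, selectors.EVENT_READ, "STDOUT")
    sel.register(proc.stderr, selectors.EVENT_READ, "STDERR")

    chunks = []
    while sel.get_map():                      # loop until both pipes hit EOF
        for key, _events in sel.select():
            data = os.read(key.fileobj.fileno(), 4096)
            if data:
                chunks.append((key.data, data))
            else:                             # empty read means EOF on that pipe
                sel.unregister(key.fileobj)
                key.fileobj.close()
    sel.close()
    return proc.wait(), chunks


# Demo child: one line on each stream.
returncode, chunks = watch_streams(
    [sys.executable, "-c", "import sys; print('out'); print('err', file=sys.stderr)"])
```

No threads and no queue, at the cost of not working with pipes on Windows.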

Passing an environment¶

The env argument to Popen (and others) lets you customize the environment of the command being run. If it is not set, or is set to None, the current process’s environment is used, just as documented.

You might not agree with me, but I feel there are some subtleties with this argument that should have been mentioned in the documentation.

Merge with current environment¶

One is that if you provide a mapping to env, whatever is in this mapping is all that’s available to the command being run. For example, if you don’t give a TOP_ARG in the env mapping, the command won’t see a TOP_ARG in its environment. So, I frequently find myself doing this

p = Popen('command', env=dict(os.environ, my_env_prop='value'))

This makes sense once you realize it, but I wish it were at least hinted at in the documentation.
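A quick way to see the difference, sketched in Python 3 syntax. TOP_ARG and MY_ENV_PROP are just example names, and the helper child_sees_top_arg is made up for this demo.

```python
import os
import subprocess
import sys

# Child that reports whether it can see TOP_ARG.
PROBE = [sys.executable, "-c", "import os; print(os.environ.get('TOP_ARG'))"]


def child_sees_top_arg(env):
    """Run the probe with the given env and return what it printed."""
    return subprocess.check_output(PROBE, env=env).decode().strip()


os.environ["TOP_ARG"] = "from-parent"   # pretend the parent already had this set

# Only what is in the mapping reaches the child. The two carried-over names are
# beside the point; some platforms need them for the interpreter to start at all.
bare = {k: os.environ[k] for k in ("SYSTEMROOT", "LD_LIBRARY_PATH") if k in os.environ}
bare["MY_ENV_PROP"] = "value"

replaced = child_sees_top_arg(bare)
merged = child_sees_top_arg(dict(os.environ, MY_ENV_PROP="value"))
```

With the bare mapping the child prints None for TOP_ARG; with the merged one it sees the parent’s value.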

Unicode¶

Another one has to do with Unicode (surprise, surprise!) and Windows. If you use unicode objects in the env mapping, you get an error saying you can only use strings in the environment mapping. The worst part about this error is that it only seems to happen on Windows and not on Linux. If it’s an error to use unicode objects here, I wish it broke on both platforms.

This issue is very painful if you’re like me and use unicode all the time.

from __future__ import unicode_literals

That line is present in all my Python source files. The error message doesn’t even bother to mention that you have unicode objects in your env, so it’s very hard to understand what’s going wrong.

Execute in a different working directory¶

This is handled by the cwd argument. You set the location of the directory which you want as the working directory of the program you are launching.

The docs do mention that the working directory is changed before the command starts running, but they also say that you can’t specify the program’s path relative to cwd. In reality, I found that you can.

Either I’m missing something with this or the docs really are inaccurate. Anyway, this works

subprocess.call('./ls', cwd='/bin')

Prints out all the files in /bin. Of course, the following doesn’t work when the working directory is not /bin.

subprocess.call('./ls')

So, if you are giving something explicitly to cwd and are using a relative path for the executable, this is something to keep in mind.
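The basic effect of cwd is easy to check with a child that prints its own working directory. This is a Python 3 sketch using a throwaway temporary directory; it does not settle how a relative executable path gets resolved, which may differ between platforms and Python versions.

```python
import os
import subprocess
import sys
import tempfile

workdir = tempfile.mkdtemp()   # throwaway directory for the demo

# The child reports the directory it finds itself in.
reported = subprocess.check_output(
    [sys.executable, "-c", "import os; print(os.getcwd())"],
    cwd=workdir,
).decode().strip()

# realpath() irons out symlinked temp directories before comparing.
same_dir = os.path.realpath(reported) == os.path.realpath(workdir)
os.rmdir(workdir)
```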

Killing and dying¶

A simple

proc.terminate()

Or for some dramatic umphh!

proc.kill()

Will do the trick to end the process. As noted in the documentation, the former sends a SIGTERM and the latter sends a SIGKILL on Unix, but both do some native Windows-y thing on Windows.
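The two are often combined: ask politely first, then insist. A sketch in Python 3 syntax (wait(timeout=...) needs Python 3.3 or later); the stop helper and the two-second grace period are my own choices, not anything the module prescribes.

```python
import subprocess
import sys


def stop(proc, grace=2.0):
    """Ask proc to exit; force it if it has not exited after `grace` seconds."""
    proc.terminate()
    try:
        return proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        proc.kill()
        return proc.wait()


# Demo: a child that would otherwise sleep for a minute.
sleeper = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
status = stop(sleeper)
```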

Auto-kill on death¶

The processes you start in your Python program stay running even after your program exits. This is usually what you want, but when you want all your subprocesses killed automatically on exit with Ctrl+C or the like, you have to use the atexit module.

import atexit

procs = []

@atexit.register
def kill_subprocesses():
    for proc in procs:
        proc.kill()

And add all the Popen objects you create to the procs list. Of the solutions I tried, this is the one that works best.
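A slightly fuller sketch of the same idea in Python 3 syntax, with a small spawn helper (my own name) so that nothing gets forgotten when adding to the list. Note that atexit handlers run on normal interpreter exit, which includes an unhandled Ctrl+C, but not when the Python process itself is killed outright.

```python
import atexit
import subprocess
import sys

procs = []


def spawn(cmd, **kwargs):
    """Start cmd with Popen and remember it for cleanup at exit."""
    proc = subprocess.Popen(cmd, **kwargs)
    procs.append(proc)
    return proc


@atexit.register
def kill_subprocesses():
    for proc in procs:
        if proc.poll() is None:    # still running
            proc.kill()
            proc.wait()            # collect the exit status


# Demo: this child would outlive the script without the handler above.
worker = spawn([sys.executable, "-c", "import time; time.sleep(60)"])
```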

Launch commands in a terminal emulator¶

On one occasion, I had to write a script that would launch multiple svn checkouts and then run many ant builds (~20-35) on the checked out projects. In my opinion, the best and easiest way to do this is to fire up multiple terminal emulator windows each running an individual checkout/ant-build. This allows us to monitor each process and even cancel any of them by simply closing the corresponding terminal emulator window.

Linux¶

This is pretty trivial actually. On Linux, you can use xterm for this.

Popen(['xterm', '-e', 'sleep 3s'])

Windows¶

On Windows, it’s not as straightforward. The first solution for this would be

Popen(['cmd', '/K', 'command'])

/K option tells cmd to run the command and keep the command window from closing. You may use /C instead to close the command window after the command finishes.

As simple as it looks, it has some weird behavior. I don’t completely understand it, but I’ll try to explain what I have. When you try to run a python script with the above Popen call, in a command window like this

python main.py

you don’t see a new command window pop up. Instead, the sub command runs in the same command window. I have no idea what happens when you run multiple sub commands this way. (I have only limited access to Windows.)

If instead you run it in something like an IDE or IDLE (F5), a new command window opens up, one for each command you run this way, I believe. Just the way you’d expect.

But I gave up on cmd.exe for this purpose and learnt to use the mintty utility that comes with Cygwin (I think 1.7+). mintty is awesome. Really. It’s been a while since I felt that way about a command line utility on Windows.

Popen(['mintty', '--hold', 'error', '--exec', 'command'])

This. A new mintty console window opens up running the command and it closes automatically, if the command exits with zero status (that’s what --hold error does). Otherwise, it stays on. Very useful.

Conclusion¶

The subprocess module is a very useful thing. Spend some time understanding it better. This is my attempt at helping people with it, and turned out to be way longer than I’d expected. If there are any inaccuracies in this, or if you have anything to add, please leave a comment.

Serializing python-requests' Session objects for fun and profit

2012-02-18T00:00:00+05:30

Prepare¶

If you haven’t checked out @kennethreitz’s excellent python-requests library yet, I suggest you go do that immediately. Go on, I’ll wait for you.

Had your candy? That is one of the most beautiful pieces of Python code I’ve read. And it’s an excellent library with a very humane API.

Recently, I have been using this library for a few of my company’s internal projects and at a point I needed to serialize and save Session objects for later. That wasn’t as straightforward as I first thought it’d be, so I am sharing my experience here.

First off, let’s make a simple http server which we are going to contact with python-requests. The server should be able to handle cookie based sessions and also have basic auth, as these things are handled by python-requests’ Session objects on the client side. I won’t discuss the code for the server here, you can get it from the gist.

Once you have the server running, it’s time for the client. Let’s do requests!

import requests as req

URL_ROOT = 'http://localhost:5050'

def get_logged_in_session(name):
    session = req.session(auth=('user', 'pass'))

    login_response = session.post(URL_ROOT + '/login', data={'name': name})
    login_response.raise_for_status()

    return session

def get_whoami(session):
    response = session.get(URL_ROOT + '/whoami')
    response.raise_for_status()
    return response.text

I defined two functions here. The get_logged_in_session function will create a new session, log in to the http server and return that session. Any subsequent requests using this session will be made as if you have logged in. That’s what will be tested with the get_whoami function, which will just return the response from /whoami.

Let’s test this out. Make sure server.py is running and, in another terminal,

$ python -i client.py
>>> s = get_logged_in_session('sharat')
>>> get_whoami(s)
u'You are sharat'
>>> get_whoami(req.session(auth=('user', 'pass')))
u'You are a guest'

Works perfectly. If we pass it the logged in session, it gives us the username and if we pass it a new session, it gives us a guest.

Now, let’s assume we have two functions, serialize_session and deserialize_session, which do exactly what their names say. We can test them out by running a small test.py, as

from client import get_logged_in_session, get_whoami
from serializer import deserialize_session, serialize_session

session = get_logged_in_session('sharat')
dsession = deserialize_session(serialize_session(session))

assert get_whoami(session) == get_whoami(dsession)
print 'Success'

and a dummy serializer.py

def serialize_session(session):
    return session

def deserialize_session(session):
    return session

And with that, of course, the test will not fail

$ python test.py
Success

Serializing¶

Now, to implement the functions in serializer.py. A simple one would be to use pickle. Let’s try

import pickle as pk

def serialize_session(session):
    return pk.dumps(session)

def deserialize_session(data):
    return pk.loads(data)

If you run test.py now, python is going to yell at you.

$ python test.py
Traceback (most recent call last):
    File "test.py", line 10, in <module>
    dsession = deserialize_session(serialize_session(session))
[ ... ]
    raise TypeError, "can't pickle %s objects" % base.__name__
TypeError: can't pickle lock objects

Oh well, it was worth a try I suppose.

Update: The Session class can be made to implement the pickle protocol if you want to use pickle.

Next plan I had was to pick up attributes and data from a Session object, just enough to recreate this object using the Session constructor, and serialize those attributes as a JSON. After all, the Session’s API is very easy to use, how hard can picking attributes from it be? :)

So, I dug in the sessions.py module of python-requests library. And here’s what the signature of the constructor for Session objects looks like

def __init__(self,
    headers=None,
    cookies=None,
    auth=None,
    timeout=None,
    proxies=None,
    hooks=None,
    params=None,
    config=None,
    verify=True):
    # ...

So, if I pick up just these values, I should be able to recreate the session object. Sweet.

import json
import requests as req

def serialize_session(session):
    attrs = ['headers', 'cookies', 'auth', 'timeout', 'proxies', 'hooks',
        'params', 'config', 'verify']

    session_data = {}

    for attr in attrs:
        session_data[attr] = getattr(session, attr)

    return json.dumps(session_data)

def deserialize_session(data):
    return req.session(**json.loads(data))

And let’s try this out

$ python test.py
Traceback (most recent call last):
    File "test.py", line 12, in <module>
    assert get_whoami(session) == get_whoami(dsession)
[ ... ]
[...]requests/models.py", line 447, in send
    r = self.auth(self)
TypeError: 'list' object is not callable

Okay, that error message is very weird. Why would anyone call a list object?

Go dig in the models.py module. See this

[ ... ]
if isinstance(self.auth, tuple) and len(self.auth) == 2:
    # special-case basic HTTP auth
    self.auth = HTTPBasicAuth(*self.auth)

# Allow auth to make its changes.
r = self.auth(self)
[ ... ]

There. It’s not a list that’s being called. Not directly at least. The problem here is that the auth we are passing to session() is not a tuple. Duh! While I like it that auth is restricted to be a tuple, I wish there was a better error message for when auth is a list instead of a tuple. I personally wouldn’t want it to accept a list for auth though.

So, what went wrong? json does not differentiate between a tuple and a list. It only does lists. So, when serializing and deserializing, the auth tuple is turned into a list. Let’s turn it back

def deserialize_session(data):
    session_data = json.loads(data)

    if 'auth' in session_data:
        session_data['auth'] = tuple(session_data['auth'])

    return req.session(**session_data)
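The tuple-to-list conversion that makes this fix necessary is easy to see in isolation. A tiny Python 3 sketch with a made-up session_data dict, no requests involved:

```python
import json

session_data = {"auth": ("user", "pass")}

# JSON has arrays but no tuples, so the tuple comes back as a list.
round_tripped = json.loads(json.dumps(session_data))

type_before = type(session_data["auth"]).__name__
type_after = type(round_tripped["auth"]).__name__

# Converting back restores the original value.
restored = tuple(round_tripped["auth"])
```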

And running the test again

$ python test.py
Traceback (most recent call last):
  File "test.py", line 12, in <module>
    assert get_whoami(session) == get_whoami(dsession)
  [ ... ]
  File "/usr/lib/python2.7/string.py", line 493, in translate
    return s.translate(table, deletions)
TypeError: translate() takes exactly one argument (2 given)

Wait. What? Now we have an error from stdlib? This just keeps getting better and better. If this looks like something that can frustrate you, go get some coffee :)

If you look at the complete stack trace, the second file from bottom,

  File "[...]site-packages/requests/packages/oreos/monkeys.py", line 470, in set
    if "" != translate(key, idmap, LegalChars):

This thing seems to be calling the translate method incorrectly. With a bit of debugging and yelling at my monitor, I found out the problem and for a moment, lost my grip on reality.

str.translate takes 2 arguments, but unicode.translate takes only 1. I have no idea why this is done this way but I sure as hell didn’t enjoy it. The code in oreos/monkeys.py assumes that the key is a str. However, what json.loads gives you, is unicode stuff. So, we need to convert just the parts in the deserialized dict we get from json.loads which are being used by the oreos/monkeys.py, from unicode to str.

Reading a bit more code around the oreos library, it didn’t take long to figure out that those were the keys in the cookies dict. Lo

def deserialize_session(data):
    session_data = json.loads(data)

    if 'auth' in session_data:
        session_data['auth'] = tuple(session_data['auth'])

    if 'cookies' in session_data:
        session_data['cookies'] = dict((key.encode(), val) for key, val in
                session_data['cookies'].items())

    return req.session(**session_data)

And so

$ python test.py
Success

All the code is on a gist.

Update: Pickling can also work¶

As Daslch pointed out in his comment on reddit, by implementing the pickle protocol on the Session class, we can get pickling to work. From the documentation, we need two methods, __getstate__ and __setstate__.

Adding those methods as follows to sessions.Session class

def __getstate__(self):
    attrs = ['headers', 'cookies', 'auth', 'timeout', 'proxies', 'hooks',
        'params', 'config', 'verify']
    return dict((attr, getattr(self, attr)) for attr in attrs)

def __setstate__(self, state):
    for name, value in state.items():
        setattr(self, name, value)

    self.poolmanager = PoolManager(
        num_pools=self.config.get('pool_connections'),
        maxsize=self.config.get('pool_maxsize')
    )

With these methods in place and the version of serializer.py that uses pickle, we do get a Success.

The creation of new poolmanager in __setstate__ is a piece of code copied from __init__ of the same class. This should probably be turned to a method to avoid code repetition.
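The same pattern on a toy class, in Python 3 syntax, for anyone who hasn’t used the pickle protocol before. LockedClient is a made-up class and has nothing to do with the real Session; it only shares the problem of holding a lock that pickle refuses to touch.

```python
import pickle
import threading


class LockedClient:
    """Toy stand-in for an object that holds an unpicklable lock."""

    def __init__(self, headers=None):
        self.headers = dict(headers or {})
        self._lock = threading.Lock()

    def __getstate__(self):
        # Save everything except the lock.
        state = self.__dict__.copy()
        del state["_lock"]
        return state

    def __setstate__(self, state):
        # Restore the saved attributes and create a fresh lock.
        self.__dict__.update(state)
        self._lock = threading.Lock()


original = LockedClient({"X-Token": "abc"})
clone = pickle.loads(pickle.dumps(original))
```

Without the two methods, pickle.dumps(original) fails on the lock, much like the traceback earlier in this post.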

Update 2: Created an issue about this.

Update 3: This has been merged and Session objects are pickleable as of version 0.10.3. See requests history.

Dependency graph of all installed gems

2011-09-30T00:00:00+05:30

Every other application written in Ruby these days seems to come with this installation instruction:

gem install my-super-awesome-app

and then goes on to describe how awesome the app is. But installing the app this way also installs all its bazillion dependencies, which, unfortunately, are not uninstalled when you uninstall the app with

gem uninstall the-same-damn-app

And so, you have a huge mess of gems installed, and no idea why they are there in the first place. Finding the stale gems that are left behind because of this can be a pain.

So, I decided a neat flowchart visualising the dependency relationships between all the installed gems would give me a picture. And yes, it did.

Here’s how I got the flowchart: (save this in say, gem-graph.sh)

#!/bin/bash

gem list \
    | cut -d\  -f1 \
    | xargs gem dep \
    | awk '\
        BEGIN { print "digraph gems {" } \
        /^Gem / { cur=$2; sub(/-[0-9\.]+$/, "", cur); print "  \"" cur "\";" } \
        ! /^Gem / && $0 != "" { print "  \"" cur "\" -> \"" $1 "\";" } \
        END { print "}" }' \
    | dot -Tpng -o gems.png

Assuming you have GraphViz installed, you can just do

chmod +x gem-graph.sh
./gem-graph.sh

and the graph will be saved in gems.png. Happy gem cleaning :).
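If the awk part of that pipeline is hard to read, here is roughly the same transformation as a Python 3 sketch. The function name gems_to_dot is made up, and SAMPLE is my assumption of what `gem dep` output looks like (a `Gem name-version` line followed by indented dependency lines), so check it against your own output.

```python
import re


def gems_to_dot(dep_output):
    """Turn `gem dep`-style text into a GraphViz digraph, like the awk script."""
    lines = ["digraph gems {"]
    current = None
    for raw in dep_output.splitlines():
        if raw.startswith("Gem "):
            # Second field is name-version; strip the trailing version.
            current = re.sub(r"-[0-9.]+$", "", raw.split()[1])
            lines.append('  "%s";' % current)
        elif raw.strip() and current is not None:
            # First field of a dependency line is the gem it depends on.
            lines.append('  "%s" -> "%s";' % (current, raw.split()[0]))
    lines.append("}")
    return "\n".join(lines)


# Assumed sample of `gem dep` output, not captured from a real machine.
SAMPLE = """\
Gem rails-3.1.0
  rake (>= 0.8.7)
  bundler (~> 1.0)

Gem rake-0.9.2
"""

dot = gems_to_dot(SAMPLE)
```

The resulting text can be fed to dot -Tpng just like the output of the awk step.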

Implementing an expressive search system with clojure

2011-09-28T00:00:00+05:30

Backstory¶

I have recently learned Clojure and it’s the first time I’ve been exposed to lisp and the code-as-data way of life. I was eager to use Clojure to make an app, any app, a simple silly personal tool to help me out with a tedious task.

One such tool I created was classypants. It’s a small Swing-based GUI tool that helps one make sense of the values of PATH-like variables. The values of these variables are lists of file/directory paths joined with : on *nix systems and ; on Windows. Have you ever seen CLASSPATH entries that have ~100 jars/directories in them? Even if these values have just 20 items, it’s very hard to make any sense of them.

Classypants is basically a pretty bare window carrying only 4 top level controls, one of which is an input box for searching through the entries. That search is what I want to talk about in this post.

Superpowers¶

Initially, the search box was just a filter box. I type some text and the entries that contain that text are shown, the rest hidden. This quickly became annoying as I wanted to search for entries with jaxb and jar, which was not possible with the then implementation.

The implementation of the search I have today can do much more than even that. It’s a powerful query language at work, with which we can filter entries that point to non-existing files, entries that point to directories that contain a given file, and other weirdos.

How is it done?¶

I want to share how I went about evolving the search functionality. Let’s talk about one function here,

(defn matches?
  [search-str entry]
  (-> entry
    (.indexOf search-str)
    (not= -1)))

This is the first incarnation of the search implementation. It just checks if the given search-str is present inside the entry.

That is nice and useful. But we want more power. We want a nice minimal query language to describe what we want to find, and it should be easy to remember. Let’s work on negation of search results first, thinking up the simplest of syntaxes,

not resource

should match entries that do not contain resource. This doesn’t look good, as it might also mean to search for entries that contain not or resource. We need some sugar to identify the not part as a directive that modifies how the search is done. Let’s try again,

:not resource

Ah, the : in front of not gives it the special behaviour we need. Don’t worry too much about why the syntax isn’t not: resource or something else, it will become clear in a moment, if it hasn’t already. Now that we have a search syntax, it’s time to get it to work. Imagine a function, digest, which takes a search string and returns a function, which takes an entry and tells if it’s a match or not. I suck at writing, read that again.

Essentially, (digest ":not resource") should return a function, which more or less works like

(fn [entry]
  (not (matches? "resource" entry)))

We see if there is a match, and negate the result. Let’s try writing the digest function,

(defn digest
  [search-str]
  (read-string (str "(" search-str ")")))

What we do above is wrap the search-str in parentheses and read it into a Clojure list. Let’s try out our function in the REPL.

user=> (digest ":not resource")
(:not resource)

Yep, just what we expected. Now, let’s take this further ahead

(defn digest
  [search-str]
  (let [spec (read-string (str "(" search-str ")"))]
    (cond
      ;; read-string yields symbols, so convert the term back to a string
      (= (first spec) :not) (fn [e]
                              (not (matches? (str (nth spec 1)) e))))))
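For readers who don’t speak Clojure, here is the same idea as a small Python 3 sketch: digest takes a query and hands back a predicate. It splits the query on whitespace instead of using a reader, and treating anything that isn’t a :not query as a plain substring search is my own assumption, not something the Clojure version above does.

```python
def matches(search_str, entry):
    """True when search_str occurs anywhere in entry."""
    return search_str in entry


def digest(query):
    """Turn a query string into a function that tests a single entry."""
    tokens = query.split()
    if len(tokens) == 2 and tokens[0] == ":not":
        term = tokens[1]
        return lambda entry: not matches(term, entry)
    # Assumption: anything else is a plain substring search.
    term = query.strip()
    return lambda entry: matches(term, entry)


without_resource = digest(":not resource")
with_jaxb = digest("jaxb")
```

Calling without_resource("/lib/jaxb.jar") gives True, while without_resource("/app/resources") gives False.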

Vim undo breaks with auto-close plugins

2011-09-28T00:00:00+05:30

Prelude¶

If you’ve ever used IDEs or other heavy editors, you’d know how nice it is to have parentheses and brackets auto-closed. If you don’t know what I’m talking about, it’s a feature usually present in IDEs like Eclipse and easily recreated in vim with mappings like

inoremap ( ()<Left>

Of course, that’s just a simple taste. There are vastly complicated plugins that achieve this.

Now, what’s really super annoying about these plugins is that they tend to break vim’s amazingly powerful undo functionality. In other words, if you are using an auto-close plugin, chances are, you can’t rely on vim’s undo anymore.

Debugging this and finding the cause has been on my todo list for quite some time and a few days ago, I finally sat down to explore. I am writing my experience here. First, a simple test case to see if the auto-close plugin you use breaks undo, open vim (a blank file) and hit the following keys:

iabc{<CR><ESC>u

Where instead of <CR> you’d hit the return key and instead of <ESC> you’d hit the Escape key. Decent knowledge of vim should tell you that after the above keys, you should end up with a blank file again. Right?

If instead, you see a closing brace dangling in the second line, your undo is broken. MUHAHAHAHAHA! You can’t rely on undo anymore until you get rid of that one plugin!

What’s going on?¶

So, experimenting with many auto-close plugins and reading the source of at least 3 of those, I say there are basically two different implementations of this functionality, which all these plugins use. The first one is pretty much what was shown at the start of this article,

inoremap ( ()<Left>
" or
inoremap ( <C-r>="()\<Left>"<CR>

I’m going to call this class of plugins, the critters. These do not break your undo. The next class of implementations, that do break your undo, the beasts, do a bit of dark sorcery with stuff like

inoremap ( <C-r>=MyAwesomePairInserter()<CR>

There is no dark sorcery here that is immediately apparent. The real sorcery is inside that function, where a call to setline() function is made to replace your current line to contain the parentheses text at the cursor. Doesn’t make sense? Don’t worry, you’ll get it soon enough.

Which plugins? Name them!¶

Here are a few ones that break undo:

Beasts¶

and these don’t break undo

Critters¶

An initial look at them and you can tell, the ones that break undo are actually more popular and have a relatively larger code base. So why doesn’t anyone complain about breaking undo? I think they do and I believe the root cause is a bug with vim itself.

The main difference in usability among these classes is again to do with undo. In the beasts, typing a brace does not start a new undo action, but it does in the critters (like hitting a <C-g>u). This might actually be playing a role in why undo breaks in beasts only, but the exact reason escapes me.

A reproducible test case¶

I wanted to reproduce this problem with a vanilla vim with no custom configuration (except for nocompatible). So, I checked out the latest version (vim73-353) from the mercurial repository, compiled (with python, ruby and usual shit) and opened it, with no plugins and a simple vimrc as the following:


set nocompatible

inoremap <buffer> <silent> ( <C-R>=<SID>InsertPair("(", ")")<CR>
inoremap <buffer> <silent> [ <C-R>=<SID>InsertPair("[", "]")<CR>
inoremap <buffer> <silent> { <C-R>=<SID>InsertPair("{", "}")<CR>

function! s:InsertPair(opener, closer)
    let l:save_ve = &ve
    set ve=all

    call s:InsertStringAtCursor(a:closer)

    exec "set ve=" . l:save_ve
    return a:opener
endfunction

function! s:InsertStringAtCursor(str)
    let l:line = getline('.')
    let l:column = col('.')-2

    if l:column < 0
        call setline('.', a:str . l:line)
    else
        call setline('.', l:line[:l:column] . a:str . l:line[l:column+1:])
    endif
endfunction

Which is a stripped down version of the auto-close functionality implemented in townk’s auto-close plugin. And opened vim

vim -u undo-breaker-vimrc

and did the test here. Boom, a dangling brace character.

For all I know, it’s the call to setline() that’s making all the difference. But I could be entirely wrong about that. I say this because that is the major difference between the two classes of implementations.

Next?¶

I use persistent-undo in vim73 and heavily depend on it. Combined with the gundo plugin by Steve Losh, I get a kind of nicely visualized version history that is centric to every file, which is quite handy in its own right.

So, if there are others who have faced this, have a fix for it, perhaps a patch to vim, or if there is already a bug in vim’s bug database on this, let me know.

Thank you for reading.

Installing Crunchbang Linux on my old lappy

2011-02-25T00:00:00+05:30

I managed to install Crunchbang Linux, the recently released Statler, after reading quite a positive review (I don’t remember where). I am really liking it, especially the Openbox desktop environment. Also, coming from a lot of experience on Ubuntu, finding Crunchbang so bare-bones and simple, yet so customizable, is very refreshing. I will put down my experience with installing it and my initial thoughts, before I forget them :).

Now my laptop’s got a defective and unreliable disk drive, so I chose to install Crunchbang from USB with the help of unetbootin. After downloading the #! (Crunchbang) ISO file, I fired up unetbootin on my Windows Vista (on the same laptop) and set up my 1GB pen drive to be bootable. After that, I had to create a couple of symlinks (using Cygwin) on the USB drive as follows

ln -s live/vmlinuz1 vmlinuz
ln -s live/initrd1.img initrd.img

After that, the boot was pretty smooth, and I had to choose the graphical installer as the text-based installer wouldn’t load. I have no idea why.

Another interesting thing that happened was that at the end of the installation, #! asked me if I wanted to install the grub boot loader, and said that it had detected Windows as another OS on the machine. However, the grub it installed did not list Windows in the boot menu. I asked a question about this on unix.stackexchange.com and got to know that a simple sudo update-grub adds the Windows item to the boot menu. Not a major setback, but still.

After that, using the OS is nothing but pure pleasure. It feels amazingly snappy and super productive. The conky-based hotkey reference on the desktop is a killer thing to look out for. Oh, and Dropbox installation is easier than on my Ubuntu box, if you use Dropbox that is. Chrome, my browser of choice, is the default browser, what more can I ask? Awesome distribution. I am looking forward to exploring even more with my shiny new #!, and I seriously recommend you give it a try :)

A tasty vim configuration setup with Vimpire and Pathogen

2010-12-14T00:00:00+05:30

Managing vim plugins has always been a hassle. Until pathogen came along. If you are using vim with quite a few vim plugins, then you should be using pathogen, if you are not, you are seriously depriving yourself of sanity. No, seriously. You should.

So, I assume you are also versioning your .vim directory, like on GitHub or BitBucket with git or mercurial respectively. If you are not, then you should. You really really should.

If your answer was no to both of the above, you better get the hell out of here before I get my lawn mowers.

Okay, if you tried to version your .vim directory, but the plugin directories inside pathogen’s bundle directory are repositories themselves, you won’t be very happy. You either have to version all the .git and .hg and what not version directories from the plugins, or you just have to ignore them all and forgo versioning for individual plugins. But if you chose the latter, in which case versioning your .vim will be easy, updating your plugins is a serious pain.

So, recently, http://vim-scripts.org came up and so did scripts like vundle and vim-update-bundles, as listed on the tools page on http://vim-scripts.org. These let you list the plugins you use in your vimrc file and they take care of keeping them up to date. The advantage is that you can version your .vim directory, and wherever you clone it, you can just run the script used and all your plugins are set up, the latest versions of them, just like that. Awesome!

Vimpire isn’t much different from those tools. In fact, it is very similar to vim-update-bundles in functionality, but there are 2 main differences. First off, it is written in python. I won’t spell out the implications of that. But, it is ruby-less. Second, it supports hg. Yay! So, you can get plugins not just from git, but also from hg.

How to set it up and how to use it can be seen on the BitBucket page, via the README file.

Hosted at http://bitbucket.org/sharat87/vimpire/src

Please note that this is still beta. It is tested on Windows 7. I am waiting to get back to Ubuntu, but until then, I have no idea if it works on Unix-like machines.

Update: The latest version works perfectly with Ubuntu too!
