
PowerShelling Azure DNS Management

DNS is one of those topics that every developer has to deal with. It's also one of those topics some of us developers get to deal with. I love DNS. It allows all manner of flexible addressing, failover, and load-balancing techniques. Understanding DNS takes you a long way in a whole lot of scenarios.

However, the difference between the fun architecture and possibilities of DNS and the implementation has often been the difference between a great sounding class syllabus and the horrible reality of a boatload of homework. Windows DNS, for example, was historically cryptic GUI gibberish. BIND was OK and has a very simple config format, but it scared most people off (you simply cannot win with people who hate GUIs and command lines).

Azure DNS lets you configure via the GUI (noooooob; jk, the GUI is pretty nice), ARM templates, the Azure CLI (=Linux), and PowerShell.

But, HOW do you manage the config? What format? How do you deploy?

One way is to do everything in ARM templates. Check out this example:


  "resources": [
    {
      "type": "Microsoft.Network/dnszones",
      "name": "example.org",
      "apiVersion": "2016-04-01",
      "location": "global",
      "properties": { }
    },
    {
      "type": "Microsoft.Network/dnszones/a",
      "name": "example.org/mysubdomain01",
      "apiVersion": "2016-04-01",
      "location": "global",
      "dependsOn": [
        "example.org"
      ],
      "properties": {
        "TTL": 3600,
        "ARecords": [{
            "ipv4Address": "127.0.0.1"
          }
        ]
      }
    }
  ],


There's nothing wrong with that. If you want to have a billion lines of JSON for the sake of a single DNS record, you have fun with that. Ugh. Fine. Yes, it will be idempotent, but it's severe overkill for quick updates.

Because PowerShell is effectively the .NET REPL, you can write a simple PowerShell (=.NET) tool to handle any custom format you want.

The following is one way of formatting your DNS updates:


    @(
        @{
            Name="example1"
            Config=@(
            @{
                Type="CNAME"
                Value="hosting01gyfir-alpha.centralus.cloudapp.azure.com"
            })
        },
        @{
            Name= "example2"
            Config=@(
            @{
                Type="A"
                Value="127.0.0.1"
            })
        },
        @{
            Name= "mail"
            Config=@(
            @{
                Type="MX"
                Preference=10
                Exchange="mail.example.com"
            },
            @{
                Type="MX"
                Preference=20
                Exchange="mail2.example.org"
            })
        }
    )


I dunno. I find that pretty simple. I'll use that when I set up something new.

Here's my function (with call) to make that work:


    function deployDns { param([Parameter(Mandatory=$true)]$rg, [Parameter(Mandatory=$true)]$zonename, [Parameter(Mandatory=$true)] $ttl, [Parameter(Mandatory=$true)]$records, [bool]$testOnly)
        ($records).foreach({
            $name = $_.Name
            Write-Host "Name: $name"
            $dnsrecords = @()
    
            ($_.Config).foreach({
                $config = $_
                $type = $config.Type
                switch($type) {
                    "CNAME" {
                        Write-Host ("`tCNAME: {0}" -f $config.Value)
                        $dnsrecords += New-AzureRmDnsRecordConfig -Cname $config.Value
                    }
                    "MX" {
                        Write-Host ("`tPreference: {0}" -f $config.Preference)
                        Write-Host ("`tExchange: {0}" -f $config.Exchange)
                        $dnsrecords += New-AzureRmDnsRecordConfig -Preference $config.Preference -Exchange $config.Exchange
                    }
                    "A" {
                        Write-Host ("`tPreference: {0}" -f $config)
                        Write-Host ("`tExchange: {0}" -f $config.Ipv4Address)
                        $dnsrecords += New-AzureRmDnsRecordConfig -Ipv4Address $config.Value
                    }
                    "AAAA" {
                        Write-Host ("`tIpv6Address: {0}" -f $config.Ipv6Address)
                        $dnsrecords += New-AzureRmDnsRecordConfig -Ipv6Address $config.Value
                    }
                    "NS" {
                        Write-Host ("`tNS: {0}" -f $config.Value)
                        $dnsrecords += New-AzureRmDnsRecordConfig -Nsdname $config.Value
                    }
                    "PTR" {
                        Write-Host ("`tPtrdname: {0}" -f $config.Value)
                        $dnsrecords += New-AzureRmDnsRecordConfig -Ptrdname $config.Value
                    }
                    "TXT" {
                        Write-Host ("`tPtrdname: {0}" -f $config.Value)
                        $dnsrecords += New-AzureRmDnsRecordConfig -Value $config.Value
                    }
                }
            })
            Write-Host
            Write-Host "Records:"
            Write-Host $dnsrecords
            if(!$testOnly) {
                New-AzureRmDnsRecordSet -ResourceGroupName $rg -ZoneName $zonename -RecordType $type -Name $name -Ttl $ttl -DnsRecords $dnsrecords
            }
            Write-Host
            Write-Host
        })
    }
    
    deployDns -testOnly $true -rg 'davidbetz01' -zonename "davidbetz.net" -ttl 3600 -records @(
        @{
            Name="example1"
            Config=@(
            @{
                Type="CNAME"
                Value="hosting01gyfir-alpha.centralus.cloudapp.azure.com"
            })
        },
        @{
            Name= "example2"
            Config=@(
            @{
                Type="A"
                Value="127.0.0.1"
            })
        },
        @{
            Name= "mail"
            Config=@(
            @{
                Type="MX"
                Preference=10
                Exchange="mail.example.com"
            },
            @{
                Type="MX"
                Preference=20
                Exchange="mail2.example.org"
            })
        }
    )


Is this insane? Probably. That's what I'm known for. ARM templates might be smarter given their idempotent nature, and I've found myself using the GUI now and again.

For now, just keep in mind that PowerShell lets you be ultra flexible with your Azure configuration, not just Azure DNS.
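
If you want to verify what a run actually wrote to the zone, the same AzureRm module can read it back. A quick check (assuming the AzureRM.Dns cmdlets are available, as the function above already requires), reusing the resource group and zone from the call above:

    Get-AzureRmDnsRecordSet -ResourceGroupName 'davidbetz01' -ZoneName 'davidbetz.net'

    # or inspect a single record set, e.g. the MX records created above
    Get-AzureRmDnsRecordSet -ResourceGroupName 'davidbetz01' -ZoneName 'davidbetz.net' -Name 'mail' -RecordType MX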

It's a NoSQL database

This happened 10 minutes after I gave a webinar explaining how we need to be specific about database storage and never use the unbelievably unhelpful and pointless term "NoSQL".

What I asked: What can you tell me about that type of database?

Response: It's a NoSQL database.



What I meant: What kind of structure does this database use? Relational? Hierarchical? Document? Graph? Key-value? Column-storage? Wide table? What kind of internal storage does this database use? B-tree? BW-tree? Memory mapping? Hash tables with linked lists? Is the data per collection/table or is that an abstraction for underlying partitions? Does the data format lend itself to easy horizontal partitioning (e.g. sharding)? If it's a sharded system, how does it deal with the scatter-gather problem and multi-node write confirmations? Is there a data router? Are replicas readable? Are there capabilities for edge nodes? How is data actually tracked (e.g. bitmaps, statistics)? Are records/documents mutable or immutable with delta structures? Are records deleted immediately or internally marked for deletion? What is this system's memory management strategy? What kind of aggregations are possible with this database? Are there silly 100MB aggregation limits or is it willing to actually use the 128GB of RAM we just paid for specifically to prevent our massive aggregates from needing to write to disk? Is there a transaction/operations log? Is there support for transactions? How does this relate to rollbacks and commits? Are there various data isolation levels? Is locking pessimistic, optimistic, or just configurable? What can you tell me about available local partitioning strategies? Can you partition indexes to various local disks to optimize performance? What mechanisms allow us to optimize reads and writes separately? Is there native encryption at the database, table, field, or column level? Are there variable storage engines for this database? At what levels does compression take place?

What I was told: You don't access it with SQL.





confused

Deploying Docker Static Applications

When throwing together a basic UI, lately I've been using React.

It's fun for smaller projects, but it's entirely useless for major projects. Given that the HTML is inside the JS (JSX), your artists/designers who write the HTML are pretty much sidelined for all HTML designs after the initial one. All subsequent changes are made by engineers who should never have a need to know the difference between aqua and cyan and should not ever care about box dimensions. That's why you hired an HTML artist. UX engineer is an oxymoron.

In a different part of the client-side world is Angular, which forces you to deal with TypeScript. While it's one of the only programming languages, with Go and to some extent Python, to get interfaces right, that one good thing isn't enough to make me ever want to go back to dealing with types. 16 years of C# is enough, thank you. Types lead to false negatives. You don't care that something is an integer, you care that it's between 2 and 12. Tests always outrank types.

Regardless of the poison you drink, you have to strip out something to make it work on the web. In the case of React, JSX must be removed. In the case of Angular, TypeScript must be removed. In both cases, the concept of components must be flattened. Thus, you always end up with a build process for client-side applications.

Raw ES5 + flux pattern is raw legit power. No frameworks. Check it out.

Furthermore, there's always more than mere files. You always have to think about how those files will get to the end user. Static files contain no inherent execution mechanism. Something must serve them. Of course, this is what a web server is for.

To summarize: to get your application deployed, you need a way to build the application and you need a way to serve it. How did you get these files? Build process. How do you deliver these files? You need a web server.

There's a single, very simple silver bullet for both: Docker.

Building

Examine the following single Dockerfile for building a React application:

#+ this staging area is thrown out, so no need to optimize too much
FROM node:8.11-alpine as staging

WORKDIR /var/app

RUN npm install -g create-react-app

#+ nginx.conf
RUN echo c2VydmVyIHsKICAgIGxpc3RlbiA4MDsKCiAgICBsb2NhdGlvbiAvIHsKICAgICAgICByb290IC92YXIvYXBwOwogICAgICAgIHRyeV9maWxlcyAkdXJpIC9pbmRleC5odG1sOwogICAgfQp9Cg== | base64 -d > /etc/nginx.conf

WORKDIR /var/app

COPY package.json /var/app

RUN npm install

COPY . /var/app

RUN npm run build

FROM nginx:1.13.9-alpine

COPY --from=staging /var/app/build /var/app/
COPY --from=staging /etc/nginx.conf /etc/nginx/conf.d/default.conf

STOPSIGNAL SIGTERM

ENTRYPOINT ["nginx", "-g", "daemon off;"]

There are two parts: staging and your application.

The staging area starts with a Node binary, sets up the React environment by installing create-react-app (Facebook is horrible at naming things), then it does some magical voodoo (we'll come back to that), then it builds the application.

The second stage starts with an Nginx binary, copies over your application, a config file, then runs Nginx.

In the end, Docker will create a binary of your application that will run Nginx, which will serve your files.

That's literally everything you need.

You just build and run:

docker build . -t registry.gitlab.com/your_gitlab_name/example:prod-latest
docker run -p 80:80 registry.gitlab.com/your_gitlab_name/example:prod-latest

Your application is working and is production ready.
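
A quick sanity check, assuming the -p 80:80 mapping above and curl on the host:

curl -I http://localhost/

nginx should answer, and the try_files rule means unknown paths fall back to index.html.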

Configuration

About that magic voodoo...

When using Docker, you don't always need to mess with files. If you can avoid adding files to your application, you should do so. Because Dockerfile RUN commands execute in a shell, you can use pipes and redirect stdout to do much of this inline.

The staging area contained the following line:

RUN echo c2VydmVyIHsKICAgIGxpc3RlbiA4MDsKCiAgICBsb2NhdGlvbiAvIHsKICAgICAgICByb290IC92YXIvYXBwOwogICAgICAgIHRyeV9maWxlcyAkdXJpIC9pbmRleC5odG1sOwogICAgfQp9Cg== | base64 -d > /etc/nginx.conf

When you run the command without the redirect in a shell, you get the following:

[dbetz@ganymede ~]$ echo c2VydmVyIHsKICAgIGxpc3RlbiA4MDsKCiAgICBsb2NhdGlvbiAvIHsKICAgICAgICByb290IC92YXIvYXBwOwogICAgICAgIHRyeV9maWxlcyAkdXJpIC9pbmRleC5odG1sOwogICAgfQp9Cg== | base64 -d
server {
    listen 80;

    location / {
        root /var/app;
        try_files $uri /index.html;
    }
}

It's the nginx.conf file.
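
If you're wondering where that string came from, it's nothing exotic; you can produce your own the same way from whatever nginx.conf you want (GNU base64 shown; -w 0 prevents line wrapping):

base64 -w 0 nginx.conf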

Now you can see why the second stage (FROM nginx:1.13.9-alpine), the one you're putting in production, is nginx. This is literally the web server that's serving up the production-ready files.

Run on your server and you're done.

Security

Nothing is complete without SSL security, but I don't recommend doing that with your Docker binaries.

Your binaries represent the application and only the application. SSL is an infrastructure add-on to your application.

Do your TLS on your host machine. This will give you more flexibility too, since a single nginx instance will listen on all addresses at once and you can simply use server_name to match, thus enabling you to use a single IP address for an army of FQDNs.
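
As a rough sketch (the cert paths and the published port are assumptions -- adjust for your own cert location and your docker run -p mapping), the host-level nginx looks something like this:

server {
    listen 443 ssl;
    server_name example.org;

    ssl_certificate     /etc/letsencrypt/live/example.org/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/example.org/privkey.pem;

    location / {
        # container published with something like: docker run -p 127.0.0.1:8080:80 ...
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host $host;
    }
}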

An Advanced, Practical Introduction to Docker

This is an advanced, practical introduction to Docker. It's mainly for people who have used Docker, but want a deeper understanding. It's similar to, after finishing Calculus I, II, III, and DiffEq, taking Advanced Calculus to go back over limits, derivatives, and integration at a deeper level.

There's no such thing as an "operating system" in a Docker container. There's no such thing as being "in" a Docker container. What you call a Docker "container" is just an abstraction.

Docker "containers" aren't like virtual machines; you're not creating a general purpose environment with its own kernel. What you're doing is creating a runnable, deliverable Docker binary that contains the minimum that's needed to run a single application. You're delivering an application, not an environment. When you hear "container", think "application". When you hear "image", think "binary".

Docker hides so many of the underlying mechanics that you get the impression you're dealing with lightweight virtual machines. We should be grateful for the fact that we're victims of Docker's success.

Namespaces

The basic idea behind Docker is that Linux already has the capabilities for creating isolation, they just needed to be harnessed in a user-friendly manner. Docker is largely a front-end that abstracts already existing Linux cleverness, including namespaces (isolation) and cgroups (resource utilization).

The session "What Have Namespaces Done for You Lately?" by Liz Rice helps to demonstrate this concept; she effectively builds her own Docker-like tool from the ground up using Go (which is what Docker is written in!)

When you're running what is colloquially known as a Docker "container", you're running a process just like any other process, but with a different namespace ID. This namespace concept is the same concept you already know from C# and C++: it separates entities so they don't conflict. Thus, in Linux, you can have process ID 1 in one namespace and process ID 1 in another. They aren't in different environments like virtual machines. They're isolated, but not entirely separate.

Namespaces also let you have /some/random/file in one namespace and a different /some/random/file in another namespace: think super-chrooting. You can even have something listening on port 80 in one namespace and something entirely different listening on port 80 in a different namespace. No conflicts.

There's just a lot of namespace magic to give the illusion of various "micro-machines". In reality, there are no "micro-machines"; everything is running in the same space, but with a simple label separating them.

The term "container" and the preposition "in" lead to extreme confusion. There's nothing "in" a container, but the terminology is pretty much baked into the industry at this point. Note also, you never run something "in" Docker, but you can run something "using" Docker.

One way to prove to yourself that there's no voodoo subsystem is to look at how ps works on your machine: you see the processes across each namespace. You may be running Elasticsearch and MongoDB as separate Docker "containers", but both of them will show up in the same ps output on your host machine.

See example below:

[dbetz@ganymede src]$ docker run -d mongo:3.7
fa55205ad518da6fe61a794732c325263c96d6a10a5692fa6ea9821c4bbcfc79

[dbetz@ganymede src]$ docker run -d docker.elastic.co/elasticsearch/elasticsearch:6.2.4
c7e337b67689831da6beb78231394ff2bcb9341e60689187f14d579650027d5e

[dbetz@ganymede src]$ ps aux | grep -E "mongo|elastic"
polkitd   43993  1.1  0.7 985104 56008 ?        Ssl  15:11   0:01 mongod --bind_ip_all
dbetz     45304 74.0 14.3 4040776 1145040 ?     Ssl  15:13   0:02 /usr/lib/jvm/jre-1.8.0-openjdk/bin/java -Xms1g -Xmx1g -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -XX:-OmitStackTraceInFastThrow -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0 -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -Djava.io.tmpdir=/tmp/elasticsearch.uHJg0AmQ -XX:+HeapDumpOnOutOfMemoryError -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -Xloggc:logs/gc.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=32 -XX:GCLogFileSize=64m -Des.cgroups.hierarchy.override=/ -Des.path.home=/usr/share/elasticsearch -Des.path.conf=/usr/share/elasticsearch/config -cp /usr/share/elasticsearch/lib/* org.elasticsearch.bootstrap.Elasticsearch

A solid grasp of namespaces is critical to understanding Docker. Once you understand namespace concepts, you can move to understanding how namespaces can interact with each other. That's the larger world of Docker that extends deep into the design and deployment of orchestration.

To further review and reframe Docker concepts, let's recognize some of the resource types Docker uses. For the purpose of this discussion, let's use Azure's provider categories. This should keep the concepts general enough for reuse and specific enough to be practical.

The different resource types are:

  • compute (e.g. processes)
  • storage
  • networking

There are other resource types as well, but they're usually very similar to these (e.g. IPC is similar to networking).

When you spin something up using Docker (e.g. docker run), it will have everything in its own namespaces: the process, storage, and networking. You manage the mapping between namespaces yourself, per resource type.

Let's review with an example...

Run Elasticsearch ("ES"):

docker run -d docker.elastic.co/elasticsearch/elasticsearch:6.2.4

ES will run in its own process (PID) namespace. It will listen on port 9200 in its own network namespace. It will store data at /usr/share/elasticsearch/data in its own mount namespace. It's entirely sandboxed.

To make ES practical, you need to map 9200 to something that can touch your network card and /usr/share/elasticsearch/data to something in a less ephemeral location.

Here's our new command:

docker run -d -p 9200:9200 -v /srv/elasticsearch6:/usr/share/elasticsearch/data docker.elastic.co/elasticsearch/elasticsearch:6.2.4

The point of reviewing these fundamental concepts is to further train your intuition in terms of namespaces. It's important to have this intuitive training before going too deep into more Docker concepts like volumes or networks. Without training your intuitions to work in terms of namespaces, you'll inevitably end up with confused analogies with virtual machines, inefficient images, overly complex deployments, and unbelievably confused discussions.

On the other hand, with this understanding, it should be easy to understand that Docker represents applications, not operating systems. There are no kernels in Docker binaries or in "containers" just as there are no network drivers in your application's tarballs or zip files.

Namespaces are clever and very helpful. If you were to write a plug-in model for an application, you could instantiate each plug-in into a different namespace, then share an IPC namespace for communication. Supposedly, Google Chrome on Linux does something similar. Namespaces give you an easy, built-in way to do jailing/sandboxing.

Consider also this: because Docker spins up processes just like any other process, each process has the same direct hardware access. Once you do a few device mappings to let the process in that namespace know where the real-world hardware is, you're solid. So, you don't have to put too much thought into how to set up GPU access for something. Consider this in the context of this very confused SO discussion where people continue to cause confusion by talking about something "inside Docker containers". There is no "inside"; it's a process like any other.

Run man unshare on Linux to see the details for a native tool that creates namespaces.
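
For a quick taste of this without Docker at all, you can drop a shell into fresh PID and mount namespaces yourself (run as root):

sudo unshare --fork --pid --mount-proc sh
ps aux

From inside that shell, ps shows sh as PID 1; from the host, ps still sees everything across namespaces.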

Images

Docker "containers" are created from binaries called images. Docker images are merely Docker binaries of your application just like any other deliverable binary format (e.g. binary tarball).

These images are nothing more than file-system layouts with some metadata. The blueprint that provides instructions on how to build the file-system layout for an image is a Dockerfile.

This file-system will contain the application binary you want Docker to run. When your application runs, it may reach for various files (e.g. /lib64/libstdc++.so.6); these files just need to be where the application expects them.

A Dockerfile also provides metadata that describes the resulting binary, including an instruction for how to start your application (e.g. CMD, ENTRYPOINT).

The most important concept reframe here is this: the resulting Docker image is your complete deliverable application binary. It does not represent a system, just a single application.

Take care to avoid large multi-level image inheritance for the sake of "standardization". Standardization is the exact opposite of what you want with Docker. Tailor the deliverable to your specific application's needs.

Image Starting Points

Your application will run like any other application on your system. As such, it will follow the same rules of dependencies as any other application: if your application needs a file in order to run, you need to make sure it's within grasp. A solid understanding that these file-systems exist in different namespaces, not different subsystems, enables flexible ways of satisfying dependency requirements.

For example, if your machine already has a fairly large file (e.g. /lib64/liblarge.so.7), instead of putting it in each image, keep it on the host and map it at run time (-v /lib64/liblarge.so.7:/lib64/liblarge.so.7). When Docker sees the running application ask for /lib64/liblarge.so.7, it will get it from the host machine. This concept, similar to symlinking, is at the heart of some important techniques discussed later.

When creating images, one option you have is to create a file-system from scratch. This entails adding each and every file to the proper location in the image. Much of what follows a bit later will pursue this method and explain how to effectively create such lightweight images.

Another option you have is to build your file-system on an existing file-system template. This is the traditional approach most applications use. It maximizes portability, but the resulting images are larger, containing a huge number of unused files.

When not careful, this second approach leads to atrocious misunderstandings.

Consider the following Dockerfile:

FROM ubuntu

RUN groupadd user01 \
  && useradd --gid user01 user01

RUN apt-get install sometools

CMD [ "sometool" ]

This file could very well lead many to think that there's an Ubuntu operating system "base image" that you're using and extending. This is entirely wrong.

As mentioned previously, Docker is primarily a front-end for existing Linux functionality. There is no concept of a hypervisor subsystem or the like. Applications run as they always have. There are no kernels in images, therefore there are no operating systems in images. Docker does not work with operating systems, it works with applications. There is no OS "base image". There is no place for sysadmins to do any work with Docker at all. Your CMD/ENTRYPOINT instruction does not start init nor systemd, it starts your application.

Ubuntu is not in your image, only a file-system that looks like an Ubuntu file-system is in your image. FROM ubuntu merely states that the Dockerfile will start with a file-system template that looks like Ubuntu. You use it when you don't care about the size of your image and really need your application to work in an Ubuntu-like file-system. If your host system is RHEL, your binaries still run on RHEL -- Docker does not deal with operating systems.

For the most part, using Linux OS file-system templates is a very poor practice. They are largely not optimized for Docker. However, there is one OS file-system template that is optimized for Docker: Docker Alpine.

Docker Alpine provides a very small OS file-system template that maximizes application portability while minimizing binary size.

The previous Dockerfile would transform to Docker Alpine like this:

FROM alpine

RUN addgroup -g 1000 user01 && \
adduser -D -u 1000 -G user01 user01

RUN apk add --no-cache sometools

CMD [ "sometool" ]

The resulting binary would be much smaller. Yet, keep in mind that Alpine is not in your Docker image. Docker does not put operating systems into images. Your image is merely built on a file-system template that looks like an Alpine Linux file-system.

Do not confuse Docker Alpine with Alpine Linux. The former is a file-system template that looks like an Alpine file-system, while the latter is an operating system for routers, tiny Linux deployments, and Raspberry Pi.

When creating portable images without extensive binary optimization, Docker Alpine is the only viable option. Do not use FROM centos or FROM ubuntu in any environment. These lead to extremely large images and cause severe confusion.

This bears repeating: the entire point of Docker is to run your application. To do this, you just need to make sure your application has what it needs to run. The question is not "What do I build my application on?", the question is "What specific files does my application require?" Your application most likely doesn't need 90% of the files that a Linux file-system template provides, it probably just needs a few libraries. It may not even need the full XYZ library, but just file Y.

Docker lets you optimize your application like this. If you can identify the dependencies of your application, you'll be able to build your file-system FROM scratch.
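
One way to identify those dependencies for a dynamically linked binary is ldd (the path here is just a placeholder for your own application):

ldd /var/app/runner

Every resolved library in that output is a file your image must contain or your run-time mapping must provide.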

scratch

At runtime, you're working with namespaces. At build time, you're working with images. Your ability to create optimal images is directly proportional to your understanding of namespaces and your application.

Let's jump right to an example of creating a tiny, usable Docker image...

First let's look at the hello.asm file we want to run (taken from http://cs.lmu.edu/~ray/notes/x86assembly/):

        global  _start

        section .text
_start:
        ; write(1, message, 13)
        mov     rax, 1                  ; system call 1 is write
        mov     rdi, 1                  ; file handle 1 is stdout
        mov     rsi, message            ; address of string to output
        mov     rdx, 13                 ; number of bytes
        syscall                         ; invoke operating system to do the write

        ; exit(0)
        mov     eax, 60                 ; system call 60 is exit
        xor     rdi, rdi                ; exit code 0
        syscall                         ; invoke operating system to exit
message:
        db      "Hello, World", 10      ; note the newline at the end

Our goal is to create a tiny, deliverable Docker binary that writes out "Hello, World".

Here's how we'll do it:

FROM alpine as asm

WORKDIR /elephant

COPY hello.asm .

RUN apk add --no-cache binutils nasm && \
    nasm -f elf64 -p hello.o hello.asm && \
    ld -o hello hello.o

FROM scratch

COPY --from=asm /elephant/hello /

ENTRYPOINT ["./hello"]

This Dockerfile uses two stages: a build stage and a run stage. The first stage installs NASM, assembles the code, then links the application; the second carefully places the application into the deliverable Docker binary. The second stage contains a single file: /elephant/hello. It does not contain NASM, the source code, or any intermediate files.

You can use as many stages as you want: sometimes you'll need a CI-setup stage (setup tools), then a backend-build stage (setup node, run npm install), then a front-end build-stage (build Angular), then a final stage to carefully place files from previous stages (copy /node_modules/ and Angular/dist files to node application). Only the final stage is deployed, everything else is thrown out.

This results in the following:

[dbetz@ganymede tiny-image]$ docker build . -t local/tiny-image
Sending build context to Docker daemon  4.096kB
Step 1/7 : FROM alpine as asm
 ---> 3fd9065eaf02
Step 2/7 : WORKDIR /elephant
Removing intermediate container da8e9f72ebd2
 ---> 29896ad4bb3c
Step 3/7 : COPY hello.asm .
 ---> 9ccc8ab38794
Step 4/7 : RUN apk add --no-cache binutils nasm &&     nasm -f elf64 -p hello.o hello.asm &&     ld -o hello hello.o
 ---> Running in f99cbecc309d
fetch http://dl-cdn.alpinelinux.org/alpine/v3.7/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.7/community/x86_64/APKINDEX.tar.gz
(1/3) Installing binutils-libs (2.30-r1)
(2/3) Installing binutils (2.30-r1)
(3/3) Installing nasm (2.13.01-r0)
Executing busybox-1.27.2-r7.trigger
OK: 17 MiB in 14 packages
Removing intermediate container f99cbecc309d
 ---> d840e66bfbdb
Step 5/7 : FROM scratch
 --->
Step 6/7 : COPY --from=asm /elephant/hello /
 ---> fd85715eaf85
Step 7/7 : ENTRYPOINT ["./hello"]
 ---> Running in e163f47f4d8a
Removing intermediate container e163f47f4d8a
 ---> 1f30c749e8b9
Successfully built 1f30c749e8b9
Successfully tagged local/tiny-image:latest

[dbetz@ganymede tiny-image]$ docker run local/tiny-image
Hello, World

[dbetz@ganymede tiny-image]$ docker image ls | grep "local/tiny-image"
local/tiny-image                                    latest                                     1f30c749e8b9        15 seconds ago      848B

It builds and it runs, and the entire binary is 848 bytes.

The more functionality you add to your binary, the larger it grows. Your image size should remain somewhat proportional to your functionality. That's what you'd expect from a tarball, that's how you should think with Docker.

This means that you should be careful with what files go into your resulting binary. This means being careful with how you satisfy your application's dependency needs.

Would you really throw an entire Linux operating system into your tarball?
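
If you want to see the cost of that choice, compare the file-system templates locally (exact sizes vary by tag):

docker pull ubuntu
docker pull alpine
docker image ls ubuntu
docker image ls alpine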

Practical scratch

In the previous assembler example, we had an application with zero dependencies. When this is your situation, your Docker image size will be very near your application size. You want them to be as close as possible.

One popular way to satisfy this need is to use Go: it can output statically linked binaries that require zero dependencies. Go has many places where it fits nicely. You can see, for example, my recursivecall for Docker project. Docker itself is also written in Go.

On the other hand, Go doesn't have deep support for dynamic types. This means you won't have the JavaScript/Python dynamic object concept. Instead, you'll have to refresh yourself on those data-structures we all forgot decades ago.

Regardless, while Go is beautiful for many uses, you already have applications. Let's focus on deploying those applications via Docker, not rewriting them in Go.

For this next section, let's assume that our application /var/app/runner requires /usr/lib64/libc.so.6. Our application will crash if it doesn't find /usr/lib64/libc.so.6.

In this situation, we have two options, based on our understanding of Docker namespaces:

  1. copy /usr/lib64/libc.so.6 into the image with /var/app/runner

  2. link /usr/lib64/libc.so.6 from the host machine to your running application's namespace

The first option can be accomplished with a multi-stage build with a simple COPY from the first stage:

FROM ubuntu as os

FROM scratch

COPY ./runner /var/app/runner

COPY --from=os /usr/lib64/libc.so.6 /lib64/

CMD ["/var/app/runner"]

Remember, the Ubuntu stage will be thrown out, but, yeah, you should still try to use Alpine where possible.

This will create a portable binary; everything the application needs will be within reach.

As your portability increases, so does your binary size. When you need more than just a few files and you must maintain portability (e.g. posting to Docker Hub), it's time to use the Docker Alpine file-system template.

However, in the case where you control your environment, and thus don't require portability, the second approach may be better.

It allows you to provide a much simpler Dockerfile:

FROM scratch

COPY ./runner /var/app/runner

CMD ["/var/app/runner"]

Instead of copying the dependency into the image, you tell Docker at run time to use a different namespace to satisfy the dependency.

Your build and run would look like this:

docker build . -t local/runner
docker run -v /usr/lib64/libc.so.6:/lib64/libc.so.6 local/runner

If you don't want to play around with each and every file, just map the entire /lib64/ folder.

docker run -v /lib64/:/lib64/ local/runner

Since most libraries are loaded from /lib64/, this technique will account for a large percentage of your scenarios.

Practical scratch with Node

Let's make this more real-world by manually building a Docker Node binary which our deliverable Docker application binaries will use.

Here's our Dockerfile:

FROM alpine

RUN apk add --no-cache curl && \
    mkdir -p /tmp/node && \
    mkdir -p /tmp/etc && \
    curl -s https://nodejs.org/dist/v8.11.2/node-v8.11.2-linux-x64.tar.xz | tar -Jx -C /tmp/node/

RUN addgroup -g 500 -S nodeuser && \
    adduser -u 500 -S nodeuser -G nodeuser

RUN grep nodeuser /etc/passwd > /tmp/etc/passwd && \
    grep nodeuser /etc/group > /tmp/etc/group

FROM scratch

COPY --from=0 /bin/sh /tmp/node/node-v8.11.2-linux-x64/bin/node /bin/
COPY --from=0 /usr/bin/env /usr/bin/
COPY --from=0 /tmp/etc/passwd /tmp/etc/group /etc/

CMD  ["/bin/node"]

The resulting deliverable Docker binary will contain the node binary, passwd/group, and env (as an example of copying something you may need in Node development).

The first stage downloads Node, creates a user and group, then simplifies /etc/passwd and /etc/group. Only the final stage represents the deliverable binary.

Build and run:

docker build . -t local/node8
docker run -it local/node8

Building and running results in the following error:

standard_init_linux.go:195: exec user process caused "no such file or directory"

Run it again with the mapping:

docker run -it -v /lib64/:/lib64/ local/node8

It works.

[dbetz@ganymede node8]$ docker run -it -v /lib64/:/lib64/ local/node8 node
>

Let's check the application version:

[dbetz@ganymede node8]$ docker run -it -v /lib64/:/lib64/ local/node8 node -v
v8.11.2

What's our size?

[dbetz@ganymede ~]$ docker image ls | grep "local/node8"
local/node8                                           latest                                     901c2740deb9        12 seconds ago        36.4MB

It's 36.4MB. Pretty small.

Your Docker application binary will only contain 36.4MB of overhead when you ship your product.

Using our binary

With the Docker Node image built, we can build our deliverable application binary.

FROM node:8.11.2-alpine as swap-space

WORKDIR /var/app

COPY package.json /var/app/

RUN npm install

COPY . /var/app

FROM local/node8

WORKDIR /var/app

COPY --from=swap-space /var/app/ /var/app/

ENV PORT=3000

USER nodeuser:nodeuser

ENTRYPOINT ["node", "server.js"]

The first stage will use an official Docker Node binary to prepare our application. The second stage merely copies the application in. NPM isn't needed for your application to run. It only needs the application folder consisting of your code and node_modules/.

Build it and push it out (real-world example):

TAG=`date +%F_%H-%M-%S`
docker build . -t local/docker-sample-project:$TAG -t registry.gitlab.com/davidbetz/docker-sample-project:$TAG

docker push registry.gitlab.com/davidbetz/docker-sample-project:$TAG
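
On the server, consuming that deliverable is just a pull and a run. The port mapping below assumes the ENV PORT=3000 above is what server.js listens on, and the /lib64/ mapping assumes the from-scratch local/node8 base built earlier:

docker pull registry.gitlab.com/davidbetz/docker-sample-project:$TAG
docker run -d -p 3000:3000 -v /lib64/:/lib64/ registry.gitlab.com/davidbetz/docker-sample-project:$TAG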

/etc/passwd and /etc/group

The addition of /etc/passwd and /etc/group is an artifact of how most Linux tools work: they want a name, not a UID or GID. You create a user and group just to name them. You can't simply specify UID 500.

Because /etc/passwd and /etc/group are part of a file-system in a specific namespace, tools use the files within the file-system they're looking at to do the ID to name lookup.

This gives us an opportunity to do an experiment...

Let's run MongoDB in the background:

[dbetz@ganymede ~]$ docker run -d mongo
8e4be179031cc7221e561934df02c513cb9f7b3946343e0a2b027dace5f83f03

Let's execute sh in that namespace:

Remember, you aren't going into a container, there is no container. But, it's still phenomenological language like "the sun rises". You'll end up talking about "containers", but remember they're merely abstractions.

[dbetz@ganymede ~]$ docker exec -it 8e4b sh
#

Let's see the processes from the perspective of that namespace:

# ps aux
USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
mongodb       1  7.0  0.6 984072 55524 ?        Ssl  18:43   0:00 mongod --bind_

OK, so the user for mongod is mongodb. Let's get the UID and GID for mongodb:

# grep mongodb /etc/passwd
mongodb:x:999:999::/home/mongodb:/bin/sh
# grep mongodb /etc/group
mongodb:x:999:

It's 999.

Now exit sh and look for mongod in your processes on your host machine:

[dbetz@ganymede ~]$ ps aux | grep mongod
polkitd   40754  0.6  0.7 990396 58628 ?        Ssl  13:43   0:02 mongod --bind_ip_all

The user is polkitd, not mongodb.

Why polkitd?

Well, look for user 999 in your /etc/passwd file:

[dbetz@ganymede ~]$ grep 999 /etc/passwd
polkitd:x:999:997:User for polkitd:/:/sbin/nologin

ps saw 999 and used the /etc/passwd within reach to do the lookup, thus interpreted it as polkitd.

Key Takeaways

  • Docker uses existing Linux functionality.
  • There is no "subsystem" or hypervisor.
  • Images do not contain operating systems.
  • Docker images are merely Docker binaries of your application.
  • Your Docker images should not contain anything other than your application and its dependencies.
  • Images are either from scratch or based on a file-system template like Docker Alpine.
  • You can map files already on your system's file-system to minimize the size of images.

Using Azure App Services with Node.js

We often hear how Azure is "Microsoft" whereas other cloud providers aren't. In the most obvious sense, they're right -- Microsoft owns it. However, when you look closer, what they actually mean is that Azure is Microsoft-only and Google/AWS are open to other programming models.

This is ridiculous and only said by people gulping down electrolyte-loaded propaganda by the water cooler. In reality, there's nothing proprietary or Microsoft-only about Azure as a whole. It's a nonsensical bias to say otherwise. Azure is platform agnostic. You don't need a VM (=IAAS) to do your Node (or Python) development. Absolutely zero hacks are required to put your Node application in Azure App Services (=PAAS).

It reminds me of when HTML5 became popular: there were still Flash zealots pushing their "LOL browsers can't do animation LOL" nonsense. They forgot to leave their echo chamber before attempting to enter reality.

Software engineers steeped in Microsoft technologies for over a decade understand that you must make a distinction between Ballmer and non-Ballmer Microsoft. To give an extreme contrast: the former is VB6, the latter is Linux in Windows. You can see the transition from Ballmer-nonsense to non-Ballmer-sanity since around .NET 4 (especially in the adoption of ASP.NET MVC as an open-source tool). Ballmer is Silverlight, non-Ballmer is adoption-of-HTML5. You can go on down the line.

Yes, Microsoft still has some war crime-level trash software, .NET Core being one of the brightest shining examples, but viewed in the light of the fact that just about everything Oracle touches is trash -- they're doing pretty well in the brave new world.

In the end: Ballmer-Microsoft is Microsoft-as-evil-empire. Today's Microsoft is fully amiable toward Linux; they also rely on Github for many SDKs and for just about all Azure documentation. It's a different beast.

You need to examine Azure through this Ballmer / non-Ballmer paradigm. Put concretely: Windows Azure (and Azure ASM) was Ballmer whereas Microsoft Azure (and Azure ARM) is non-Ballmer. Much of the "LOL Azure is Microsoft-only LOL" nonsense comes from confusion about the transition between Azure "versions". Much of this is also Microsoft's fault: just about all the books out there are completely obsolete! The official book for the 70-532 exam will absolutely guarantee that you fail the exam.

For the topic at hand, we shouldn't look at Azure in an ad hoc manner, but in the context of its intimately related technologies. Specifically, we need to look at the development of IIS as it passed from a Ballmer to a non-Ballmer implementation.

Working with IIS

My life with IIS started around the IIS3 era. I still remember taking the IIS3 exam as an elective (with the TCP/IP exam) for my NT4 MCSE. Thus, I've seen the various large upgrades and incremental updates over a good stretch of time.

The upgrade from IIS6 to IIS7 was easily the largest IIS upgrade; it laid the groundwork for eventually stripping out the last vestiges of Ballmerisms via its flexible APIs. Until IIS7, the biggest upgrade was a silly configuration system update (=IIS4 metabase). The IIS7 upgrade consisted of a systematic, paradigmatic shift. It was the "classical" to the "integrated" pipeline upgrade. The upgrade went so deep that you literally had to update your applications to add IIS7 support. After a while, all development was IIS7-first with IIS6 backwards compatibility added subsequently.

In practice, this classic -> integrated upgrade meant three things: First, instead of relying on the external ASP.NET ISAPI IIS plug-in, ASP.NET processing was integrated into IIS. No more interop. This made ASP.NET development more natural. It also gave .NET access to core extensibility functionality in IIS. You didn't need to whip out C++ for server extensibility. Second, if you had existing C++ functionality, you had easier access to IIS functionality with the new native IIS API. This second point is critical, because we see that the IIS7 upgrade wasn't just about .NET. Third, web.config was no longer about ASP.NET, but about IIS itself. This point is huge and points to the fact that the web.config format controls all of IIS7+, as seen in the global applicationHost.config file.

IIS6 used the ridiculous ISAPI nonsense to do just about everything, including call ASP.NET. The .aspx extension was simply mapped to aspnet_isapi.dll. This wasn't removed from IIS7; it was just separated and called "classic" mode.

In the IIS7 world, this meant that you literally had to add handler/module support for both IIS6 and IIS7 (more accurately, the classical and integrated models).

Furthermore, the low-level ASP.NET pipeline APIs were also affected. For my deeply low-level Themelia framework, I had to make checks between completely different pipelines. See the following snippet from my CoreModule (a typical module implementing the System.Web.IHttpModule interface):

View Themelia at Themelia Pro. View Themelia source at Themelia on Gitlab

CoreModule.cs:

    
    if (HttpRuntime.UsingIntegratedPipeline)
    {
        httpApplication.PostResolveRequestCache += OnProcessRoute;
        httpApplication.PostMapRequestHandler += OnSetHandler;
    }
    else
    {
        httpApplication.PostMapRequestHandler += OnProcessRoute;
        httpApplication.PostMapRequestHandler += OnSetHandler;
    }


Reference: CoreModule.cs

The installation was also different between IIS6 ("classical") and IIS7 ("integrated"):

For IIS6, I would add the module to system.web:


    <system.web>
        <httpModules>
            <add name="Themelia" type="Themelia.Web.CoreModule, Themelia.Web"/>
        </httpModules>
    </system.web>


For IIS7, I would add the module to system.webServer:


    <system.webServer>
        <modules>
            <remove name="Session"/>
            <add name="Session" type="System.Web.SessionState.SessionStateModule" preCondition=""/>
            <add name="Themelia" type="Themelia.Web.CoreModule, Themelia.Web"/>
        </modules>
    </system.webServer>


There is also the much more popular concept of a handler. My framework was meant to be a full IIS6-era platform takeover, so I used a greedier module, but if you're only doing specific framework development, handlers are your choice. ASP.NET MVC, for example, uses a handler.

ASP.NET MVC is actually an excellent example. Assuming ASP.NET was properly installed, for IIS7, they were able to take advantage of the fact that IIS7 processed everything (e.g. /contact) as .NET (though you still needed runAllManagedModulesForAllRequests enabled to disable that slight perf boost). For IIS6, because it had to know when to call the ASP.NET ISAPI filter, you had to add a wildcard mapping to get the ISAPI filter to handle extensionless paths (again, e.g. /contact).


    <system.webServer>
        <handlers>
            <add verb="*" path=".png" name="WatermarkHandler" type="WatermarkHandler"/>
        </handlers>
    </system.webServer>

Handlers and modules are still the standard way of tapping into the stream of raw power.

.NET is powerful. C# even has an unmanaged mode where you can crack open the covers (via unsafe mode) to do direct *pointer &manipulation. That said, the upgrade to IIS7 wasn't just about .NET; the upgrade provided a native IIS API as well.

Thus we enter the realm of C/C++ modules: Develop a Native C\C++ Module for IIS 7.0

By removing the ISAPI barrier and providing a clean, native IIS API, C++ developers could more easily connect existing C++ functionality to IIS. It also made IIS C++ code more expressive; familiar web terms like HttpContext, IHttpResponse, and BeginRequest (and other events) are all over IIS C++ code. No more DWORD WINAPI HttpExtensionProc(EXTENSION_CONTROL_BLOCK *pECB) nonsense.

Seriously. Review the C++ ISAPI docs. They're insane. 1990s Microsoft C++ was the worst code ever written. It's just plain satanic.

Consider the following IIS7-esque native C++ method:


    HRESULT        
    __stdcall        
    RegisterModule(        
        DWORD                           dwServerVersion,    
        IHttpModuleRegistrationInfo *   pModuleInfo,
        IHttpServer *                   pHttpServer            
    )
    {
    }

That's exactly how you register your native modules in IIS7. That's not too terribly evil. You can see that it's registering a module and bringing in pointers to core IIS entities.

This is also exactly how IIS handles Node hosting in Azure; it uses the iisnode module. You can see RegisterModule in main.cpp in iisnode:

https://github.com/tjanczuk/iisnode/blob/master/src/iisnode/main.cpp

If you review the following code from CProtocolBridge.cpp in iisnode, you'll see familiar things like IHttpContext and IHttpResponse:

https://github.com/tjanczuk/iisnode/blob/master/src/iisnode/cprotocolbridge.cpp

It's clean interface programming.

Using iisnode

IIS handles most of its config with your application's web.config. While there are a few global config files, you get tremendous control with your own config.

Hosting a Node application in Azure is as simple as deploying an Azure Web App with a properly configured web.config.

You can follow along with the following activities by deploying this repo -> https://gitlab.com/davidbetz/template-azure-node-api.

Per the previous explanation of IIS modules, you can see from the following web.config that iisnode is installed just as we would install our own handlers and modules. There are no hacks whatsoever.

    
    <?xml version="1.0" encoding="utf-8"?>
    <configuration>
      <system.webServer>
        <!-- leave false, you enable support in Azure -->
        <webSocket enabled="false" />
        <handlers>
          <add name="iisnode" path="server.js" verb="*" modules="iisnode"/>
        </handlers>
        <rewrite>
          <rules>
            <rule name="StaticContent">
              <action type="Rewrite" url="content{REQUEST_URI}"/>
            </rule>
            <rule name="DynamicContent">
              <conditions>
                <add input="{REQUEST_FILENAME}" matchType="IsFile" negate="True"/>
              </conditions>
              <action type="Rewrite" url="server.js"/>
            </rule>
            <rule name="Redirect to https" stopProcessing="true">
              <match url="(.*)" />
              <conditions>
                <add input="{HTTPS}" pattern="off" ignoreCase="true" />
              </conditions>
              <action type="Redirect" url="https://{HTTP_HOST}{REQUEST_URI}" redirectType="Permanent" appendQueryString="false" />
            </rule>
          </rules>
        </rewrite>
        <security>
          <requestFiltering>
            <hiddenSegments>
              <remove segment="bin"/>
            </hiddenSegments>
          </requestFiltering>
        </security>
        <httpErrors existingResponse="PassThrough" />
      </system.webServer>
    </configuration>

The following section listens for any requests for all verbs accessing server.js and has iisnode process them:


    <handlers>
        <add name="iisnode" path="server.js" verb="*" modules="iisnode"/>
    </handlers>

The following rewrite rule sends all traffic to server.js:


    <rule name="DynamicContent">
        <conditions>
            <add input="{REQUEST_FILENAME}" matchType="IsFile" negate="True"/>
        </conditions>
        <action type="Rewrite" url="server.js"/>
    </rule>

The following doesn't have anything directly to do with iisnode; it excludes the content folder from iisnode processing:


    <rule name="StaticContent">
        <action type="Rewrite" url="content{REQUEST_URI}"/>
    </rule>

I find putting static files on your web server to be naive, but if you really don't want to use the Azure CDN, that is how you host static content.

The following merely redirects HTTP to HTTPS:


    <rule name="HTTP to HTTPS redirect" stopProcessing="true">
        <match url="(.*)" />
        <conditions>
            <add input="{HTTPS}" pattern="off" ignoreCase="true" />
        </conditions>
        <action type="Redirect" url="https://{HTTP_HOST}/{REQUEST_URI}" redirectType="Permanent" />
    </rule>

Breaking iisnode

To prove that Node hosting is actually this basic, let's break it, then fix it.

First, let's see this work:

https://node38eb089b-app-alpha.azurewebsites.net/api/samples

it works

That's the expected output from the application.

Now, let's go to web.config and break it:


  <rule name="DynamicContent">
    <conditions>
      <add input="{REQUEST_FILENAME}" matchType="IsFile" negate="True"/>
    </conditions>
    <action type="Rewrite" url="server.js"/>
  </rule>

Change that Rewrite from server.js to server2.js:

Save. Refresh browser.

404

Nope.

Go to Kudu. This is the .scm. URL. In this case it's the following:

https://node3638972b-app-alpha.scm.azurewebsites.net/

kudu

Rename server.js to server2.js:

rename server.js to server2.js

Refresh again.

rawcodeoutput

mmmk. Raw output.

The rewrite is telling everything to go to server.js, but nothing is processing it, so it just sends the file back.

This is exactly like accessing an old .aspx page and getting the raw ASP.NET webform code, because you forgot to install ASP.NET (and somehow managed to allow access to .aspx).

Now, let's fix this by telling our IIS module to process server2.js:


    <handlers>
      <add name="iisnode" path="server2.js" verb="*" modules="iisnode"/>
    </handlers>

Refresh and it's all well again:

it works

App Services and App Service Plan Mechanics

An explanation of Azure web apps using any web platform isn't complete without reviewing the mechanics of Azure web apps.

To begin, let's clarify a few Azure terms:

Azure App Service Plans are effectively managed VMs. You can scale these up and out. That is, you can turn an S1 into an S2 to double the RAM, or you can turn a single instance into four. Because of this latter ability, App Service Plans are also known as server farms. In fact, when developing ARM templates, the type you use to deploy an App Service Plan is Microsoft.Web/serverfarms.

You do not deploy a series of plans to create a farm. Plans are farms. A plan with a size of 1 is just a farm with one instance. You are always dealing with herds, never with pets. You scale your farm out; you scale all those instances up.

Azure Web Apps are also known as Web Sites and App Services. You deploy these, you back these up, and you add SSL to these. These are similar to IIS virtual applications. When developing ARM templates, the type is Microsoft.Web/sites.

You do need to remember the various synonyms for each; you will see them all.

Given this distinction and given the fact that a VM can have multiple IIS applications, you can imagine that you can host multiple Azure Web Apps on a single App Service Plan. This is true. You do NOT deploy a plan every time you deploy a site. You plan your CPU/RAM usage capacity ahead of time and deploy a full solution at once.
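
As a sketch with the AzureRM cmdlets (the names and location here are placeholders), one plan carrying two web apps looks like this:

New-AzureRmAppServicePlan -ResourceGroupName 'rg01' -Name 'plan01' -Location 'Central US' -Tier 'Standard'

# both sites land on the same plan (the same farm of instances)
New-AzureRmWebApp -ResourceGroupName 'rg01' -Name 'site01-alpha' -Location 'Central US' -AppServicePlan 'plan01'
New-AzureRmWebApp -ResourceGroupName 'rg01' -Name 'api01-alpha' -Location 'Central US' -AppServicePlan 'plan01'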

To visualize the App Service / App Service Plan distinction, review the following image.

Here I've provided information for three services over two service plans. The first two services share a service plan; the third service is on a different plan.

Notice that the services with the same service plan have the same machine ID and instance ID, but their IIS specifics are different. The third service plan has a different machine ID altogether.

What's so special about the types of Web Apps?

If this is all just the same IIS, what's with the various Node-specific web app types?

web app types

The answer is simple: they exist solely to confuse you.

Fine. Whatever. The different types are hello-world templates, but you're going to overwrite them via deployment anyway.

You can literally deploy a Node.js web app, then deploy an ASP.NET site on it. It's just IIS. The website deployment will overwrite the web.config with its own.

Given the previous explanations of IIS handler/modules, iisnode-as-module, and the service/plan distinction, you can see that there's no magic. There's nothing Microsoft-only about any of this.

You can always use the normal "Web App" one and be done with it.

Single App Solutions

My websites are generally ASP.NET or Python/Django, but my APIs are always Node (ever since someone at Microsoft with an IQ of a Pennsylvania speed limit decided to deprecate Web API and rebuild it as "MVC" in ASP.NET Core). There was a time when my APIs and my website required separate... just about everything. Nowadays I use nginx as a single point of contact to serve traffic from various internal sources: one source to handle the website as a whole (either http://127.0.0.1:XXXX or a Linux socket) and another to handle /api. This lets me use a single domain (and therefore a single SSL cert) for my solution.
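
As a sketch of that nginx setup (the internal ports and cert paths are assumptions), the single point of contact is just two location blocks under one server_name:

server {
    listen 443 ssl;
    server_name example.org;

    ssl_certificate     /etc/pki/tls/certs/example.org.crt;
    ssl_certificate_key /etc/pki/tls/private/example.org.key;

    # the website (ASP.NET, Django, whatever) on its own internal port or socket
    location / {
        proxy_pass http://127.0.0.1:8000;
    }

    # the Node API
    location /api {
        proxy_pass http://127.0.0.1:3000;
    }
}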

In Azure, this functionality is provided by Application Gateway.

Think back through all the mechanics we've covered so far: IIS can handle .NET and supports modules. iisnode is a module. IIS uses rewriting to send everything to server.js. iisnode handles all traffic sent to server.js.

Let's mix this up: instead of rewriting everything to server.js, let's only rewrite the /api branch of our URL.

To make this example a bit spicier, let's deploy an ASP.NET MVC application to our App Service, then send /api to Node.

To do this, go to the App Service (not the App Service Plan!), then select Deployment Options on the left.

external

In Choose source, select External Repository and put in the following:

https://github.com/Azure-Samples/app-service-web-dotnet-get-started

external

A few minutes later, load the application normally. You'll see the "ASP.NET is a free web framework for..." propaganda.

Now go back into Kudu and the PowerShell Debug Console (explained earlier).

We need to do three things:

  • add our server.js
  • install express
  • tell web.config about server.js

To add server.js, go to site/wwwroot and type the following:

touch server.js

This will create the file. Edit the file and paste in the server code from the following:

https://gitlab.com/davidbetz/template-azure-node-api/blob/master/server.js

Next we need to install express to handle the API processing.

For the sake of a demo, type the following:

npm install express

Done.

For the sake of your long-term sanity, create package.json (same way you created server.js), edit in contents, save, then run:

npm install

See sample package.json at:

https://gitlab.com/davidbetz/template-azure-node-api/blob/master/package.json

Finally, edit web.config.

You need to splice the following config into system.webServer:


  <system.webServer>
    <rewrite>
      <rules>
        <rule name="DynamicContent">
          <match url="^api/(.*)" />
          <action type="Rewrite" url="server.js"/>
        </rule>
      </rules>
    </rewrite>
    <handlers>
      <add name="iisnode" path="server.js" verb="*" modules="iisnode"/>
    </handlers>
  </system.webServer>

Upon saving, access the web app root and /api/samples. Click around the web app to prove to yourself that it's not just a static page.
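
If you'd rather verify from PowerShell than a browser, a quick sanity check looks like this (the app name is hypothetical; substitute your own):


    # Hypothetical app name; substitute your own.
    $app = "myapp"
    $base = "https://$app.azurewebsites.net"

    # The root is served by the ASP.NET MVC deployment...
    (Invoke-WebRequest -Uri "$base/" -UseBasicParsing).StatusCode

    # ...while /api/* is rewritten to server.js and handled by iisnode.
    Invoke-RestMethod -Uri "$base/api/samples"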

asp.net and node together

You have ASP.NET and Node.js in the same Azure web app. According to a lot of the FUD out there, this shouldn't be possible.

In addition to hosting your SPA and your APIs in the same place, you also don't need to play with CORS nonsense. You also don't need an Application Gateway (=Microsoft's nginx) to do the same thing.

Scenario: Deploy Multi-Region Elasticsearch Cluster on Azure

Previously, I wrote about my proposal for modular ARM templates. Here I'm going to go through the scenario of deploying a full, multi-location Elasticsearch node cluster with a single public endpoint and pretty DNS.

This is a sequel to Developing Azure Modular ARM Templates. Study that first. You need the Deploy ARM Template PowerShell at the bottom to complete this deployment.

Let's begin by reviewing the solution:

This solution is at https://gitlab.com/davidbetz/azure-elasticsearch-nodes-scenario

solution files

Here we see:

  • A 3-phase Azure deployment
  • 3 PowerShell files
  • A VM setup script (install.sh)
  • A script that install.sh will run (create_data_generation_setup.sh)
  • A file that create_data_generation_setup.sh will indirectly use (hamlet.py)

The modular deployment phases are:

  • Setup of general resources (storage, vnet, IP, NIC, NSG)
  • Setup of VNet gateways
  • Setup of VMs

The first phase exists to lay out the general components. This is a very quick phase.

The second phase creates the gateways.

After the second phase, we will create the VNet connections with PowerShell.

The third phase will create the VMs.

Then we will generate sample data and test the ES cluster.

Then we will create an Azure Traffic Manager and test.

Finally we will add a pretty name to the traffic manager with Azure DNS.

Let's do this...

Creating Storage for Deployment

The first thing we're going to do is create a storage account for deployment files. In various examples online, you see this in Gitlab. We're not going to do that. Here we create a storage account that future phases will reference. You only need one... ever. Here I'm creating one just for this specific example deployment.


    $uniquifier = $([guid]::NewGuid().tostring().substring(0, 8))
    $rg = "esnodes$uniquifier"
    _createdeploymentAccount -rg $rg -uniquifier $uniquifier

Reference the Deploy ARM Template PowerShell file at the end of Developing Azure Modular ARM Templates for the above and later code.
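
If you don't have that file handy, here's a rough approximation of what such a helper boils down to. This is my sketch, not the referenced code: a resource group, a blob storage account, and a public-read "support" container matching the output below.


    # Approximation only -- not the referenced implementation.
    function _createdeploymentAccount { param($rg, $uniquifier, $location = "centralus")
        New-AzureRmResourceGroup -Name $rg -Location $location -Force

        New-AzureRmStorageAccount -ResourceGroupName $rg -Name "files$uniquifier" `
            -Location $location -SkuName Standard_LRS -Kind BlobStorage -AccessTier Hot

        $key = (Get-AzureRmStorageAccountKey -ResourceGroupName $rg -Name "files$uniquifier")[0].Value
        $context = New-AzureStorageContext -StorageAccountName "files$uniquifier" -StorageAccountKey $key

        # Blob-level public read, so deployments can reference uploaded scripts by URL.
        New-AzureStorageContainer -Name "support" -Permission Blob -Context $context
    }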

Output

VERBOSE: Performing the operation "Replacing resource group ..." on target "".
VERBOSE: 11:01:28 PM - Created resource group 'esnodesfbdac204' in location 'centralus'


ResourceGroupName : esnodesfbdac204
Location          : centralus
ProvisioningState : Succeeded
Tags              : 
ResourceId        : /subscriptions/20e08d3d-d5c5-4f76-a454-4a1b216166c6/resourceGroups/esnodesfbdac204


ResourceGroupName      : esnodesfbdac204
StorageAccountName     : filesfbdac204
Id                     : /subscriptions/20e08d3d-d5c5-4f76-a454-4a1b216166c6/resourceGroups/esnodesfbdac204/providers/Microsoft.Storage/storageAccounts/filesfbdac204
Location               : centralus
Sku                    : Microsoft.Azure.Management.Storage.Models.Sku
Kind                   : BlobStorage
Encryption             : 
AccessTier             : Hot
CreationTime           : 7/22/2017 4:01:29 AM
CustomDomain           : 
Identity               : 
LastGeoFailoverTime    : 
PrimaryEndpoints       : Microsoft.Azure.Management.Storage.Models.Endpoints
PrimaryLocation        : centralus
ProvisioningState      : Succeeded
SecondaryEndpoints     : 
SecondaryLocation      : 
StatusOfPrimary        : Available
StatusOfSecondary      : 
Tags                   : {}
EnableHttpsTrafficOnly : False
Context                : Microsoft.WindowsAzure.Commands.Common.Storage.LazyAzureStorageContext
ExtendedProperties     : {}


CloudBlobContainer : Microsoft.WindowsAzure.Storage.Blob.CloudBlobContainer
Permission         : Microsoft.WindowsAzure.Storage.Blob.BlobContainerPermissions
PublicAccess       : Blob
LastModified       : 7/22/2017 4:02:00 AM +00:00
ContinuationToken  : 
Context            : Microsoft.WindowsAzure.Commands.Storage.AzureStorageContext
Name               : support

Phase 1 Deployment

Now we're ready for phase 1 of deployment. This phase is quick and easy. It simply creates the basic components that future phases will use.

Part of what phase 1 will deploy is a virtual network per region that we're requesting. In this example, we have "central us", "west us", and "east us". We need a virtual network in each.

But, to make this work we have to remember our IP addressing:

Very short IP address review

Given a 10.x.x.x network, and a 0.y.y.y mask, 10 is your network and the x.x.x is your host area.

Given a 10.1.x.x network, and a 0.0.y.y mask, 10.1 is your network and x.x is your host area.

The concept of subnetting is relative to the network and only shows up in discussions of larger scale networks and supernetting. Tell the pedantic sysadmins to take a hike when they try to confuse you by over-emphasizing the network vs. subnetwork aspects. This is a semantic concept, not a technical one. That is, it relates to the design, not the bits themselves.

The virtual networks in our modular deployment use the following addressSpace:


    "addressSpace": {
        "addressPrefixes": [
            "[concat('10.', mul(16, add(copyIndex(), 1)), '.0.0/12')]"
        ]
    },

We see that our networks follow a 10.(16*(n+1)).0.0/12 pattern.

This takes n to generate networks: n=0 => 10.16.0.0, n=1 => 10.32.0.0, and n=2 => 10.48.0.0.

Azure allows you to split your networks up into subnets as well. This is great for organization. Not only that, when you specify a NIC, you put it on a subnet. So, let's look at our subnet configuration:

    
    "subnets": [
        {
            "name": "subnet01",
            "properties": {
                "addressPrefix": "[concat('10.', add(mul(16, add(copyIndex(), 1)), 1), '.0.0/16')]"
            }
        },
        {
            "name": "subnet02",
            "properties": {
                "addressPrefix": "[concat('10.', add(mul(16, add(copyIndex(), 1)), 2), '.0.0/16')]"
            }
        },
        {
            "name": "GatewaySubnet",
            "properties": {
                "addressPrefix": "[concat('10.', mul(16, add(copyIndex(), 1)), '.0.', 16,'/28')]"
            }
        }
    ]

The NICs for our VMs will be on subnet01. We will not be using subnet02, but I always include it for future experiments and as an example of further subnetting.

GatewaySubnet is special and is used only by the VPN gateways. Don't mess with that.

Zooming into subnet01, we see a 10.(16*(n+1)+1).0.0/16 pattern. It's basically the network + 1, with the next four bits defining the subnet (in our case the subnet is the network; it's only a subnet from the perspective of the network, but we're not viewing it from that perspective).

This takes n to generate networks: n=0 => 10.17.0.0, n=1 => 10.33.0.0, and n=2 => 10.49.0.0.
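
If you want to sanity-check that arithmetic, the same expressions the template uses are easy to evaluate straight from PowerShell:


    # Evaluate the template's address math for n = copyIndex() = 0..2.
    (0..2).foreach({
        $n = $_
        $vnet     = "10.{0}.0.0/12"  -f (16 * ($n + 1))        # address space
        $subnet01 = "10.{0}.0.0/16"  -f (16 * ($n + 1) + 1)    # subnet01
        $subnet02 = "10.{0}.0.0/16"  -f (16 * ($n + 1) + 2)    # subnet02
        $gateway  = "10.{0}.0.16/28" -f (16 * ($n + 1))        # GatewaySubnet
        "{0}: {1} {2} {3} {4}" -f $n, $vnet, $subnet01, $subnet02, $gateway
    })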

End of mini-lesson.

Now to deploy phase 1...

Output

(filtering for phase 1)
Merging \\10.1.20.1\dbetz\azure\components\elasticsearch-secure-nodes\template\resources\network\networkInterfaces\nic-copyIndex.json...
Merging \\10.1.20.1\dbetz\azure\components\elasticsearch-secure-nodes\template\resources\network\networkSecurityGroups\nsg-copyIndex.json...
Merging \\10.1.20.1\dbetz\azure\components\elasticsearch-secure-nodes\template\resources\network\publicIPAddresses\pip-copyIndex.json...
Merging \\10.1.20.1\dbetz\azure\components\elasticsearch-secure-nodes\template\resources\network\virtualNetworks\vnet-copyIndex.json...
Merging \\10.1.20.1\dbetz\azure\components\elasticsearch-secure-nodes\template\resources\storage\storageAccounts\storage-copyIndex.json...
(excluding \\10.1.20.1\dbetz\azure\components\elasticsearch-secure-nodes\template\resources\network\publicIPAddresses\2.pip-gateway-copyIndex.json)
(excluding \\10.1.20.1\dbetz\azure\components\elasticsearch-secure-nodes\template\resources\network\virtualNetworkGateways\2.gateway-copyIndex.json)
(excluding \\10.1.20.1\dbetz\azure\components\elasticsearch-secure-nodes\template\resources\compute\virtualMachines\3.vm-copyIndex.json)
(excluding \\10.1.20.1\dbetz\azure\components\elasticsearch-secure-nodes\template\resources\compute\virtualMachines\extensions\3.script.json)
------------------------------------

------------------------------------
Creating \\10.1.20.1\dbetz\azure\components\elasticsearch-secure-nodes\deploy\07212017-110346.1...
Deploying template \\10.1.20.1\dbetz\azure\components\elasticsearch-secure-nodes\deploy\azuredeploy-generated.json
VERBOSE: Performing the operation "Creating Deployment" on target "esnodesfbdac204".
VERBOSE: 07:03:49 PM - Template is valid.
VERBOSE: 07:03:51 PM - Create template deployment 'elasticsearch-secure-nodes07212017-110346'
VERBOSE: 07:03:51 PM - Checking deployment status in 5 seconds
VERBOSE: 07:03:56 PM - Checking deployment status in 5 seconds
VERBOSE: 07:04:01 PM - Checking deployment status in 5 seconds
VERBOSE: 07:04:06 PM - Resource Microsoft.Storage/storageAccounts 'esnodesfbdac204alpha' provisioning status is running
VERBOSE: 07:04:06 PM - Resource Microsoft.Network/networkSecurityGroups 'nsg-gamma' provisioning status is running
VERBOSE: 07:04:06 PM - Checking deployment status in 5 seconds
VERBOSE: 07:04:12 PM - Resource Microsoft.Network/publicIPAddresses 'pip-gamma' provisioning status is succeeded
VERBOSE: 07:04:12 PM - Resource Microsoft.Network/publicIPAddresses 'pip-beta' provisioning status is succeeded
VERBOSE: 07:04:12 PM - Resource Microsoft.Network/publicIPAddresses 'pip-alpha' provisioning status is succeeded
VERBOSE: 07:04:12 PM - Resource Microsoft.Network/virtualNetworks 'vnet-gamma' provisioning status is succeeded
VERBOSE: 07:04:12 PM - Resource Microsoft.Network/virtualNetworks 'vnet-alpha' provisioning status is succeeded
VERBOSE: 07:04:12 PM - Resource Microsoft.Network/virtualNetworks 'vnet-beta' provisioning status is succeeded
VERBOSE: 07:04:12 PM - Resource Microsoft.Network/networkSecurityGroups 'nsg-alpha' provisioning status is running
VERBOSE: 07:04:12 PM - Resource Microsoft.Storage/storageAccounts 'esnodesfbdac204gamma' provisioning status is running
VERBOSE: 07:04:12 PM - Resource Microsoft.Network/networkSecurityGroups 'nsg-beta' provisioning status is running
VERBOSE: 07:04:12 PM - Resource Microsoft.Storage/storageAccounts 'esnodesfbdac204beta' provisioning status is running
VERBOSE: 07:04:12 PM - Checking deployment status in 5 seconds
VERBOSE: 07:04:17 PM - Checking deployment status in 5 seconds
VERBOSE: 07:04:22 PM - Resource Microsoft.Network/networkInterfaces 'nic-alpha' provisioning status is succeeded
VERBOSE: 07:04:22 PM - Resource Microsoft.Network/networkInterfaces 'nic-gamma' provisioning status is succeeded
VERBOSE: 07:04:22 PM - Resource Microsoft.Network/networkInterfaces 'nic-beta' provisioning status is succeeded
VERBOSE: 07:04:22 PM - Resource Microsoft.Network/networkSecurityGroups 'nsg-alpha' provisioning status is succeeded
VERBOSE: 07:04:22 PM - Resource Microsoft.Network/networkSecurityGroups 'nsg-beta' provisioning status is succeeded
VERBOSE: 07:04:22 PM - Resource Microsoft.Network/networkSecurityGroups 'nsg-gamma' provisioning status is succeeded
VERBOSE: 07:04:22 PM - Checking deployment status in 5 seconds
VERBOSE: 07:04:27 PM - Checking deployment status in 5 seconds
VERBOSE: 07:04:33 PM - Checking deployment status in 5 seconds
VERBOSE: 07:04:38 PM - Resource Microsoft.Storage/storageAccounts 'esnodesfbdac204gamma' provisioning status is succeeded
VERBOSE: 07:04:38 PM - Resource Microsoft.Storage/storageAccounts 'esnodesfbdac204beta' provisioning status is succeeded
VERBOSE: 07:04:38 PM - Resource Microsoft.Storage/storageAccounts 'esnodesfbdac204alpha' provisioning status is succeeded
VERBOSE: 07:04:38 PM - Checking deployment status in 5 seconds


DeploymentName          : elasticsearch-secure-nodes07212017-110346
ResourceGroupName       : esnodesfbdac204
ProvisioningState       : Succeeded
Timestamp               : 7/23/2017 12:04:32 AM
Mode                    : Incremental
TemplateLink            : 
Parameters              : 
                          Name             Type                       Value     
                          ===============  =========================  ==========
                          admin-username   String                     dbetz     
                          ssh-public-key   String                     ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCxbo0LWWXCHEEGxgtraIhHBPPnt+kJGMjYMC6+9gBIsYz8R8bSFfge7ljHRxvJoye+4IrdSf2Ee2grgm2+xT9HjMvVR2/LQjPY+ocdinYHlM6miqvMgMblOMVm6/WwY0L
                          ZkozPKuSXzhO+/Q6HTZBr2pig/bclvJuFPBtClrzZx5R3NfV33/2rZpFZH9OdAf28q55jbZ1t9AJhtD27s34/cRVBXNBQtc2Nw9D8cEJ+raRdJitAOX3U41bjbrO1u3CQ/JtXg/35wZTJH1Yx7zmDl97cklfiArAfaxkgpWkGhob6A6Fu7LvEgLC25gO5NsY+g4CDqGJT5kzbcyQDDh
                          bf dbetz@localhost.localdomain
                          script-base      String                               
                          
Outputs                 : 
DeploymentDebugLogLevel : 

Now we have all kinds of goodies setup. Note how fast that was: template validation was at 07:03:49 PM and it finished at 07:04:38 PM.

post phase 1

Phase 2 Deployment

Now for phase 2. Here we're creating the VPN gateways. Why? Because we have multiple virtual networks in multiple regions. We need to create VPN connections between them to allow communication. To create VPN connections, we need VPN gateways.

Output

Be warned: this takes forever.
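
If you want to keep an eye on it from a second PowerShell session (assuming the $rg variable from earlier), the gateways report their own state:


    # Poll gateway provisioning while the deployment grinds along.
    Get-AzureRmVirtualNetworkGateway -ResourceGroupName $rg |
        Select-Object Name, Location, ProvisioningState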

(filtering for phase 2)

Merging \\10.1.20.1\dbetz\azure\components\elasticsearch-secure-nodes\template\resources\network\networkInterfaces\nic-copyIndex.json...
Merging \\10.1.20.1\dbetz\azure\components\elasticsearch-secure-nodes\template\resources\network\networkSecurityGroups\nsg-copyIndex.json...
Merging \\10.1.20.1\dbetz\azure\components\elasticsearch-secure-nodes\template\resources\network\publicIPAddresses\pip-copyIndex.json...
Merging \\10.1.20.1\dbetz\azure\components\elasticsearch-secure-nodes\template\resources\network\virtualNetworks\vnet-copyIndex.json...
Merging \\10.1.20.1\dbetz\azure\components\elasticsearch-secure-nodes\template\resources\storage\storageAccounts\storage-copyIndex.json...
Merging \\10.1.20.1\dbetz\azure\components\elasticsearch-secure-nodes\template\resources\network\publicIPAddresses\2.pip-gateway-copyIndex.json...
Merging \\10.1.20.1\dbetz\azure\components\elasticsearch-secure-nodes\template\resources\network\virtualNetworkGateways\2.gateway-copyIndex.json...
(excluding \\10.1.20.1\dbetz\azure\components\elasticsearch-secure-nodes\template\resources\compute\virtualMachines\3.vm-copyIndex.json)
(excluding \\10.1.20.1\dbetz\azure\components\elasticsearch-secure-nodes\template\resources\compute\virtualMachines\extensions\3.script.json)
------------------------------------

------------------------------------
Creating \\10.1.20.1\dbetz\azure\components\elasticsearch-secure-nodes\deploy\07222017-074129.2...
Deploying template \\10.1.20.1\dbetz\azure\components\elasticsearch-secure-nodes\deploy\azuredeploy-generated.json
VERBOSE: Performing the operation "Creating Deployment" on target "esnodesfbdac204".
VERBOSE: 7:41:42 PM - Template is valid.
VERBOSE: 7:41:43 PM - Create template deployment 'elasticsearch-secure-nodes07222017-074129'
VERBOSE: 7:41:43 PM - Checking deployment status in 5 seconds
VERBOSE: 7:41:49 PM - Resource Microsoft.Network/networkSecurityGroups 'nsg-beta' provisioning status is succeeded
VERBOSE: 7:41:49 PM - Resource Microsoft.Network/networkSecurityGroups 'nsg-alpha' provisioning status is succeeded
VERBOSE: 7:41:49 PM - Resource Microsoft.Network/virtualNetworks 'vnet-gamma' provisioning status is succeeded
VERBOSE: 7:41:49 PM - Resource Microsoft.Storage/storageAccounts 'esnodesfbdac204alpha' provisioning status is succeeded
VERBOSE: 7:41:49 PM - Checking deployment status in 5 seconds
VERBOSE: 7:41:54 PM - Resource Microsoft.Network/networkInterfaces 'nic-gamma' provisioning status is succeeded
VERBOSE: 7:41:54 PM - Resource Microsoft.Network/networkInterfaces 'nic-alpha' provisioning status is succeeded
VERBOSE: 7:41:54 PM - Resource Microsoft.Network/publicIPAddresses 'pip-alpha' provisioning status is succeeded
VERBOSE: 7:41:54 PM - Resource Microsoft.Storage/storageAccounts 'esnodesfbdac204gamma' provisioning status is succeeded
VERBOSE: 7:41:54 PM - Resource Microsoft.Storage/storageAccounts 'esnodesfbdac204beta' provisioning status is succeeded
VERBOSE: 7:41:54 PM - Resource Microsoft.Network/publicIPAddresses 'pip-beta' provisioning status is succeeded
VERBOSE: 7:41:54 PM - Resource Microsoft.Network/publicIPAddresses 'pip-gamma' provisioning status is succeeded
VERBOSE: 7:41:54 PM - Resource Microsoft.Network/virtualNetworks 'vnet-beta' provisioning status is succeeded
VERBOSE: 7:41:54 PM - Resource Microsoft.Network/virtualNetworks 'vnet-alpha' provisioning status is succeeded
VERBOSE: 7:41:54 PM - Resource Microsoft.Network/networkSecurityGroups 'nsg-gamma' provisioning status is succeeded
VERBOSE: 7:41:54 PM - Checking deployment status in 5 seconds
VERBOSE: 7:41:59 PM - Resource Microsoft.Network/virtualNetworkGateways 'gateway-gamma' provisioning status is running
VERBOSE: 7:41:59 PM - Resource Microsoft.Network/publicIPAddresses 'pip-gateway-alpha' provisioning status is succeeded
VERBOSE: 7:41:59 PM - Resource Microsoft.Network/networkInterfaces 'nic-beta' provisioning status is succeeded
VERBOSE: 7:41:59 PM - Resource Microsoft.Network/publicIPAddresses 'pip-gateway-gamma' provisioning status is succeeded
VERBOSE: 7:41:59 PM - Resource Microsoft.Network/publicIPAddresses 'pip-gateway-beta' provisioning status is succeeded
VERBOSE: 7:41:59 PM - Checking deployment status in 10 seconds
VERBOSE: 7:42:09 PM - Resource Microsoft.Network/virtualNetworkGateways 'gateway-beta' provisioning status is running
VERBOSE: 7:42:09 PM - Resource Microsoft.Network/virtualNetworkGateways 'gateway-alpha' provisioning status is running
VERBOSE: 7:42:10 PM - Checking deployment status in 11 seconds

takes forever

VERBOSE: 8:11:11 PM - Checking deployment status in 11 seconds
VERBOSE: 8:11:22 PM - Checking deployment status in 11 seconds
VERBOSE: 8:11:33 PM - Checking deployment status in 10 seconds
VERBOSE: 8:11:43 PM - Checking deployment status in 5 seconds
VERBOSE: 8:11:48 PM - Checking deployment status in 9 seconds
VERBOSE: 8:11:58 PM - Resource Microsoft.Network/virtualNetworkGateways 'gateway-beta' provisioning status is succeeded
VERBOSE: 8:11:58 PM - Checking deployment status in 11 seconds
VERBOSE: 8:12:09 PM - Checking deployment status in 5 seconds
VERBOSE: 8:12:14 PM - Checking deployment status in 7 seconds
VERBOSE: 8:12:21 PM - Checking deployment status in 11 seconds
VERBOSE: 8:12:33 PM - Checking deployment status in 11 seconds
VERBOSE: 8:12:44 PM - Checking deployment status in 5 seconds
VERBOSE: 8:12:49 PM - Resource Microsoft.Network/virtualNetworkGateways 'gateway-alpha' provisioning status is succeeded
VERBOSE: 8:12:49 PM - Checking deployment status in 8 seconds
VERBOSE: 8:12:57 PM - Checking deployment status in 11 seconds
VERBOSE: 8:13:08 PM - Checking deployment status in 5 seconds
VERBOSE: 8:13:14 PM - Checking deployment status in 7 seconds
VERBOSE: 8:13:21 PM - Checking deployment status in 5 seconds
VERBOSE: 8:13:26 PM - Checking deployment status in 7 seconds
VERBOSE: 8:13:33 PM - Checking deployment status in 11 seconds
VERBOSE: 8:13:45 PM - Checking deployment status in 5 seconds
VERBOSE: 8:13:50 PM - Resource Microsoft.Network/virtualNetworkGateways 'gateway-gamma' provisioning status is succeeded


DeploymentName          : elasticsearch-secure-nodes07222017-074129
ResourceGroupName       : esnodesfbdac204
ProvisioningState       : Succeeded
Timestamp               : 7/23/2017 1:13:44 AM
Mode                    : Incremental
TemplateLink            : 
Parameters              : 
                          Name             Type                       Value     
                          ===============  =========================  ==========
                          admin-username   String                     dbetz     
                          ssh-public-key   String                     ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCxbo0LWWXCHEEGxgtraIhHBPPnt+kJGMjYMC6+9gBIsYz8R8bSFfge7ljHRxvJoye+4IrdSf2Ee2grgm2+xT9HjMvVR2/LQjPY+ocdinYHlM6miqvMgMblOMVm6/WwY0LZkozPKuSXzhO
                          +/Q6HTZBr2pig/bclvJuFPBtClrzZx5R3NfV33/2rZpFZH9OdAf28q55jbZ1t9AJhtD27s34/cRVBXNBQtc2Nw9D8cEJ+raRdJitAOX3U41bjbrO1u3CQ/JtXg/35wZTJH1Yx7zmDl97cklfiArAfaxkgpWkGhob6A6Fu7LvEgLC25gO5NsY+g4CDqGJT5kzbcyQDDhbf 
                          dbetz@localhost.localdomain
                          script-base      String                               
                          
Outputs                 : 
DeploymentDebugLogLevel : 

At this point the gateways have been created.

gws

Creating a VPN Connection Mesh

You can whip out various topologies to connect the networks (e.g. hub-and-spoke, point-to-point, etc.). In this case I'm going for a full mesh topology. This connects everyone directly to everyone else. It's the most hardcore option.

Given that a connection is unidirectional, each topological link between areas requires both a to and a from connection. So, A->B and B->A for a 2-point mesh. For a 3-point mesh, it's all over the board. The formula everyone who goes through network engineering training memorizes is n*(n-1). So, for n=3, you have 3 * 2 (6) connections. For n=5, this is 20 connections. That's a lot, but there's no lame bottleneck from tunneling traffic through a central hub (=hub-and-spoke topology).

When creating Azure VPN connections, you specify a shared key. This just makes sense. There needs to be a private passcode to enable them to trust each other. In this example, I'm cracking open ASP.NET to auto-generate a wildly complex password. This thing is crazy. Here are some of the passwords it spits out:

  • rT64%nr*#OX/CR)O
  • XwX3UamErI@D)>N{
  • Ej.ZHSngc|yenaiD
  • @*KUz|$#^Jvp-9Vb
  • _7q)6h6/.G;8C?U(

Goodness.

Anyway, off to the races...


    function createmesh { param([Parameter(Mandatory=$true)]$rg,
                                [Parameter(Mandatory=$true)]$key)
    
        function getname { param($id)
            $parts = $id.split('-')
            return $parts[$parts.length-1]
        }
    
        $gateways = Get-AzureRmVirtualNetworkGateway -ResourceGroupName $rg
    
        ($gateways).foreach({
            $source = $_
            ($gateways).foreach({
                $target = $_
                $sourceName = getname $source.Name
                $targetName = getname $target.Name
                if($source.name -ne $target.name) {
                    $connectionName = ('conn-{0}2{1}' -f $sourceName, $targetName)
                    Write-Host "$sourceName => $targetName"
                    New-AzureRmVirtualNetworkGatewayConnection -ResourceGroupName $rg -Location $source.Location -Name $connectionName `
                        -VirtualNetworkGateway1 $source `
                        -VirtualNetworkGateway2 $target `
                        -ConnectionType Vnet2Vnet `
                        -RoutingWeight 10 `
                        -SharedKey $key
                }
            })  
        })
    }
    function _virtualenv {
    
    Add-Type -AssemblyName System.Web
    $key = [System.Web.Security.Membership]::GeneratePassword(16,2)
    
    createmesh -rg $rgGlobal -key $key
    
    } _virtualenv

Output

beta => gamma
beta => alpha
gamma => beta
gamma => alpha
alpha => beta
alpha => gamma



Name                    : conn-beta2gamma
ResourceGroupName       : esnodesfbdac204
Location                : westus
Id                      : /subscriptions/20e08d3d-d5c5-4f76-a454-4a1b216166c6/resourceGroups/esnodesfbdac204/providers/Microsoft.Network/connections/conn-beta2gamma
Etag                    : W/"d9644745-03f4-461c-9efa-1477cd5e13d1"
ResourceGuid            : bfd89895-af3d-44ee-80c8-74413a18f6c4
ProvisioningState       : Succeeded
Tags                    : 
AuthorizationKey        : 
VirtualNetworkGateway1  : "/subscriptions/20e08d3d-d5c5-4f76-a454-4a1b216166c6/resourceGroups/esnodesfbdac204/providers/Microsoft.Network/virtualNetworkGateways/gateway-beta"
VirtualNetworkGateway2  : "/subscriptions/20e08d3d-d5c5-4f76-a454-4a1b216166c6/resourceGroups/esnodesfbdac204/providers/Microsoft.Network/virtualNetworkGateways/gateway-gamma"
LocalNetworkGateway2    : 
Peer                    : 
RoutingWeight           : 10
SharedKey               : oF[sa4n^sq)aIYSj
ConnectionStatus        : Unknown
EgressBytesTransferred  : 0
IngressBytesTransferred : 0
TunnelConnectionStatus  : []

Name                    : conn-beta2alpha
ResourceGroupName       : esnodesfbdac204
Location                : westus
Id                      : /subscriptions/20e08d3d-d5c5-4f76-a454-4a1b216166c6/resourceGroups/esnodesfbdac204/providers/Microsoft.Network/connections/conn-beta2alpha
Etag                    : W/"893dc827-a077-4003-b086-ec3aba8344ee"
ResourceGuid            : c8780c08-9678-4720-a90b-42a8509c059e
ProvisioningState       : Succeeded
Tags                    : 
AuthorizationKey        : 
VirtualNetworkGateway1  : "/subscriptions/20e08d3d-d5c5-4f76-a454-4a1b216166c6/resourceGroups/esnodesfbdac204/providers/Microsoft.Network/virtualNetworkGateways/gateway-beta"
VirtualNetworkGateway2  : "/subscriptions/20e08d3d-d5c5-4f76-a454-4a1b216166c6/resourceGroups/esnodesfbdac204/providers/Microsoft.Network/virtualNetworkGateways/gateway-alpha"
LocalNetworkGateway2    : 
Peer                    : 
RoutingWeight           : 10
SharedKey               : oF[sa4n^sq)aIYSj
ConnectionStatus        : Unknown
EgressBytesTransferred  : 0
IngressBytesTransferred : 0
TunnelConnectionStatus  : []

Name                    : conn-gamma2beta
ResourceGroupName       : esnodesfbdac204
Location                : eastus
Id                      : /subscriptions/20e08d3d-d5c5-4f76-a454-4a1b216166c6/resourceGroups/esnodesfbdac204/providers/Microsoft.Network/connections/conn-gamma2beta
Etag                    : W/"a99e47f4-fd53-4583-811f-a868d1c0f011"
ResourceGuid            : 50b8bc36-37b9-434f-badc-961266b19436
ProvisioningState       : Succeeded
Tags                    : 
AuthorizationKey        : 
VirtualNetworkGateway1  : "/subscriptions/20e08d3d-d5c5-4f76-a454-4a1b216166c6/resourceGroups/esnodesfbdac204/providers/Microsoft.Network/virtualNetworkGateways/gateway-gamma"
VirtualNetworkGateway2  : "/subscriptions/20e08d3d-d5c5-4f76-a454-4a1b216166c6/resourceGroups/esnodesfbdac204/providers/Microsoft.Network/virtualNetworkGateways/gateway-beta"
LocalNetworkGateway2    : 
Peer                    : 
RoutingWeight           : 10
SharedKey               : oF[sa4n^sq)aIYSj
ConnectionStatus        : Unknown
EgressBytesTransferred  : 0
IngressBytesTransferred : 0
TunnelConnectionStatus  : []

Name                    : conn-gamma2alpha
ResourceGroupName       : esnodesfbdac204
Location                : eastus
Id                      : /subscriptions/20e08d3d-d5c5-4f76-a454-4a1b216166c6/resourceGroups/esnodesfbdac204/providers/Microsoft.Network/connections/conn-gamma2alpha
Etag                    : W/"4dd2d765-4bb0-488f-9d28-dabbf618c28f"
ResourceGuid            : e9e4591f-998b-4318-b297-b2078409c7e9
ProvisioningState       : Succeeded
Tags                    : 
AuthorizationKey        : 
VirtualNetworkGateway1  : "/subscriptions/20e08d3d-d5c5-4f76-a454-4a1b216166c6/resourceGroups/esnodesfbdac204/providers/Microsoft.Network/virtualNetworkGateways/gateway-gamma"
VirtualNetworkGateway2  : "/subscriptions/20e08d3d-d5c5-4f76-a454-4a1b216166c6/resourceGroups/esnodesfbdac204/providers/Microsoft.Network/virtualNetworkGateways/gateway-alpha"
LocalNetworkGateway2    : 
Peer                    : 
RoutingWeight           : 10
SharedKey               : oF[sa4n^sq)aIYSj
ConnectionStatus        : Unknown
EgressBytesTransferred  : 0
IngressBytesTransferred : 0
TunnelConnectionStatus  : []

Name                    : conn-alpha2beta
ResourceGroupName       : esnodesfbdac204
Location                : centralus
Id                      : /subscriptions/20e08d3d-d5c5-4f76-a454-4a1b216166c6/resourceGroups/esnodesfbdac204/providers/Microsoft.Network/connections/conn-alpha2beta
Etag                    : W/"aafef4bf-d241-4cdd-88b7-b6ecd793a662"
ResourceGuid            : ef5bb61b-fcbe-4452-bf1f-b847f32dfa95
ProvisioningState       : Succeeded
Tags                    : 
AuthorizationKey        : 
VirtualNetworkGateway1  : "/subscriptions/20e08d3d-d5c5-4f76-a454-4a1b216166c6/resourceGroups/esnodesfbdac204/providers/Microsoft.Network/virtualNetworkGateways/gateway-alpha"
VirtualNetworkGateway2  : "/subscriptions/20e08d3d-d5c5-4f76-a454-4a1b216166c6/resourceGroups/esnodesfbdac204/providers/Microsoft.Network/virtualNetworkGateways/gateway-beta"
LocalNetworkGateway2    : 
Peer                    : 
RoutingWeight           : 10
SharedKey               : oF[sa4n^sq)aIYSj
ConnectionStatus        : Unknown
EgressBytesTransferred  : 0
IngressBytesTransferred : 0
TunnelConnectionStatus  : []

Name                    : conn-alpha2gamma
ResourceGroupName       : esnodesfbdac204
Location                : centralus
Id                      : /subscriptions/20e08d3d-d5c5-4f76-a454-4a1b216166c6/resourceGroups/esnodesfbdac204/providers/Microsoft.Network/connections/conn-alpha2gamma
Etag                    : W/"edf5f85a-0f7d-4883-8e45-433de9e045b2"
ResourceGuid            : 074c168c-1d42-4704-b978-124c8505a35b
ProvisioningState       : Succeeded
Tags                    : 
AuthorizationKey        : 
VirtualNetworkGateway1  : "/subscriptions/20e08d3d-d5c5-4f76-a454-4a1b216166c6/resourceGroups/esnodesfbdac204/providers/Microsoft.Network/virtualNetworkGateways/gateway-alpha"
VirtualNetworkGateway2  : "/subscriptions/20e08d3d-d5c5-4f76-a454-4a1b216166c6/resourceGroups/esnodesfbdac204/providers/Microsoft.Network/virtualNetworkGateways/gateway-gamma"
LocalNetworkGateway2    : 
Peer                    : 
RoutingWeight           : 10
SharedKey               : oF[sa4n^sq)aIYSj
ConnectionStatus        : Unknown
EgressBytesTransferred  : 0
IngressBytesTransferred : 0
TunnelConnectionStatus  : []

Phase 3 Deployment

Now to create the VMs...

In our scenario, it's really important to do this phase after creating the VPN connection mesh. During VM creation, Elasticsearch is automatically set up and the nodes will attempt to connect to each other.

No mesh => no connection => you-having-a-fit.

During this deploy, you're going to see that everything from phases 1 and 2 is validated. That's just the idempotent nature of ARM template deployment.

Output

(filtering for phase 3)
Merging \\10.1.20.1\dbetz\azure\components\elasticsearch-secure-nodes\template\resources\network\networkInterfaces\nic-copyIndex.json...
Merging \\10.1.20.1\dbetz\azure\components\elasticsearch-secure-nodes\template\resources\network\networkSecurityGroups\nsg-copyIndex.json...
Merging \\10.1.20.1\dbetz\azure\components\elasticsearch-secure-nodes\template\resources\network\publicIPAddresses\pip-copyIndex.json...
Merging \\10.1.20.1\dbetz\azure\components\elasticsearch-secure-nodes\template\resources\network\virtualNetworks\vnet-copyIndex.json...
Merging \\10.1.20.1\dbetz\azure\components\elasticsearch-secure-nodes\template\resources\storage\storageAccounts\storage-copyIndex.json...
Merging \\10.1.20.1\dbetz\azure\components\elasticsearch-secure-nodes\template\resources\network\publicIPAddresses\2.pip-gateway-copyIndex.json...
Merging \\10.1.20.1\dbetz\azure\components\elasticsearch-secure-nodes\template\resources\network\virtualNetworkGateways\2.gateway-copyIndex.json...
Merging \\10.1.20.1\dbetz\azure\components\elasticsearch-secure-nodes\template\resources\compute\virtualMachines\3.vm-copyIndex.json...
Merging \\10.1.20.1\dbetz\azure\components\elasticsearch-secure-nodes\template\resources\compute\virtualMachines\extensions\3.script.json...
Uploading elasticsearch-secure-nodes\07222017-084857\create_data_generation_setup.sh
Uploading elasticsearch-secure-nodes\07222017-084857\install.sh
Uploading elasticsearch-secure-nodes\07222017-084857\generate\hamlet.py
Blob path: https://filesfbdac204.blob.core.windows.net/support/elasticsearch-secure-nodes/07222017-084857
------------------------------------

------------------------------------
Creating \\10.1.20.1\dbetz\azure\components\elasticsearch-secure-nodes\deploy\07222017-084857.3...
Deploying template \\10.1.20.1\dbetz\azure\components\elasticsearch-secure-nodes\deploy\azuredeploy-generated.json
VERBOSE: Performing the operation "Creating Deployment" on target "esnodesfbdac204".
VERBOSE: 8:49:02 PM - Template is valid.
VERBOSE: 8:49:03 PM - Create template deployment 'elasticsearch-secure-nodes07222017-084857'
VERBOSE: 8:49:03 PM - Checking deployment status in 5 seconds
VERBOSE: 8:49:08 PM - Resource Microsoft.Network/publicIPAddresses 'pip-gateway-alpha' provisioning status is succeeded
VERBOSE: 8:49:08 PM - Resource Microsoft.Network/publicIPAddresses 'pip-gamma' provisioning status is succeeded
VERBOSE: 8:49:08 PM - Resource Microsoft.Network/networkSecurityGroups 'nsg-alpha' provisioning status is succeeded
VERBOSE: 8:49:08 PM - Resource Microsoft.Network/publicIPAddresses 'pip-gateway-gamma' provisioning status is succeeded
VERBOSE: 8:49:08 PM - Resource Microsoft.Network/virtualNetworks 'vnet-alpha' provisioning status is succeeded
VERBOSE: 8:49:08 PM - Resource Microsoft.Network/virtualNetworks 'vnet-beta' provisioning status is succeeded
VERBOSE: 8:49:08 PM - Resource Microsoft.Storage/storageAccounts 'esnodesfbdac204gamma' provisioning status is succeeded
VERBOSE: 8:49:08 PM - Resource Microsoft.Network/publicIPAddresses 'pip-gateway-beta' provisioning status is succeeded
VERBOSE: 8:49:08 PM - Resource Microsoft.Storage/storageAccounts 'esnodesfbdac204alpha' provisioning status is succeeded
VERBOSE: 8:49:08 PM - Resource Microsoft.Network/publicIPAddresses 'pip-alpha' provisioning status is succeeded
VERBOSE: 8:49:08 PM - Resource Microsoft.Network/publicIPAddresses 'pip-beta' provisioning status is succeeded
VERBOSE: 8:49:08 PM - Checking deployment status in 5 seconds
VERBOSE: 8:49:13 PM - Resource Microsoft.Storage/storageAccounts 'esnodesfbdac204beta' provisioning status is succeeded
VERBOSE: 8:49:13 PM - Resource Microsoft.Network/virtualNetworkGateways 'gateway-beta' provisioning status is running
VERBOSE: 8:49:13 PM - Resource Microsoft.Network/networkSecurityGroups 'nsg-beta' provisioning status is succeeded
VERBOSE: 8:49:13 PM - Resource Microsoft.Network/virtualNetworks 'vnet-gamma' provisioning status is succeeded
VERBOSE: 8:49:13 PM - Resource Microsoft.Network/virtualNetworkGateways 'gateway-alpha' provisioning status is running
VERBOSE: 8:49:13 PM - Resource Microsoft.Storage/storageAccounts 'esnodesfbdac204beta' provisioning status is succeeded
VERBOSE: 8:49:13 PM - Resource Microsoft.Network/networkSecurityGroups 'nsg-gamma' provisioning status is succeeded
VERBOSE: 8:49:13 PM - Resource Microsoft.Storage/storageAccounts 'esnodesfbdac204gamma' provisioning status is succeeded
VERBOSE: 8:49:13 PM - Resource Microsoft.Network/networkInterfaces 'nic-alpha' provisioning status is succeeded
VERBOSE: 8:49:13 PM - Resource Microsoft.Storage/storageAccounts 'esnodesfbdac204alpha' provisioning status is succeeded
VERBOSE: 8:49:13 PM - Checking deployment status in 6 seconds
VERBOSE: 8:49:19 PM - Resource Microsoft.Compute/virtualMachines 'vm-beta' provisioning status is running
VERBOSE: 8:49:19 PM - Resource Microsoft.Network/networkInterfaces 'nic-gamma' provisioning status is succeeded
VERBOSE: 8:49:19 PM - Resource Microsoft.Compute/virtualMachines 'vm-alpha' provisioning status is running
VERBOSE: 8:49:19 PM - Resource Microsoft.Network/virtualNetworkGateways 'gateway-gamma' provisioning status is running
VERBOSE: 8:49:19 PM - Resource Microsoft.Network/networkInterfaces 'nic-beta' provisioning status is succeeded
VERBOSE: 8:49:19 PM - Checking deployment status in 11 seconds
VERBOSE: 8:49:30 PM - Resource Microsoft.Compute/virtualMachines 'vm-gamma' provisioning status is running
VERBOSE: 8:49:30 PM - Checking deployment status in 11 seconds
VERBOSE: 8:49:42 PM - Checking deployment status in 11 seconds
VERBOSE: 8:49:53 PM - Resource Microsoft.Network/virtualNetworkGateways 'gateway-beta' provisioning status is succeeded
VERBOSE: 8:49:53 PM - Resource Microsoft.Network/virtualNetworkGateways 'gateway-alpha' provisioning status is succeeded
VERBOSE: 8:49:53 PM - Checking deployment status in 11 seconds
VERBOSE: 8:50:04 PM - Checking deployment status in 5 seconds
VERBOSE: 8:50:09 PM - Resource Microsoft.Network/virtualNetworkGateways 'gateway-gamma' provisioning status is succeeded
VERBOSE: 8:50:09 PM - Checking deployment status in 5 seconds
VERBOSE: 8:50:14 PM - Checking deployment status in 5 seconds
VERBOSE: 8:50:19 PM - Checking deployment status in 5 seconds
VERBOSE: 8:50:25 PM - Checking deployment status in 5 seconds
VERBOSE: 8:50:30 PM - Checking deployment status in 5 seconds
VERBOSE: 8:50:35 PM - Checking deployment status in 5 seconds
VERBOSE: 8:50:40 PM - Checking deployment status in 5 seconds
VERBOSE: 8:50:45 PM - Checking deployment status in 5 seconds
VERBOSE: 8:50:51 PM - Checking deployment status in 5 seconds
VERBOSE: 8:50:56 PM - Checking deployment status in 5 seconds
VERBOSE: 8:51:01 PM - Resource Microsoft.Compute/virtualMachines/extensions 'vm-gamma/script' provisioning status is running
VERBOSE: 8:51:01 PM - Resource Microsoft.Compute/virtualMachines 'vm-gamma' provisioning status is succeeded
VERBOSE: 8:51:01 PM - Checking deployment status in 5 seconds
VERBOSE: 8:51:06 PM - Checking deployment status in 5 seconds
VERBOSE: 8:51:11 PM - Resource Microsoft.Compute/virtualMachines 'vm-alpha' provisioning status is succeeded
VERBOSE: 8:51:11 PM - Checking deployment status in 5 seconds
VERBOSE: 8:51:16 PM - Resource Microsoft.Compute/virtualMachines/extensions 'vm-beta/script' provisioning status is running
VERBOSE: 8:51:16 PM - Resource Microsoft.Compute/virtualMachines/extensions 'vm-alpha/script' provisioning status is running
VERBOSE: 8:51:16 PM - Resource Microsoft.Compute/virtualMachines 'vm-beta' provisioning status is succeeded
VERBOSE: 8:51:16 PM - Checking deployment status in 5 seconds
VERBOSE: 8:51:22 PM - Checking deployment status in 5 seconds
VERBOSE: 8:51:27 PM - Checking deployment status in 5 seconds
VERBOSE: 8:51:32 PM - Checking deployment status in 5 seconds
VERBOSE: 8:51:37 PM - Checking deployment status in 5 seconds
VERBOSE: 8:51:43 PM - Checking deployment status in 5 seconds
VERBOSE: 8:51:48 PM - Checking deployment status in 5 seconds
VERBOSE: 8:51:53 PM - Checking deployment status in 5 seconds
VERBOSE: 8:51:58 PM - Checking deployment status in 5 seconds
VERBOSE: 8:52:03 PM - Checking deployment status in 5 seconds
VERBOSE: 8:52:08 PM - Resource Microsoft.Compute/virtualMachines/extensions 'vm-gamma/script' provisioning status is succeeded
VERBOSE: 8:52:08 PM - Checking deployment status in 5 seconds
VERBOSE: 8:52:14 PM - Checking deployment status in 5 seconds
VERBOSE: 8:52:19 PM - Checking deployment status in 5 seconds
VERBOSE: 8:52:24 PM - Checking deployment status in 5 seconds
VERBOSE: 8:52:29 PM - Checking deployment status in 5 seconds
VERBOSE: 8:52:35 PM - Checking deployment status in 5 seconds
VERBOSE: 8:52:40 PM - Checking deployment status in 5 seconds
VERBOSE: 8:52:45 PM - Resource Microsoft.Compute/virtualMachines/extensions 'vm-beta/script' provisioning status is succeeded
VERBOSE: 8:52:45 PM - Checking deployment status in 5 seconds
VERBOSE: 8:52:50 PM - Checking deployment status in 5 seconds
VERBOSE: 8:52:55 PM - Resource Microsoft.Compute/virtualMachines/extensions 'vm-alpha/script' provisioning status is succeeded


DeploymentName          : elasticsearch-secure-nodes07222017-084857
ResourceGroupName       : esnodesfbdac204
ProvisioningState       : Succeeded
Timestamp               : 7/23/2017 1:52:51 AM
Mode                    : Incremental
TemplateLink            : 
Parameters              : 
                          Name             Type                       Value     
                          ===============  =========================  ==========
                          admin-username   String                     dbetz     
                          ssh-public-key   String                     ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCxbo0LWWXCHEEGxgtraIhHBPPnt+kJGMjYMC6+9gBIsYz8R8bSFfge7ljHRxvJoye+4IrdSf2Ee2grgm2+xT9HjMvVR2/LQjPY+ocdinYHlM6miqvMgMblOMVm6/WwY0LZkozPKuSXzhO
                          +/Q6HTZBr2pig/bclvJuFPBtClrzZx5R3NfV33/2rZpFZH9OdAf28q55jbZ1t9AJhtD27s34/cRVBXNBQtc2Nw9D8cEJ+raRdJitAOX3U41bjbrO1u3CQ/JtXg/35wZTJH1Yx7zmDl97cklfiArAfaxkgpWkGhob6A6Fu7LvEgLC25gO5NsY+g4CDqGJT5kzbcyQDDhbf 
                          dbetz@localhost.localdomain
                          script-base      String                     https://filesfbdac204.blob.core.windows.net/support/elasticsearch-secure-nodes/07222017-084857
                          
Outputs                 : 
DeploymentDebugLogLevel : 

~4 minutes total to set up 3 VMs is pretty good. Keep in mind that this time frame includes running the post-VM-creation script (install.sh). I loaded that with a bunch of stuff. You can see this part of the deploy in lines like the following:

Resource Microsoft.Compute/virtualMachines/extensions 'vm-beta/script' provisioning status is running

and

Resource Microsoft.Compute/virtualMachines/extensions 'vm-gamma/script' provisioning status is succeeded

Why's it so fast? Two reasons: First, the storage accounts, VNets, IPs, NICs, and NSGs were already set up in phase 1. Second, Azure will deploy in parallel whatever it can. Upon validating that the dependencies (dependsOn) are already in place, Azure will deploy the VMs. This means that phase 3 is a parallel deployment of three VMs.

Inspection

At this point the entire core infrastructure is in place, including VMs. We can verify this by looking at the Elasticsearch endpoint.

While we can easily derive the endpoint addresses, let's let PowerShell tell us directly:


    (Get-AzureRmPublicIpAddress -ResourceGroupName $rgGlobal -Name "pip-alpha").DnsSettings.Fqdn 
    (Get-AzureRmPublicIpAddress -ResourceGroupName $rgGlobal -Name "pip-beta").DnsSettings.Fqdn 
    (Get-AzureRmPublicIpAddress -ResourceGroupName $rgGlobal -Name "pip-gamma").DnsSettings.Fqdn 

From this we have the following:

esnodesfbdac204-alpha.centralus.cloudapp.azure.com
esnodesfbdac204-beta.westus.cloudapp.azure.com
esnodesfbdac204-gamma.eastus.cloudapp.azure.com

With this we can access the following endpoints to see pairing. You only need to do this on one, but because these are public, let's look at all three:

http://esnodesfbdac204-alpha.centralus.cloudapp.azure.com:9200
http://esnodesfbdac204-beta.westus.cloudapp.azure.com:9200
http://esnodesfbdac204-gamma.eastus.cloudapp.azure.com:9200
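
The same check from PowerShell (using the hostnames above) is one request per node; every node should report the same member list:


    # Each public endpoint should list all three cluster members.
    $hosts = @(
        "esnodesfbdac204-alpha.centralus.cloudapp.azure.com",
        "esnodesfbdac204-beta.westus.cloudapp.azure.com",
        "esnodesfbdac204-gamma.eastus.cloudapp.azure.com"
    )
    ($hosts).foreach({
        Invoke-RestMethod -Uri ("http://{0}:9200/_cat/nodes?v" -f $_)
    })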

Always use SSL. Consider HTTP deprecated.

nodes

Let's see if we have any indices.

w/o data

Nope. That makes sense because... well... I didn't create any yet.

Logging into alpha

We want to generate some data for Elasticsearch. I've provided a generation tool which the VMs set up during their provisioning.

Before we get to that point, we have to login to a VM.

Choose a VM DNS name and try to ssh to it. I don't care which one. I'm going with alpha.

[dbetz@core ~]$ ssh esnodesfbdac204-alpha.westus.cloudapp.azure.com
The authenticity of host 'esnodesfbdac204-alpha.centralus.cloudapp.azure.com (52.165.135.82)' can't be established.
ECDSA key fingerprint is 36:d7:fd:ab:39:b1:10:c2:88:9f:7a:87:30:15:8f:e6.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'esnodesfbdac204-alpha.centralus.cloudapp.azure.com,52.165.135.82' (ECDSA) to the list of known hosts.
[dbetz@alpha ~]$

Troubleshooting

If you get a message like the following, then you don't have the private key that goes with the public key you gave the VM in the ARM template.

[dbetz@core ~]$ ssh esnodesfbdac204-alpha.westus.cloudapp.azure.com
The authenticity of host 'esnodesfbdac204-alpha.westus.cloudapp.azure.com (13.87.182.255)' can't be established.
ECDSA key fingerprint is 94:dd:1b:ca:bf:7a:fd:99:c2:70:02:f3:0c:fa:0b:9a.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'esnodesfbdac204-alpha.westus.cloudapp.azure.com,13.87.182.255' (ECDSA) to the list of known hosts.
Permission denied (publickey,gssapi-keyex,gssapi-with-mic).

To get your private key you can dump it out:

[dbetz@core ~]$ cat ~/.ssh/id_rsa
-----BEGIN RSA PRIVATE KEY-----
...base64 stuff here...
-----END RSA PRIVATE KEY-----

You can take that and dump it into a different system:

[dbetz@core ~]$ export HISTCONTROL=ignorespace
[dbetz@core ~]$         cat > ~/.ssh/id_rsa <<\EOF
-----BEGIN RSA PRIVATE KEY-----
...base64 stuff here...
-----END RSA PRIVATE KEY-----
EOF

When HISTCONTROL is set to ignorespace and a command has a space in front of it, the command won't be stored in shell history.

When you try it again, you'll get a sudden urge to throw your chair across the room:

[dbetz@core ~]$ ssh esnodesfbdac204-alpha.eastus.cloudapp.azure.com       
The authenticity of host 'esnodesfbdac204-alpha.eastus.cloudapp.azure.com (13.87.182.255)' can't be established.
ECDSA key fingerprint is 94:dd:1b:ca:bf:7a:fd:99:c2:70:02:f3:0c:fa:0b:9a.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'esnodesfbdac204-alpha.eastus.cloudapp.azure.com,13.87.182.255' (ECDSA) to the list of known hosts.
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@         WARNING: UNPROTECTED PRIVATE KEY FILE!          @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Permissions 0664 for '/home/dbetz/.ssh/id_rsa' are too open.
It is required that your private key files are NOT accessible by others.
This private key will be ignored.
bad permissions: ignore key: /home/dbetz/.ssh/id_rsa
Permission denied (publickey,gssapi-keyex,gssapi-with-mic).

Chill. Your permissions suck. File permissions are set via your umask settings. In this case, they're too open. Only you need permissions, read-only permissions at that: 400.

You just need to drop the permissions:

[dbetz@core ~]$ chmod 400 ~/.ssh/id_rsa 

Now you can get in:

[dbetz@core ~]$ ssh esnodesfbdac204-alpha.eastus.cloudapp.azure.com  
The authenticity of host 'linux04.jampad.net (192.80.189.178)' can't be established.
ECDSA key fingerprint is 7a:24:38:8c:05:c1:2c:f3:d0:fa:52:0d:2c:a4:04:9c.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'linux04.jampad.net,192.80.189.178' (ECDSA) to the list of known hosts.
Last login: Sat Jul 22 17:39:07 2017 from 136.61.130.214
[dbetz@alpha ~]$ 

We're in.

Adding data

To generate sample data for this scenario, run my data generation tool (based on my project Hamlet).

It's in root's home folder.

[root@alpha ~]# ./setup_data_generation.sh

Azure does its automated setup as root. That's why we're here.

Running this installs all the required tools and writes instructions.

Follow the instructions the tool provides:

[root@alpha ~]# cd /srv/hamlet
[root@alpha hamlet]# source bin/activate
(hamlet) [root@alpha hamlet]# cd content
(hamlet) [root@alpha content]# python /srv/hamlet/content/hamlet.py
http://10.17.0.4:9200/librarygen
^CStopped (5.8595204026280285)

I let it run for a few seconds, then hit CTRL-C to exit.

Now let's refresh our Elasticsearch endpoints (the :9200 endpoints).

with data

The data is there and has replicated across all servers.

Looking at all three systems at once is just for the purpose of this demo. In reality, all you have to do is look at /_cat/shards on any node in the cluster and be done with it:

shards

You can even see that there are multiple shards and replicas (p => primary, r => replica).

Create Traffic Manager

At this point we want to create a single point-of-contact for search. We do this with Traffic Manager. You create the Traffic Manager profile, then add an endpoint for each system:

    function createtrafficmanager { param([Parameter(Mandatory=$true)]$rg,
                                          [Parameter(Mandatory=$true)]$count)
        clear
            
        $names = @("alpha", "beta", "gamma", "delta", "epsilon")
    
        $uniqueName = (Get-AzureRmStorageAccount -ResourceGroupName $rg)[0].StorageAccountName
    
        $tmProfile = New-AzureRmTrafficManagerProfile -ResourceGroupName $rg -name "tm-$rg" `
                        -TrafficRoutingMethod Performance `
                        -ProfileStatus Enabled `
                        -RelativeDnsName $uniqueName `
                        -Ttl 30 `
                        -MonitorProtocol HTTP `
                        -MonitorPort 9200 `
                        -MonitorPath "/"
    
        (1..$count).foreach({
            $name = $names[$_ - 1]
            $pip = Get-AzureRmPublicIpAddress -ResourceGroupName $rg -Name "pip-$name"
            Add-AzureRmTrafficManagerEndpointConfig -TrafficManagerProfile $tmProfile -EndpointName $name -TargetResourceId $pip.id -Type AzureEndpoints -EndpointStatus Enabled
        })
        Set-AzureRmTrafficManagerProfile -TrafficManagerProfile $tmProfile
        
    }
    
    createtrafficmanager -rg 'esnodes4dede7b0' -count 3

Output

Id                               : /subscriptions/20e08d3d-d5c5-4f76-a454-4a1b216166c6/resourceGroups/esnodes4dede7b0/providers/Microsoft.Network/trafficManagerProfiles/tm-esnodes4dede7b0
Name                             : tm-esnodes4dede7b0
ResourceGroupName                : esnodes4dede7b0
RelativeDnsName                  : esnodes4dede7b0alpha
Ttl                              : 30
ProfileStatus                    : Enabled
TrafficRoutingMethod             : Performance
MonitorProtocol                  : HTTP
MonitorPort                      : 9200
MonitorPath                      : /
MonitorIntervalInSeconds         : 30
MonitorTimeoutInSeconds          : 10
MonitorToleratedNumberOfFailures : 3
Endpoints                        : {alpha, beta, gamma}

Id                               : /subscriptions/20e08d3d-d5c5-4f76-a454-4a1b216166c6/resourceGroups/esnodes4dede7b0/providers/Microsoft.Network/trafficManagerProfiles/tm-esnodes4dede7b0
Name                             : tm-esnodes4dede7b0
ResourceGroupName                : esnodes4dede7b0
RelativeDnsName                  : esnodes4dede7b0alpha
Ttl                              : 30
ProfileStatus                    : Enabled
TrafficRoutingMethod             : Performance
MonitorProtocol                  : HTTP
MonitorPort                      : 9200
MonitorPath                      : /
MonitorIntervalInSeconds         : 30
MonitorTimeoutInSeconds          : 10
MonitorToleratedNumberOfFailures : 3
Endpoints                        : {alpha, beta, gamma}

Id                               : /subscriptions/20e08d3d-d5c5-4f76-a454-4a1b216166c6/resourceGroups/esnodes4dede7b0/providers/Microsoft.Network/trafficManagerProfiles/tm-esnodes4dede7b0
Name                             : tm-esnodes4dede7b0
ResourceGroupName                : esnodes4dede7b0
RelativeDnsName                  : esnodes4dede7b0alpha
Ttl                              : 30
ProfileStatus                    : Enabled
TrafficRoutingMethod             : Performance
MonitorProtocol                  : HTTP
MonitorPort                      : 9200
MonitorPath                      : /
MonitorIntervalInSeconds         : 30
MonitorTimeoutInSeconds          : 10
MonitorToleratedNumberOfFailures : 3
Endpoints                        : {alpha, beta, gamma}

Id                               : /subscriptions/20e08d3d-d5c5-4f76-a454-4a1b216166c6/resourceGroups/esnodes4dede7b0/providers/Microsoft.Network/trafficManagerProfiles/tm-esnodes4dede7b0
Name                             : tm-esnodes4dede7b0
ResourceGroupName                : esnodes4dede7b0
RelativeDnsName                  : esnodes4dede7b0alpha
Ttl                              : 30
ProfileStatus                    : Enabled
TrafficRoutingMethod             : Performance
MonitorProtocol                  : HTTP
MonitorPort                      : 9200
MonitorPath                      : /
MonitorIntervalInSeconds         : 30
MonitorTimeoutInSeconds          : 10
MonitorToleratedNumberOfFailures : 3
Endpoints                        : {alpha, beta, gamma}

Just access your endpoint:

http://esnodes4dede7b0.trafficmanager.net:9200

shards

Now you have a central endpoint. In this case traffic will be sent to whichever Elasticsearch endpoint is closest to the end user (=performance traffic routing method).
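
You can see the performance routing from your own desk: the DNS answer includes a CNAME to whichever regional endpoint Traffic Manager picked for you (Resolve-DnsName assumes the Windows DnsClient module):


    # The answer chain includes a CNAME to the regional endpoint closest to you.
    Resolve-DnsName -Name "esnodes4dede7b0.trafficmanager.net"

    # And the single endpoint behaves like any individual node.
    Invoke-RestMethod -Uri "http://esnodes4dede7b0.trafficmanager.net:9200/_cat/shards?v"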

You'll want SSL for that. Well, after you go through the following DNS section.

Adding DNS

At this point everything is functional. Let's go beyond bare functionality. Normally, you'd want something like search.davidbetz.net as an endpoint.

The following will give my davidbetz.net domain a subdomain: esnodes4dede7b0.davidbetz.net.


    function creatednscname { param([Parameter(Mandatory=$true)]$dnsrg,
                                          [Parameter(Mandatory=$true)]$zonename,
                                          [Parameter(Mandatory=$true)]$cname,
                                          [Parameter(Mandatory=$true)]$target)
    
        New-AzureRmDnsRecordSet -ResourceGroupName $dnsrg -ZoneName $zonename -RecordType CNAME -Name $cname -Ttl 3600 -DnsRecords (
            New-AzureRmDnsRecordConfig -Cname $target
        )
    }
    
    function _virtualenv {
    
    $dnsrg = 'davidbetz01'
    $zone = 'davidbetz.net'
    
    creatednscname $dnsrg $zone $rgGlobal "$rgGlobal.trafficmanager.net"
    
    } _virtualenv

Done.
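
To verify the record from both ends (a sketch reusing the same variables as above):


    # Read the CNAME back from the zone...
    Get-AzureRmDnsRecordSet -ResourceGroupName $dnsrg -ZoneName $zone -Name $rgGlobal -RecordType CNAME

    # ...and confirm it resolves through to the traffic manager.
    Resolve-DnsName -Name ("{0}.{1}" -f $rgGlobal, $zone)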

shards

Additional Thoughts

Remember, SSL. You do this with SSL termination. Putting SSL between each system internally (between nginx and an internal web server) is naive and foolish; you only need SSL to external systems. You do this with Nginx. See my secure Elasticsearch lab at https://linux.azure.david.betz.space/_/elasticsearch-secure for details.

You'll also want to protect various Elasticsearch operations via password (or whatever). See Running with Nginx below for more information.

You can learn more about interacting with Elasticsearch directly via my article Learning Elasticsearch with PowerShell.

Running with Nginx

Stacks do not exist. As soon as you change your database you're no longer LAMP or MEAN. Drop the term. Even then, the term only applies to an application component; it doesn't describe you. If you are a "Windows guy", learn Linux. If you are a "LAMP" guy, you have to at least have some clue about Windows. Don't marry yourself to only AWS or Azure. Learn both. Use both. They're tools. Only a fool deliberately limits his toolbox.

No matter your interests, you really should learn Nginx.

So, what is it? The older marketing says it's a "reverse proxy". In reality, the purposes and use-cases for Nginx have changed over the years as other technologies have grown. Today, it's a tool to decouple application TCP connectivity. You put it places where you want to decouple the outside from the inside. A load balancer does this decoupling by sitting between incoming connections and a series of servers. A web server does this by sitting between an HTTP call and internal web content. A TLS terminator does this by sitting between everything external and unencrypted internal resources. If Nginx isn't a fit for one specific scenario, it's likely a perfect fit for another in the same infrastructure.

In older web hosting models, your web server handles BOTH the processing of the content AND the HTTP serving. It does too much. As much as IIS7 is an improvement over IIS6 (no more ISAPI), it still suffers from this. It's both running .NET and serving the content. The current web server model handles this differently: UWSGI runs Python, PM2 runs Node, and Kestrel runs .NET Core. Nginx handles the HTTP traffic and deals with all the TLS certs.

The days of having to deal with IIS and Apache are largely over. Python, Node, and .NET Core each know how to run their own code and Nginx knows TCP. The concepts have always been separate, now the processes are separate.

Let's run through some use cases and design patterns...

Adding Authentication

I'm going to start off with a classic example: adding username / password authentication to an existing web API.

Elasticsearch is one of my favorite database systems; yet, it doesn't have native support for SSL (more on SSL later) or authorization. There's a tool called Shield for that, but it's overkill when I don't care about multiple users. Nginx came to the rescue. Below is a basic Nginx config. You should be able to look at the following config to get an idea of what's going on.

server {
    listen 10.1.60.3;

    auth_basic "ElasticSearch";
    auth_basic_user_file /etc/nginx/es-password;

    location / {
        proxy_pass http://127.0.0.1:9200;
        proxy_http_version 1.1;
        proxy_set_header Connection "Keep-Alive";
        proxy_set_header Proxy-Connection "Keep-Alive";
    }
}

I'm passing all traffic on to port 9200. That port is only bound locally, so Elasticsearch isn't publicly accessible over plain HTTP. You can also see I'm setting some optional headers.

The es-password file referenced above was created using the Apache htpasswd tool:

sudo htpasswd -c /etc/nginx/es-password searchuser
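
A quick way to verify the setup from PowerShell is to build the Basic authorization header yourself; searchuser and PASSWORD here are whatever you handed to htpasswd:

$pair = 'searchuser:PASSWORD'
$headers = @{ Authorization = 'Basic ' + [Convert]::ToBase64String([Text.Encoding]::ASCII.GetBytes($pair)) }
(wget -Uri 'http://10.1.60.3/' -Headers $headers).Content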

SSL/TLS Redirect

Let's lighten up a bit with a simpler example...

There are myriad ways to redirect from HTTP to HTTPS. Nginx is my new favorite way:

server {
    listen 222.222.222.222:80;

    server_name example.net;
    server_name www.example.net;

    return 301 https://example.net$request_uri;
}

Avoid the temptation of keeping a separate SSL config the way Apache traditionally did. Your Nginx configuration should be organized by domain, not by function. Your configuration file will be example.com.conf. It will house all the configuration for example.com: the core functionality and the SSL/TLS redirect.

Accessing localhost only services

There was a time when I needed to download some files from my Google Drive to a Linux Server. rclone seemed to be an OK way to do that. During setup, it wanted me to go through the OpenID/OAuth stuff to give it access. Good stuff, but...

If your browser doesn't open automatically go to the following link: http://127.0.0.1:53682/auth
Log in and authorize rclone for access
Waiting for code...
Got code

127.0.0.1?!? That's a remote server! Using Lynx browser over ssh wasn't going to cut it. Then I realized the answer: Nginx.

Here's what I did:

server {
    listen 111.111.111.111:8080;

    location / {
        proxy_pass http://127.0.0.1:53682;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header Host 127.0.0.1;
    }
}

Then I could access the server in my own browser using http://111.111.111.111:8080/auth.

Boom! I got the Google authorization screen right away and everything came together.

Making services local only

This brings up an interesting point: what if you had a public service you didn't want to be public, but didn't have a way to secure it -- or, perhaps, you just wanted to change the port?

In a situation where I had to cheat, I'd cheat by telling iptables (Linux firewall) to block that port, then use Nginx to open the new one.

For example:

iptables -A INPUT -i lo -j ACCEPT
iptables -A INPUT -p tcp --dport 8080 -j ACCEPT
iptables -A INPUT -j DROP

This says: allow loopback traffic and anything to port 8080, but drop everything else.

If you do this, you need to save the rules using something like iptables-save > /etc/iptables/rules.v4. On Ubuntu, you can get this via apt-get install iptables-persistent.

Then, you can do something like the previous Nginx example to take traffic from a different port.

Better yet, use firewalld. Using iptables directly is as old and obsolete as Apache.

TCP/UDP Connectivity

The examples here are just snippets. They actually go inside an http block like the following:

http {
    upstream myservers {
        ip_hash;
        server server1.example.com;
        server server2.example.com;
        server server3.example.com;
    }

    server {
        listen 80;

        location / {
            proxy_pass http://myservers;
        }
    }
}

What about just UDP/TCP? Nginx isn't merely for HTTP. You can and should use it for raw UDP/TCP too. You just wrap it differently:

stream {
    upstream myservers {
        server server1.example.com:8452;
        server server2.example.com:8452;
        server server3.example.com:8452;
    }

    server {
        listen 8452;

        proxy_pass myservers;
    }
}

No http:// and no location. Now you're load balancing TCP connections without needing to be HTTP aware.

How about UDP?

listen 8452 udp;

Done.
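
If you want to confirm the stream listener is actually up from a Windows box, a quick port test does the trick (the host name here is hypothetical):

Test-NetConnection -ComputerName lb.example.com -Port 8452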

File Serving

These days you want to keep your static content on a CDN. Serving static files can be expensive. Where possible, avoid this.

Some clear cases where you'd serve static files are robots.txt and favicon.ico. In your existing Nginx config, you'd add the following...

location /robots.txt {
    alias /srv/robots.txt;
}

location /favicon.ico {
    alias /srv/netfx/netfxdjango/static/favicon.ico;
}

For a pure SPA application, you'd throw index.html in there as well. The SPA assets would load from the CDN. At this point your server would only serve 3 files.

Inline Content

If you don't want to deal with a separate file, you can just return the content directly:

location /robots.txt {
    add_header Content-Type text/plain always;
    return 200 'User-agent: ia_archiver
Disallow: /';
}

No big deal.

Setting the Host Header (Azure App Example)

So, you've got a free/shared Azure Web App. You've got your free hosting, free subdomain, and even free SSL. Now you want your own domain and your own SSL. What do you do? Throw money at it? Uh... no. Well, not if you were proactive and kept a Linux server around.

This is actually a true story of how I run some of my websites. You only get so much free bandwidth and computing with the free Azure Web Apps, so you have to be careful. The trick to being careful is Varnish.

The marketing for Varnish says it's a caching server. As with all marketing, it makes the product sound less cool than it really is (though that's never the goal). Varnish can be a load-balancer or something to handle fail-over as well. In this case, yeah, it's a caching server.

Basically: I tell Varnish to listen to port 8080 on localhost. It will take traffic and provide responses. If it needs something, it will go back to the source server to get the content. Most hits to the server will be handled by Varnish. Azure breathes easy.

Because the Varnish config is rather verbose and because it's only tangentially related to this topic, I really don't want to dump a huge Varnish config here. So, I'll give snippets:

backend mydomain {
    .host = "mydomain.azurewebsites.net";
    .port = "80";
    .probe = {
         .interval = 300s;
         .timeout = 60s;
         .window = 5;
         .threshold = 3;
    }
  .connect_timeout = 50s;
  .first_byte_timeout = 100s;
  .between_bytes_timeout = 100s;
}

sub vcl_recv {
    #++ more here
    if (req.http.host == "123.123.123.123" || req.http.host == "www.example.net" || req.http.host == "example.net") {
        set req.http.host = "mydomain.azurewebsites.net";
        set req.backend = mydomain;
        return (lookup);
    }
    #++ more here
}

This won't make much sense without the Nginx piece:

server {
        listen 123.123.123.123:443 ssl;

        server_name example.net;
        server_name www.example.net;
        ssl_certificate /srv/cert/example.net.crt;
        ssl_certificate_key /srv/cert/example.net.key;

        location / {
            proxy_pass http://127.0.0.1:8080;
            proxy_set_header X-Real-IP  $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto https;
            proxy_set_header X-Forwarded-Port 443;
            proxy_set_header Host mydomain.azurewebsites.net;
        }
}

Here's what to look for in this:

proxy_set_header Host mydomain.azurewebsites.net;

Nginx sets up a listener for SSL on the public IP. It will send requests to localhost:8080.

On the way, it will make sure the Host header says "mydomain.azurewebsites.net". This does two things:

* First, Varnish will be able to detect that and send it to the proper backend configuration (above it).

* Second, Azure will give you a website based on the `Host` header. That needs to be right. That one line is the difference between getting your correct website or getting the standard Azure website template.

In this example, Varnish is checking the host because Varnish is handling multiple IP addresses, multiple hosts, and caching for multiple Azure websites. If you have only one, then these Varnish checks are superfluous.

A lot of systems rely on the Host header. Because raw HTTP is largely deprecated, you're going to be using SSL/TLS everywhere. You need to make sure your server's name matches the Host header. You'll see proxy_set_header Host SOMETHING a lot.

Load-balancing

A very common use case for Nginx is as a load-balancer.

For each instance of load-balancing, you need to examine your scenario to see if Nginx, HAProxy, your cloud's load balancer, or another product is called for. Some intelligent load-balancing features are only available with Nginx Plus.

Nginx load balancing is simple:

upstream myservers {
    server server1.example.com;
    server server2.example.com;
    server server3.example.com;
}

server {
    listen 80;

    location / {
        proxy_pass http://myservers;
    }
}

Of course, that's a bit naive. If you have systems where a connection must always return to the same backend system, you need to set some type of session persistence. Also stupid simple:

upstream myservers {
    ip_hash;
    server server1.example.com;
    server server2.example.com;
    server server3.example.com;
}

There are other modes than ip_hash, but those are in the docs.

Sometimes some systems are more powerful than others, thus handle more traffic than others. Just set weights:

upstream myservers {
    server server1.example.com weight=4;
    server server2.example.com;
    server server3.example.com;
}

There's not a lot to it.

What if you wanted to send traffic to whatever server had the least number of connections?

upstream myservers {
    least_conn;
    server server1.example.com;
    server server2.example.com;
    server server3.example.com;
}

You can't really brag about your ability to do this.

Failover

Similar to load-balancing is failover: a backup server standing behind the primary. As I've said, Nginx is about decoupling application TCP connectivity. This is yet another instance of that.

Say you had an active/passive configuration where traffic goes to server A, but you want server B used when server A is down. Easy:

upstream myservers {
    server a.example.com fail_timeout=5s max_fails=3;
    server b.example.com backup;
}

Done.

Verb Filter

Back to Elasticsearch...

It uses various HTTP verbs to get the job done. You can POST, PUT, and DELETE to insert, update, or delete respectively, or you can use GET to do your searches. How about a security model where I only allow searches?

Here's a poorman's method that works:

server {
    listen 222.222.222.222:80;

    location / {
        limit_except GET {
            deny all;
        }
        proxy_pass http://127.0.0.1:9200;
        proxy_http_version 1.1;
        proxy_set_header Connection "Keep-Alive";
        proxy_set_header Proxy-Connection "Keep-Alive";
    }
}

Verb Filter (advanced)

When using Elasticsearch, you have the option of accessing your data directly without the need for a server-side anything. In fact, your AngularJS (or whatever) applications can get data directly from ES. How? It's just an HTTP endpoint.

But, what about updating data? Surely you need some type of .NET/Python bridge to handle security, right? Nah.

Checkout the following location blocks:

location ~ /_count {
    proxy_pass http://elastic;
    proxy_http_version 1.1;
    proxy_set_header Connection "Keep-Alive";
    proxy_set_header Proxy-Connection "Keep-Alive";
}

location ~ /_search {
    proxy_pass http://elastic;
    proxy_http_version 1.1;
    proxy_set_header Connection "Keep-Alive";
    proxy_set_header Proxy-Connection "Keep-Alive";
}

location ~ /_ {
    limit_except OPTIONS {
        auth_basic "Restricted Access";
        auth_basic_user_file /srv/es-password;
    }

    proxy_pass http://elastic;
    proxy_http_version 1.1;
    proxy_set_header Connection "Keep-Alive";
    proxy_set_header Proxy-Connection "Keep-Alive";
}

location / {
    limit_except GET HEAD {
        auth_basic "Restricted Access";
        auth_basic_user_file /srv/es-password;
    }

    proxy_pass http://elastic;
    proxy_http_version 1.1;
    proxy_set_header Connection "Keep-Alive";
    proxy_set_header Proxy-Connection "Keep-Alive";
}

Here I'm saying: you can access anything with _count (this is how you get counts from ES), and anything with _search (this is how you query), but if you are accessing something else containing an underscore, you need to provide creds (unless it's an OPTIONS request, which lets CORS preflight work). Finally, if you're accessing / directly, you can send GET and HEAD, but you need creds to do a POST, PUT, or DELETE.
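
You can poke at these rules from PowerShell to watch them work (search.example.net is a hypothetical host; PowerShell throws on non-2xx responses, which is exactly what you want to see on the last line):

# searches and counts pass through without creds
(wget 'https://search.example.net/myindex/_search?q=*').StatusCode
(wget 'https://search.example.net/myindex/_count').StatusCode
# writes hit the auth_basic rule; this throws 401 Unauthorized until you add an Authorization header
wget -Method Post 'https://search.example.net/myindex/mytype/' -Body '{"title":"x"}' -ContentType 'application/json'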

You can add credential handling to your AngularJS/JavaScript application by sending creds via https://username:password@example.net.

Domain Unification

In the previous example, we have an Elasticsearch service. What about our website? Do we really want to deal with both domain.com and search.domain.com, and the resulting CORS nonsense? Do we really REALLY want to deal with multiple SSL certs?

No, we don't.

In this case, you can use Nginx to unify your infrastructure to use one domain.

Let's just update the / in the previous example:

location / {
    limit_except GET HEAD {
        auth_basic "Restricted Access";
        auth_basic_user_file /srv/es-password;
    }

    proxy_pass http://myotherwebendpoint;
    proxy_http_version 1.1;
    proxy_set_header Connection "Keep-Alive";
    proxy_set_header Proxy-Connection "Keep-Alive";
}

Now / gets its content from a different place than the other location blocks.

Let's really bring it home:

location /api {
    proxy_pass http://myserviceendpoint;
    proxy_http_version 1.1;
    proxy_set_header Connection "Keep-Alive";
    proxy_set_header Proxy-Connection "Keep-Alive";
}

Now /api points to your API service.

Now you only have to deal with domain.com while having three different services / servers internally.

Unification with Relative Paths

Building on the previous example, what if you wanted to unify Elasticsearch with your domain?

This seems simple enough, but it's not simply the following:

location /search {
    proxy_pass http://127.0.0.1:9200;
}

This would send traffic to http://127.0.0.1:9200/search whereas Elasticsearch listens on http://127.0.0.1:9200.

You want to map your /search to its /.

That's doable:

location ~ ^/search {
    rewrite ^/search/?(.*)$ /$1 break;
    proxy_pass http://127.0.0.1:9200;
}

This says: everything that starts with /search goes to / because the pattern "starts with /search with an optional trailing slash and optional trailing characters" gets rewritten to "slash with those optional trailing characters" before being proxied.

For notes on break, see nginx url rewriting: difference between break and last.

General Rewriting

You can use the idea in the previous example for local rewriting as well. This isn't just about mapping logical files to physical ones; it also effectively gives you aliases of aliases:

rewrite ^/robots.txt /static/robots.txt;

location /static {
    alias /srv/common; 
}

This lets robots.txt be accessed via /robots.txt as well as /static/robots.txt.

Killing 1990s "www."

Nobody types "www.", it's never on business cards, nobody says it, and most people forgot it exists. Why? This isn't 1997. The most important part of getting a pretty URL is removing this nonsense. Nginx to the rescue:

server {
    listen 222.222.222.222:80;

    server_name example.net;
    server_name www.example.net;

    return 301 https://example.net$request_uri;
}

server {
    listen 222.222.222.222:443 ssl http2;

    server_name www.example.net;

    # ... ssl stuff here ...

    return 301 https://example.net$request_uri;
}

server {
    listen 222.222.222.222:443 ssl http2;

    server_name example.net;

    # ... handle here ...
}

All three server blocks listen on the same IP, but the first listens on port 80 to redirect to the actual domain (there's no such thing as a "naked domain"-- it's just the domain; "www." is a subdomain), the second listens for the "www." subdomain on the HTTPS port (in this case using HTTP2), and the third is where everyone is being directed.

SSL/TLS Termination

This example simply expands the previous one by showing the SSL and HTTP2 implementation.

Your application will not likely have SSL/TLS on every node. That's not something people do. If you have a requirement to secure communication between nodes, you're likely going to do it at a much lower level with IPSec.

At the application level, most people will use SSL/TLS termination: you add SSL/TLS termination at the ingress point of your application domain. This is one of the things you see in application gateways, for example. The application might be an army of systems and web APIs that talk to each other across multiple systems (or within the same system), but they are exposed externally via an application gateway that provides SSL termination. This gateway / terminator is usually Nginx.

Think about this in the context of some of the other use cases. When you merge this use case with the load-balancing one, you've optimized your infrastructure so the backend servers don't need the SSL/TLS. Then there's the Varnish example... Varnish does not support SSL/TLS. They force you to use SSL/TLS termination.

server {
    listen 222.222.222.222:80;

    server_name example.net;
    server_name www.example.net;

    return 301 https://example.net$request_uri;
}

server {
    listen 222.222.222.222:443 ssl http2;

    server_name www.example.net;

    ssl_certificate /srv/_cert/example.net.chained.crt;
    ssl_certificate_key /srv/_cert/example.net.key;

    ssl_protocols TLSv1 TLSv1.1 TLSv1.2;

    ssl_ciphers EECDH+AESGCM:EDH+AESGCM:AES256+EECDH:AES256+EDH:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-SHA384:ECDHE-RSA-AES128-SHA256:ECDHE-RSA-AES256-SHA:ECDHE-RSA-AES128-SHA:DHE-RSA-AES256-SHA256:DHE-RSA-AES128-SHA256:DHE-RSA-AES256-SHA:DHE-RSA-AES128-SHA:AES256-GCM-SHA384:AES128-GCM-SHA256:AES256-SHA256:AES128-SHA256:AES256-SHA:AES128-SHA:HIGH:!aNULL:!eNULL:!EXPORT:!DES:!MD5:!PSK:!RC4;

    ssl_prefer_server_ciphers on;

    ssl_dhparam /srv/_cert/dhparam.pem;

    return 301 https://example.net$request_uri;
}

server {
    listen 222.222.222.222:443 ssl http2;

    server_name example.net;

    ssl_certificate /srv/_cert/example.net.chained.crt;
    ssl_certificate_key /srv/_cert/example.net.key;

    ssl_protocols TLSv1 TLSv1.1 TLSv1.2;

    ssl_ciphers EECDH+AESGCM:EDH+AESGCM:AES256+EECDH:AES256+EDH:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-SHA384:ECDHE-RSA-AES128-SHA256:ECDHE-RSA-AES256-SHA:ECDHE-RSA-AES128-SHA:DHE-RSA-AES256-SHA256:DHE-RSA-AES128-SHA256:DHE-RSA-AES256-SHA:DHE-RSA-AES128-SHA:AES256-GCM-SHA384:AES128-GCM-SHA256:AES256-SHA256:AES128-SHA256:AES256-SHA:AES128-SHA:HIGH:!aNULL:!eNULL:!EXPORT:!DES:!MD5:!PSK:!RC4;

    ssl_prefer_server_ciphers on;

    ssl_dhparam /srv/_cert/dhparam.pem;

    location / {
        add_header Strict-Transport-Security max-age=15552000;
        add_header Content-Security-Policy "default-src 'none'; font-src fonts.gstatic.com; frame-src accounts.google.com apis.google.com platform.twitter.com; img-src syndication.twitter.com bible.logos.com www.google-analytics.com 'self'; script-src api.reftagger.com apis.google.com platform.twitter.com 'self' 'unsafe-eval' 'unsafe-inline' www.google.com www.google-analytics.com; style-src fonts.googleapis.com 'self' 'unsafe-inline' www.google.com ajax.googleapis.com; connect-src search.jampad.net jampadcdn.blob.core.windows.net example.net";

        include         uwsgi_params;
        uwsgi_pass      unix:///srv/example.net/mydomaindjango/content.sock;

        proxy_set_header X-Real-IP  $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto https;
        proxy_set_header X-Forwarded-Port 443;
        proxy_set_header Host example.net;
    }
}

Request Redaction

Sometimes when an application makes a call it sets the wrong Content-Type. When that's wrong, the results can be unpredictable. So, you may need to remove it:

proxy_set_header Content-Type "";

You can also use this to enforce certain data access. You might be sick and tired of people constantly using the wrong Content-Type. Just override it. Another situation is when a header contains authorization information that you'd like to entirely ignore. Just strip it off:

proxy_set_header Authorization "";

That says: I don't care if you have a token. You have no power here. You're anonymous now.

That brings up another point: you can use this to redact server calls.

Decorate Requests

In addition to removing security, you can add it:

location /api {
    proxy_pass http://127.0.0.1:3000;
    proxy_set_header X-AccessRoles "Admin";
    proxy_http_version 1.1;
    proxy_set_header Connection "Keep-Alive";
    proxy_set_header Proxy-Connection "Keep-Alive";
    proxy_set_header X-Real-IP  $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_set_header X-Forwarded-Port 3000;
    proxy_set_header Host $host;
}

Decorate Responses

If you're doing local development without a server (e.g. SPA development), perhaps you still want to make sure your calls have a Cookie for testing.

location ^~ / {
    proxy_pass http://127.0.0.1:4200;
    add_header Set-Cookie SESSION=Zm9sbG93LW1lLW9uLXR3aXR0ZXItQG5ldGZ4aGFybW9uaWNz;
    proxy_set_header X-Real-IP  $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_set_header X-Forwarded-Port 4200;
    proxy_set_header Host $host;
}

This will send a Cookie back. If called with a web browser, it will automagically be added to future calls by the browser.

Config Standardization

The previous example had a lot of security setup. It had TLS, CSP, and HSTS. Instead of trying to figure out how to setup TLS on each of your applications, just let Nginx handle it. Your 10,000 servers are now easy to setup. You're welcome.

Some applications are also a pain to setup for a specific IP address. You want 10.2.1.12, not 10.4.3.11, but your application has a horrible Java UI from 2005 that never runs right. Just have it listen on 127.0.0.1 and let Nginx listen on the external IP.

Verb Routing

Speaking of verbs, you could whip out a pretty cool CQRS infrastructure by splitting GET from POST.

This is more of a play-along than a visual aid. You can actually try this one at home.

Here's a demo using a quick node server:


const http = require('http')

// port comes from the command line: node server.js 6001
const port = parseInt(process.argv[2])
const host = '127.0.0.1'

const server = http.createServer(function(req, res) {
    res.writeHead(200, {'Content-Type': 'text/html'})
    // echo back the verb and which instance answered
    res.end(req.method + ' server ' + port)
})

server.listen(port, host)


Here's our nginx config:

server {
    listen 222.222.222.222:8192;

    location / {
        limit_except GET {
            proxy_pass http://127.0.0.1:6001;
        }
        proxy_pass http://127.0.0.1:6002;
    }
}

use nginx -s reload to quickly reload config without doing a full service restart

Now, to spin up two of them:

node server.js 6001 &
node server.js 6002 &

& runs something as a background process

Now to call them (PowerShell and curl examples provided)...

(wget -method Post http://192.157.251.122:8192/).content

curl -XPOST http://192.157.251.122:8192/

Output:

POST server 6001

(wget -method Get http://192.157.251.122:8192/).content

curl -XGET http://192.157.251.122:8192/

Output:

GET server 6002

Cancel background tasks with fg then CTRL-C. Do this twice to kill both servers.

There we go: your inserts go to one location and your reads come from a different one.

Development Environments

Another great thing about Nginx is that it's not Apache ("a patchy" web server, as the joke goes). Aside from Apache simply trying to do far too much, it's an obsolete product from the 90s that needs to be dropped. It's also often very hard to setup. The security permissions in Apache, for example, make no sense and the documentation is horrible.

Setting up Apache in a dev environment almost never happens, but Nginx is seamless enough not to interfere with day-to-day development.

The point: don't be afraid to use Nginx in your development setup.

Linux Sockets

When using Python you serve content with WSGI: the Web Server Gateway Interface. It's literally a single function signature that enables web access. You run your application with something that executes WSGI content. One popular option is the tool called UWSGI. With this you can expose your Python web application over a Linux socket. Nginx will listen on HTTP and bridge the gap for you.

The single interface function signature is as follows (with an example):

def name_does_not_matter(environment, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return ['Your content type was {}'.format(environment.get('CONTENT_TYPE', ''))]

This is even what Django does deep down.

Here's the Nginx server config:

server {
    listen 222.222.222.222:80;

    location / {
        include            uwsgi_params;
        uwsgi_pass         unix:/srv/raw_python_awesomeness/content/content.sock;

        proxy_redirect     off;
        proxy_set_header   Host $host;
        proxy_set_header   X-Real-IP $remote_addr;
        proxy_set_header   X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header   X-Forwarded-Host $server_name;
    }
}

You can see UWSGI in action in my WebAPI for Python project at https://github.com/davidbetz/pywebapi.

Bulk Download in Linux

Want to download a huge list of files on Linux? No problem...

Let's get a sample file list:

wget http://www.linuxfromscratch.org/lfs/view/stable/wget-list

Now, let's download them all:

sed 's/^/wget /e' wget-list

This says: execute wget for each line

Done.
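
For completeness, a rough PowerShell equivalent (wget here being the Invoke-WebRequest alias):

(gc wget-list) | foreach {
    $name = ($_ -split '/')[-1]
    wget $_ -OutFile $name
}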

Learning Elasticsearch with PowerShell

Reframing Elasticsearch

Before I talk about any topic, I like to reframe it away from the marketing, lame "Hello World" examples, and your own personal echo chamber. So, I'm going to begin by talking about what Elasticsearch ("ES") is. I do not consider it to be a "search engine", so... pay attention.

I'm not big on marketing introductions. They are usually filled with non-technical pseudo-truths and gibberish worthy of the "As seen on TV" warning label. So, what is Elasticsearch? Marketing says it's a search system. People who have used it say it's a hardcore Aristotelian database that makes for a fine primary datastore as well as for a fine search engine.

One of the major differences with MongoDB is that Elastic is more explicit about its indexing. Every database does indexing and everything has a schema. Have a data-structure? You have an index. Have fields? At a minimum, you have an implicit schema. This is what makes an Aristotelian-system Aristotelian.

See my video on Platonic and Aristotelian Data Philosophies for more information on why "NoSQL" is a modern marketing fiction similar to "AJAX".

More particularly, Elasticsearch has a stronger focus on the schema than MongoDB.

You might find people say that Elastic is schemaless. These people have neither read nor peeked at the docs. Elastic is very explicit about its indexes. Sometimes you'll hear that it's schemaless because it uses Lucene (the engine deep deep deep down that does the searching), which is schemaless. That's stupid. Lucene uses straight bytes and Elastic adds the schema on top of it. Your file system uses bytes and SQL Server adds a schema on top of it. The fact that SQL Server keeps MDF files on a file system that only knows bytes doesn't mean SQL Server "doesn't have a schema". SQL Server has a schema. Elastic has a schema. It might be implicit, but it has a schema. Even if you never create one, there is a schema. Elastic is explicit about having a schema.

Beyond the Aristotelian nature though, like MongoDB, ES is an object database (or "document database" as the marketing guys call it, but, unless we wildly redefine what "document" means, I've never stored a document in MongoDB / ES!) You work with objects anyway, why in the world are you translating them to and from relational patterns? Square peg -> round hole. ES and MongoDB are perfect for systems that rely heavily on objects. JSON in / JSON out. No translation required. This one feature here is why many ditch SQL Server for ES or MongoDB. Translation for the sake of translation is insane.

Yet another difference is in terms of access: because MongoDB uses TCP and ES uses HTTP, for all practical purposes, MongoDB requires a library, but an ES library is redundant. When something is as dynamic as ES, using strongly-typed objects in .NET makes .NET fodder for ridicule. .NET developers with an old school mindset (read: inability to paradigm shift) in particular have an unhealthy attachment to make everything they touch into some strongly-typed abstraction layer (though not .NET, the blasphemy known as Mongoose comes to mind.) This is also everything right about the dynamic keyword in C#. It's good to see C# finally get close to the modern world.

The stereotype of a .NET developer (for non-.NET developers) is this: the dude is given a perfectly good HTTP endpoint, he then proceeds to express his misguided cleverness by destroying the open nature of the endpoint by wrapping it in yet another API you have to learn. You can simply look at Nuget to see the massive number of pointless abstraction layers everyone feels the need to dump onto the world. To make matters worse, when you want to make a simple call, you're forced to defeat the entire point of the clean RESTful API by using the pointless abstraction layer. Fail. Abstraction layers are fun to write to learn an API, but... dude... keep them to yourself. Go simple. Go raw. Nobody wants abstraction layer complexity analogous to WebForms; there's a reason Web API is so popular now. Don't ruin it. This lesson is for learning, not to create yet another pointless abstraction layer. Live and love dynamic programming. Welcome to the modern world.

This is most likely why REST is little more than a colloquialism. People who attempt to use hypermedia (a requirement for REST) either find it pointlessly complicated or impossible to wrap. REST is dead until something like the HAL spec gains widespread acceptance; for now, we use the term "REST" to mean any HTTP verb-driven, resource-based API. ES is not REST, it's "REST" only in this latter sense. We usually refer to this, ironically, as "RESTful".

Game Plan

This leads to the point of this entire document: you will learn to use ES from first-principles; you will be able to write queries as wrapped or unwrapped as you want.

My game plan here will seem absolutely backward:

  • Abstract some lower-level ES functionality with PowerShell using search as an example.
  • Discuss ES theory, index setup, and data inserting.

Given that setup is a one-time thing and day-to-day stuff is... uh... daily, the day-to-day stuff comes first. The first-things-first fallacy can die.

I chose PowerShell for this because it's almost guaranteed to be something you've not seen before-- and it offends .NET developers, iOS developers, and Python developers equally. In practice, I use raw curl to manage my ES clusters and Python for mass importing (.NET has far too much bloat for one-off tools!)

As I demonstrate using ES from PowerShell, I will give commentary on what I'm doing with ES. You should be able to learn both ES and some practical, advanced PowerShell. If you don't care about PowerShell... even better! One reason I chose PowerShell was to make sure you focus on the underlying concepts.

There's a lot of really horrible PowerShell out there. I'm not part of the VB-style / Cmdlet / horribly-tedious-and-tiring-long-command-name PowerShell weirdness. If you insist on writing out Invoke-WebRequest instead of simply using wget, but use int and long instead of Int32 and Int64, then you have a ridiculous inconsistency to work out. Also, you're part of the reason PowerShell isn't more widespread; you are making it difficult on everyone. In the following code, we're going to use PowerShell that won't make you hate PowerShell; it will be pleasant and the commands will end up rolling off your fingers (something that won't happen with the horrible longhand command names). Linux users will feel at home with this syntax. VB-users will be wildly offended. EXCELLENT!

Flaw Workaround

Before we do anything, we have to talk about one of the greatest epic fails in recent software history...

While the Elasticsearch architecture is truly epic, it has a known design flaw (the absolute worst form of a bug): it allows a POST body in a GET request. This makes development painful:

  • Fiddler throws a huge red box at you.
  • wget in PowerShell gives you an error.
  • Postman in Chrome doesn't even try.
  • HttpClient in .NET throws System.Net.ProtocolViolationException saying "Cannot send a content-body with this verb-type."

.NET is right. Elasticsearch is wrong. Breaking the rules for the sake of what you personally feel makes you a vigilante. Forcing a square peg into a round hole just for the sake of making sure "gets" are done with GET makes you an extremist. Bodies belong in POST and PUT, not GET.

It's a pretty stupid problem to have given how clever their overall architecture is. There's an idiom for this in English: even Homer nods.

To get around this flaw, instead of actually allowing us to search with POST (like normal people would), we are forced to use a hack: the source query string parameter.

PowerShell Setup

When following along, you'll want to use the PowerShell ISE. Just type ISE in PowerShell.

PowerShell note: hit F5 in ISE to run a script

If you are going to run these in a ps1 file, make sure to run Set-ExecutionPolicy RemoteSigned as admin. Microsoft doesn't seem to like PowerShell at all. It's not the default in Windows 8/10. It's not the default on Windows Server. You can't run scripts by default. Someone needs to be fired. Run the aforementioned command to allow local scripts.

HTTP/S call

We're now ready to be awesome.

To start, let's create a call to ES. In the following code, I'm calling HTTPS with authorization. I'm not giving you the sissy Hello World, this is from production. While you're playing around, you can drop the HTTPS and authorization. You figure out how. That's part of learning.

$base = 'search.domain.net' 

$call = {
    param($params)

    $uri = "https://$base"

    $headers = @{ 
        'Authorization' = 'Basic fVmBDcxgYWpndYXJj3RpY3NlkZzY3awcmxhcN2Rj'
    }

    $response = $null
    $response = wget -Uri "$uri/$params" -method Get -Headers $headers -ContentType 'application/json'
    $response.Content
}

PowerShell note: prefix your : with ` or else you'll get a headache

So far, simple.

We can call &$call to call an HTTPS service with authorization.
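
For example, a quick smoke test against one of the standard cat endpoints:

&$call '_cat/health?v'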

But, let's break this out a bit...

$call = {
    param($verb, $params)

    $uri = "https://$base"

    $headers = @{ 
        'Authorization' = 'Basic fVmBDcxgYWpndYXJj3RpY3NlkZzY3awcmxhcN2Rj'
    }

    $response = wget -Uri "$uri/$params" -method $verb -Headers $headers -ContentType 'application/json'
    $response.Content
}

$get = {
    param($params)
    &$call "Get" $params
}

$delete = {
    param($params)
    &$call "Delete" $params
}

Better. Now we can call various verb functions.

To add PUT and POST, we need to account for the POST body. I'm also going to add some debug output to make life easier.

$call = {
    param($verb, $params, $body)

    $uri = "https://$base"

    $headers = @{ 
        'Authorization' = 'Basic fVmBDcxgYWpndYXJj3RpY3NlkZzY3awcmxhcN2Rj'
    }

    Write-Host "`nCalling [$uri/$params]" -f Green
    if($body) {
        Write-Host "BODY`n--------------------------------------------`n$body`n--------------------------------------------`n" -f Green
    }

    $response = wget -Uri "$uri/$params" -method $verb -Headers $headers -ContentType 'application/json' -Body $body
    $response.Content
}

$put = {
    param($params,  $body)
    &$call "Put" $params $body
}

$post = {
    param($params,  $body)
    &$call "Post" $params $body
}

In addition to having POST and PUT, we can also see what serialized data we are sending, and where.

ES Catalog Output

Now, let's use $call (or $get, etc) in something with some meaning:

$cat = {
    param($json)

    &$get "_cat/indices?v&pretty"
}

This will get the catalog of indexes.

Elasticsearch note: You can throw pretty anywhere to get formatted JSON.

Running &$cat gives me the following json:

[ {
  "health" : "yellow",
  "status" : "open",
  "index" : "site1!production",
  "pri" : "5",
  "rep" : "1",
  "docs.count" : "170",
  "docs.deleted" : "0",
  "store.size" : "2.4mb",
  "pri.store.size" : "2.4mb"
}, {
  "health" : "yellow",
  "status" : "open",
  "index" : "site2!production",
  "pri" : "5",
  "rep" : "1",
  "docs.count" : "141",
  "docs.deleted" : "0",
  "store.size" : "524.9kb",
  "pri.store.size" : "524.9kb"
} ]

But, we're in PowerShell; we can do better:

ConvertFrom-Json (&$cat) | ft

Output:

 health status index                 pri rep docs.count docs.deleted store.size pri.store.size
------ ------ -----                 --- --- ---------- ------------ ---------- --------------
yellow open   site1!staging         5   1   176        0            2.5mb      2.5mb         
yellow open   site2!staging         5   1   144        0            514.5kb    514.5kb    
yellow open   site1!production      5   1   170        0            2.4mb      2.4mb         
yellow open   site2!production      5   1   141        0            524.9kb    524.9kb     

Example note: !production and !staging have nothing to do with ES. It's something I do in ES, Redis, Mongo, SQL Server, and every other place the data will be stored to separate deployments. Normally I would remove this detail from article samples, but the following examples use this to demonstrate filtering.

PowerShell note: Use F8 to run a selection or single line. It might be worth removing your entire $virtualenv, if you want to play around with this.

Much nicer. Not only that, but we have the actual object we can use to filter on the client side. It's not just text.

(ConvertFrom-Json (&$cat)) `
    | where { $_.index -match '!production' }  `
    | select index, docs.count, store.size |  ft

Output:

index              docs.count store.size
-----              ---------- ----------
site1!production   170        2.4mb     
site2!production   141        532.9kb   

Getting Objects from ES

Let's move forward by adding our search function:

$search = {
    param($index, $json)

    &$get "$index/mydatatype/_search?pretty&source=$json"
}

Calling it...

&$search 'site2!production' '{
    "query": {
        "match_phrase": {
            "content": "struggling serves"
        }
    }
}'

Elasticsearch note: match_phrase will match the entire literal phrase "struggling serves"; match would have searched for "struggling" or "serves". Results will return with a score, sorted by that score; entries with both words would have a higher score than an entry with only one of them. Also, wildcard will allow stuff like struggl*.

Meh, I'm not a big fan of this SELECT * FROM [site2!production] analog; let's limit the fields:

&$search 'site2!production' '{
    "query": {
        "match_phrase": {
            "content": "struggling serves"
        }
    },
    "fields": ["selector", "title"]
}'

This will return a bunch of JSON.

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 0.6106973,
    "hits" : [ {
      "_index" : "site2!staging",
      "_type" : "entry",
      "_id" : "AVGUw_v5EFj4l3MNvkeV",
      "_score" : 0.6106973,
      "fields" : {
        "selector" : [ "sweet/mine" ]
      }
    }, {
      "_index" : "site2!staging",
      "_type" : "entry",
      "_id" : "AVGU3PwoEFj4l3MNvk9D",
      "_score" : 0.4333064,
      "fields" : {
        "selector" : [ "of/ambition" ]
      }
    }, {
      "_index" : "site2!staging",
      "_type" : "entry",
      "_id" : "AVGU3QHeEFj4l3MNvk9G",
      "_score" : 0.4333064,
      "fields" : {
        "selector" : [ "or/if" ]
      }
    } ]
  }
}

We can improve on this.

First, we can convert the input to something nicer:

&$search 'site2!production' (ConvertTo-Json @{
    query = @{
        match_phrase = @{
            content = "struggling serves"
        }
    }
    fields = @('selector', 'title')
})

Here we're just creating a dynamic object and serializing it. No JIL or Newtonsoft converters required.

To make this a lot cleaner, here's a modified $search:

$search = {
    param($index, $json, $obj)
    if($obj) {
        $json = ConvertTo-Json -Depth 10 $obj
    }

   &$get "$index/mydatatype/_search?pretty&source=$json"
}

You need -Depth <Int32> because the default is 2. Nothing deeper than the default will serialize; it will simply show "System.Collections.Hashtable". In ES, you'll definitely have deep objects.
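
You can watch the default bite you with the kind of nested hashtable we've been building; the inner levels collapse into type names:

ConvertTo-Json @{ query = @{ filtered = @{ query = @{ match = @{ content = "struggling serves" } } } } }
# the deeper levels come back as "System.Collections.Hashtable"
ConvertTo-Json -Depth 10 @{ query = @{ filtered = @{ query = @{ match = @{ content = "struggling serves" } } } } }
# serializes the whole thing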

Now, I can call this with this:

&$search 'site2!production' -obj @{
    query = @{
        match_phrase = @{
            content = "struggling serves"
        }
    }
    fields = @('selector', 'title')
}

This works fine. Not only that, but the following code still works:

&$search 'site2!production' '{
    "query": {
        "match_phrase": {
            "content": "struggling serves"
        }
    },
    "fields": ["selector", "title"]
}'

Now you don't have to fight with escaping strings; you can also still copy/paste JSON with no problem.

JSON to PowerShell Conversion Notes:

  • : becomes =
  • all ending commas go away
    • newlines denote new properties
  • @ before all new objects (e.g. {})
  • [] becomes @()
    • @() is PowerShell for array
  • " becomes ""
    • PowerShell escaping is double-doublequotes

DO NOT FORGET THE @ BEFORE {. If you do, it will sit there forever as it tries to serialize a ScriptBlock instead of a hashtable. After a few minutes, you'll get hundreds of thousands of JSON entries. Seriously. It tries to serialize every aspect of every .NET property forever. This is why -Depth defaults to 2.

Next, let's format the output:

(ConvertFrom-Json(&$search 'content!staging' 'entry' -obj @{
    query = @{
        match_phrase = @{
            content = "struggling serves"
        }
    }
    fields = @('selector', 'title')
})).hits.hits.fields | ft

Could probably just wrap this up:

$matchPhrase = {
    param($index, $type, $text, $fieldArray)
    (ConvertFrom-Json(&$search $index $type -obj @{
        query = @{
            match_phrase = @{
                content = $text
            }
        }
        fields = $fieldArray
    })).hits.hits.fields | ft
}

Just for completeness, here's $match. Nothing too shocking.

$match = {
    param($index, $type, $text, $fieldArray)
    (ConvertFrom-Json(&$search $index $type -obj @{
        query = @{
            match = @{
                content = $text
            }
        }
        fields = $fieldArray
    })).hits.hits.fields | ft
}

Finally, we have this:

&$match 'content!staging' 'entry' 'even' @('selector', 'title')

Output:

title                               selector           
-----                               --------           
{There Defeats Cursed Sinews}       {forestalled/knees}
{Foul Up Of World}                  {sweet/mine}       
{And As Rank Down}                  {crown/faults}     
{Inclination Confront Angels Stand} {or/if}            
{Turn For Effects World}            {of/ambition}      
{Repent Justice Is Defeats}         {white/bound}      
{Buys Itself Black I}               {form/prayer}

There we go: phenomenal cosmic power in an itty bitty living space.

Beefier Examples and Objects

Here's an example of a search that's a bit beefier:

&$search  'bible!production' -obj @{
    query = @{
        filtered = @{
            query = @{
                match = @{
                    content = "river Egypt"
                }
            }
            filter = @{
                term = @{
                    "labels.key" = "isaiah"
                }
            }
        }
    }
    fields = @("passage")
    highlight = @{
        pre_tags = @("<span class=""search-result"">")
        post_tags = @("</span>")
        fields = @{
            content = @{
                fragment_size = 150
                number_of_fragments = 3
            }
        }
    }
}

Elasticsearch note: Filters are binary: have it or not? Queries are analog: they have a score. In this example, I'm mixing a filter with a query. Here I'm searching the index for content containing "river" or "Egypt" where labels: { "key": 'isaiah' }

Using this I'm able to filter my documents by label, where my labels are complex objects like this:

  "labels": [
    {
      "key": "isaiah",
      "implicit": false,
      "metadata": []
    }
  ]

I'm able to search by labels.key to do a hierarchical filter. This isn't an ES tutorial; rather, this is to explain why "labels.key" was in quotes in my PowerShell, but nothing else is.

Design note: The objects you send to ES should be something optimized for ES. This nested type example is somewhat contrived to demonstrate nesting. You can definitely just throw your data at ES and it will figure out the schema on the fly, but that just means you're lazy. You're probably the type of person who used the forbidden [KnownType] attribute in WCF because you were too lazy to write DTOs. Horrible. Go away.

This beefier example also shows me using ES highlighting. In short, it allows me to tell ES that I want a content summary of a certain size with some keywords wrapped in some specified HTML tags.

This content will show in addition to the requested fields.

The main reason I mention highlighting here is this:

When you serialize the object, it will look weird:

"pre_tags":  [
   "\u003cspan class=\"search-result\"\u003e"
]

Chill. It's fine. I freaked out at first too. Turns out ES can handle unicode just fine.

Let's run with this highlighting idea a bit by simplifying it, parameterizing it, and deserializing the result (like we've done already):

$result = ConvertFrom-Json(&$search  $index -obj @{
    query = @{
        filtered = @{
            query = @{
                match = @{
                    content = $word
                }
            }
        }
    }
    fields = @("selector", "title")
    highlight = @{
        pre_tags = @("<span class=""search-result"">")
        post_tags = @("</span>")
        fields = @{
            content = @{
                fragment_size = 150
                number_of_fragments = 3
            }
        }
    }
})

Nothing new so far. Same thing, just assigning it to a variable...

The JSON passed back was this...

{
  "took" : 6,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 30,
    "max_score" : 0.093797974,
    "hits" : [ {
      "_index" : "content!staging",
      "_type" : "entry",
      "_id" : "AVGhy1octuYZuX6XP7zu",
      "_score" : 0.093797974,
      "fields" : {
        "title" : [ "Pause First Bosom The Oft" ],
        "selector" : [ "hath come/hand of" ]
      },
      "highlight" : {
        "content" : [ " as to fault <span class=\"search-result\">smells</span> ourselves am free wash not tho
se lies enough business eldest sharp first what corrupted blood which state knees wash our cursed oft", " give
 above shall curse be help faults offences this snow me pray shuffling confront ere forgive newborn a engaged 
<span class=\"search-result\">smells</span> rain state angels up form", " ambition eldest i guilt <span class=
\"search-result\">smells</span> forehead true snow thicker rain compelld intent bound my which currents our so
ul of limed angels white snow this buys" ]
      }
    },
    ...    

The data we want is in hits.hits.fields and hits.hits.highlight.

So, we can get them, and play with the output...

$hits = $result.hits.hits
$formatted = $hits | select `
        @{ Name='selector'; Expression={$_.fields.selector} },
        @{ Name='title'; Expression={$_.fields.title} }, 
        @{ Name='highlight'; Expression={$_.highlight.content} }
$formatted

This is basically the following...

hits.Select(p=>new { selector = p.fields.selector, ...});

Output:

selector                   title                            highlight                                        
--------                   -----                            ---------                                        
hath come/hand of          Pause First Bosom The Oft        { as to fault <span class="search-result">smel...
all fall/well thicker      Enough Nature Heaven Help Smells { buys me sweet queen <span class="search-resu...
with tis/law oft           Me And Even Those Business       { if what form this engaged wretched heavens m...
hand forehead/engaged when Form Confront Prize Oft Defeats  {white begin two-fold faults than that strong ...
newborn will/not or        Were Gilded Help Did Nature      { above we evidence still to me no where law o...
cursed thicker/as free     Cursed Tis Corrupted Guilt Where { justice wicked neglect by <span class="searc...
currents hand/true of      Both Than Serves Were May        { serves engaged down ambition to man it is we...
two-fold can/eldest queen  Then Sweet Intent Help Turn      { heart double than stubborn enough the begin ...
force did/neglect whereto  Compelld There Strings Like Not  { it oft sharp those action enough art rests s...
babe whereto/whereto is    As Currents Prayer That Free     { defeats form stand above up <span class="sea...

This part is important: highlight is an array. Your search terms may show up more than once in a document. That's where the whole number_of_fragments = 3 came in. The highlight size is from that fragment_size = 150. So, for each entry we have, we have a selector (basically an ID), a title, and up to three highlights up to 150-characters each.

Let's abstract this stuff before we go to the final data analysis step:

$loadHighlights = {
    param($index, $word)

    $result = ConvertFrom-Json(&$search  $index -obj @{
        query = @{
            filtered = @{
                query = @{
                    match = @{
                        content = $word
                    }
                }
            }
        }
        fields = @("selector", "title")
        highlight = @{
            pre_tags = @("<span class=""search-result"">")
            post_tags = @("</span>")
            fields = @{
                content = @{
                    fragment_size = 150
                    number_of_fragments = 3
                }
            }
        }
    })

    $hits = $result.hits.hits
    $hits | select `
            @{ Name='selector'; Expression={$_.fields.selector} },
            @{ Name='title'; Expression={$_.fields.title} }, 
            @{ Name='highlight'; Expression={$_.highlight.content} }
}

Now, we can run:

&$loadHighlights 'content!staging' 'smells'

Let's use this and analyze the results:

(&$loadHighlights 'content!staging' 'smells').foreach({
    Write-Host ("Selector: {0}" -f $_.selector)
    Write-Host ("Title: {0}`n" -f $_.title)
    $_.highlight | foreach {$i=0} {
        $i++
        Write-Host "$i $_"
    }
    Write-Host "`n`n"
})

Here's a small sampling of the results:

Selector: two-fold can/eldest queen
Title: Then Sweet Intent Help Turn

1  heart double than stubborn enough the begin and confront as <span class="search-result">smells</span> ourse
lves death stronger crown murder steel pray i though stubborn struggling come by
2  of forehead newborn mine above forgive limed offences bosom yet death come angels angels <span class="searc
h-result">smells</span> i sinews past that murder his bosom being look death
3  <span class="search-result">smells</span> where strong ill action mine foul heavens turn so compelld our to
 struggling pause force stubborn look forgive death then death try corrupted



Selector: force did/neglect whereto
Title: Compelld There Strings Like Not

1  it oft sharp those action enough art rests shove stand cannot rain bosom bosom give tis repentance try upon
t possessd my state itself lies <span class="search-result">smells</span> the
2  brothers blood white shove no stubborn than in ere <span class="search-result">smells</span> newborn art re
pentance as like though newborn will form upont pause oft struggling forehead help
3  shuffling serve lies <span class="search-result">smells</span> stand well queen well visage and free his pr
ayer lies that art ere a there law even by business confront offences may retain

We have the selector, the title, and the highlights with <span class="search-result">...</span> showing us where our terms were found.

Setup

Up to this point, I've assumed that you already have an ES setup. Setup is once, but playing around is continuous. So, I got the playing around out of the way first.

Now, I'll go back and talk about ES setup with PowerShell. This should GREATLY improve your ES development. Well, it's what helps me at least...

ES has a schema. It's not fully Platonic like SQL Server, nor is it fully Aristotelian like MongoDB. You can throw all kinds of things at ES and it will figure them out. This is what ES calls dynamic mapping. If you like the idea of digging through incredibly massive data dumps during debugging or passing impossibly huge datasets back and forth, then this might be the way to go (were you "that guy" who threw [KnownType] on your WCF objects? This is you. You have no self-respect.) On the other hand, if you are into light-weight objects, you're probably passing ES a nice tight JSON object anyway. In any case, want schema to be computed as you go? That's dynamic mapping. Want to define your schema and have ES ignore unknown properties? That's, well, disabling dynamic mapping.

Dynamic mapping ends up being similar to the lowest-common denominator ("LCD") schema like in Azure Table Storage: your schema might end up looking like a combination of all fields in all documents.

ES doesn't so much deal with "schema" in the abstract, but with concrete indexes and types.

No, that doesn't mean it's schemaless. That's insane. It means that index and the types are the schema.

In any case, in ES, you create indexes. These are like your tables. Your indexes will have metadata and various properties much like SQL Server metadata and columns. Properties have types just like SQL Server columns. Unlike SQL Server, there's also a concept of a type. Indexes can have multiple types.

Per the Elasticsearch: Definitive Guide, the type is little more than a "_type" property internally, thus types are almost like partition keys in Azure Table Storage. This means that when searching, you're searching across all types, unless you specify the type as well. Again, this maps pretty closely to a partition key in Azure Table Storage.

Creating an index

Creating an index with a type is a matter of calling POST /<index> with your mapping JSON object. Our $createIndex function will be really simple:

$createIndex = {
    param($index, $json, $obj)
    if($obj) {
        $json = ConvertTo-Json -Depth 10 $obj
    }
    &$post $index $json
}

Things don't get interesting until we call it:

&$createIndex 'content!staging' -obj @{
    mappings = @{
        entry = @{
            properties = @{
                selector = @{
                    type = "string"
                }
                title = @{
                    type = "string"
                }
                content = @{
                    type = "string"
                }
                created = @{
                    type = "date"
                    format = "YYYY-MM-DD"  
                }
                modified = @{
                    type = "date"                  
                }
            }
        }
    }
}

This creates an index called content!staging with a type called entry with five properties: selector, title, content, created, and modified.

The created property is there to demonstrate the fact that you can throw formats on properties. Normally, dates are UTC, but here I'm specifying that I don't even care about times when it comes to the create date.
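
If you want the "define your schema and have ES ignore the rest" behavior mentioned earlier, ES also lets you set dynamic on the type mapping. Here's a minimal sketch; the dynamic setting itself is standard ES, but the content!sandbox index name is just a placeholder so you don't clobber the index we just made:

&$createIndex 'content!sandbox' -obj @{
    mappings = @{
        entry = @{
            # "strict" rejects documents with unmapped properties; $false quietly ignores them
            dynamic = "strict"
            properties = @{
                selector = @{
                    type = "string"
                }
            }
        }
    }
}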

With this created, we can see how ES sees our data. We do this by calling GET /<index>/_mapping:

$mapping = {
    param($index)
   &$get "$index/_mapping?pretty"
}

Now to call it...

&$mapping 'content!staging'
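
If you'd rather poke at the mapping from PowerShell than eyeball the raw JSON, something like this works (a sketch that assumes the usual index / mappings / type / properties shape of the _mapping response):

# list the property names ES has mapped for the entry type
$mappingJson = &$mapping 'content!staging'
((ConvertFrom-Json $mappingJson).'content!staging'.mappings.entry.properties |
    Get-Member -MemberType NoteProperty).Name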

Adding Data

Now to throw some data at this index...

Once again, the PowerShell function is simple:

$add = {
    param($index, $type, $json, $obj)
    if($obj) {
        $json = ConvertTo-Json -Depth 10 $obj
    }
    &$post "$index/$type" $json
}

To add data, I'm going to use a trick I wrote about elsewhere

if (!([System.Management.Automation.PSTypeName]'_netfxharmonics.hamlet.Generator').Type) {
    Add-Type -Language CSharp -TypeDefinition '
        namespace _Netfxharmonics.Hamlet {
            public static class Generator {
                private static readonly string[] Words = "o my offence is rank it smells to heaven hath the primal eldest curse upont a brothers murder pray can i not though inclination be as sharp will stronger guilt defeats strong intent and like man double business bound stand in pause where shall first begin both neglect what if this cursed hand were thicker than itself with blood there rain enough sweet heavens wash white snow whereto serves mercy but confront visage of whats prayer two-fold force forestalled ere we come fall or pardond being down then ill look up fault past form serve turn forgive me foul that cannot since am still possessd those effects for which did crown mine own ambition queen may one retain corrupted currents world offences gilded shove by justice oft tis seen wicked prize buys out law so above no shuffling action lies his true nature ourselves compelld even teeth forehead our faults give evidence rests try repentance can yet when repent wretched state bosom black death limed soul struggling free art more engaged help angels make assay bow stubborn knees heart strings steel soft sinews newborn babe all well".Split('' '');
                private static readonly int Length = Words.Length;
                private static readonly System.Random Rand = new System.Random();

                public static string Run(int count, bool subsequent = false) {
                    return Words[Rand.Next(1, Length)] + (count == 1 ? "" : " " + Run(count - 1, true));
                }
            }
        }

    '
}

Let's use the generator with $add to load up some data:

$ti = (Get-Culture).TextInfo
(1..30).foreach({
    &$add -index 'content!staging' -type 'entry' -obj @{
        selector = "{0}/{1}" -f ([_netfxharmonics.hamlet.Generator]::Run(1)), ([_netfxharmonics.hamlet.Generator]::Run(1))
        title = $ti.ToTitleCase([_netfxharmonics.hamlet.Generator]::Run(4))
        content = [_netfxharmonics.hamlet.Generator]::Run(400) + '.'
        created = [DateTime]::Now.ToString("yyyy-MM-dd")
        modified = [DateTime]::Now.ToUniversalTime().ToString("o")
    }
})

This runs fast; crank the sucker up to 3000 with a larger content size if you want. Remove "Write-Host" from $call for more speed.

Your output will look something like this:

Calling [https://search.domain.net/content!staging/entry]
BODY
--------------------------------------------
{
    "selector":  "death/defeats",
    "title":  "Sinews Look Lies Rank",
    "created":  "2015-11-07",
    "content":  "Engaged shove evidence soul even stronger bosom bound form soul wicked oft compelld steel which turn prize yet stand prize.",
    "modified":  "2015-11-07T05:45:10.6943622Z"
}
--------------------------------------------

When I run one of the earlier searches...

&$match 'content!staging' 'entry' 'even'

...we get the following results:

selector        
--------        
{death/defeats}

Debugging

If stuff doesn't work, you need to figure out how to find out why; not simply find out why. So, brain-mode activated: wrong results? Are you getting them wrong or are they actually wrong? Can't insert? Did the data make it to the server at all? Did it make it there, but couldn't get inserted? Did it make it there, get inserted, but you were simply told that it didn't insert? Figure it out.

As far as simple helpers go, I'd recommend doing some type of dump:

$dump = {
    param($index, $type)
    &$get "$index/$type/_search?q=*:*&pretty"
}

This is a raw JSON dump. You might want to copy/paste it somewhere for analysis, or play around in PowerShell:

(ConvertFrom-Json(&$dump 'content!staging' 'entry')).hits.hits
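
For example, to skim just a field or two out of the dump (same $dump, just projected; purely a sketch):

(ConvertFrom-Json (&$dump 'content!staging' 'entry')).hits.hits._source |
    Select-Object selector, title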

I'd recommend just using a text editor to look around instead of random faux PowerShell data-mining. Because JSON and XML are both absolutely, perfectly human readable, you'll see what you need quickly. Even then, there's no reason not to just type the actual link into your own browser:

http://10.1.60.3:9200/content!staging/entry/_search?q=*:*&pretty

I'd recommend the Pretty Beautiful Javascript extension for Chrome.

You can remove &pretty when using this Chrome extension.

Another thing I'd strongly recommend is having a JSON beautifier toggle for input JSON:

$pretty = $False

So you can do something like this:

$serialize = {
    param($obj)
    # compressed output unless $pretty is set to $true
    if($pretty) {
        ConvertTo-Json -Depth 10 $obj
    }
    else {
        ConvertTo-Json -Compress -Depth 10 $obj
    }
}

Instead of calling ConvertTo-Json in your other places, just call &$serialize.

$search = {
    param($index, $type, $json, $obj)
    if($obj) {
        $json = &$serialize $obj
    }
    &$get "$index/$type/_search?pretty&source=$json"
}

Remember, this is for input, not output. This is for the data going to the server.

You want this toggle because once you turn pretty-printing off, you can do this:

&$match 'content!staging' 'entry' 'struggling' @('selector', 'title')

To get this...

Calling [http://10.1.60.3:9200/content!staging/entry/_search?pretty&source={"fields":["selector","title"],"query":{"match":{"content":"struggling"}}}]

title                            selector         
-----                            --------         
{Those Knees Look State}         {own/heavens}    

Now you have a URL you can dump into your web browser. Also, you have a link to share with others.

Regarding logs, on Linux systems you can view error messages at the following location:

/var/log/elasticsearch/CLUSTERNAME.log

I like to keep an SSH connection open while watching the log closely:

tail -f /var/log/elasticsearch/david-content.log 

Your cluster name will be configured in the YAML config at /etc/elasticsearch/elasticsearch.yml.

Updating (and Redis integration)

Updating Elasticsearch objects ("documents") is interesting for two reasons, a good one and a weird one:

Good reason: documents are immutable. Updates involve marking the existing item as deleted and inserting a new document. This is exactly how SQL Server 2014 In-Memory OLTP works. It's one secret of extreme efficiency. It's an excellent practice to follow.

Weird reason: you have to know the integer ID to update a document. It's highly efficient, which makes it, at worst, "weird"; not "bad". If it allowed updates based on custom fields, you'd have a potential perf hit. Key lookups are the fastest.

Prior to Elasticsearch 2.x, you could add something like { "_id": { "path": "selector" } } to tell ES that you want to use your "selector" field as your ID. This was deprecated in version 1.5 and removed in 2.x (yes, they are two separate things). Today, _id is immutable. So, when you see docs saying you can do this, check the version. It will probably be something like version 1.4. Compare the docs for _id in version 1.4 with version 2.1 to see what I mean.

When you make a call like the following example, a cryptic ID is generated:

POST http://10.1.60.3:9200/content!staging/entry

But, you can specify the integer:

POST http://10.1.60.3:9200/content!staging/entry/5

This is great, but nobody anywhere cares about integer IDs. These surrogate keys have absolutely no meaning to your document. How in the world could you possibly know how to update something? You have to know the ID.

If you have your own useful identifier, then good for you, just use the following:

POST http://10.1.60.3:9200/content!staging/entry/tacoburger

Yet, you can't use any type of slash. Soooo.... forget it. Since I usually use ES to store content linked from URLs, this isn't going to fly. Besides, nobody wants to have to keep track of all the various encodings you have to do to make your data clean. So, we need to add some normalization to our IDs both to make ES happy and to keep our usage simple.

So, OK, fine... I have to save some type of surrogate-key-to-key map somewhere. Where could I possibly save it? Elasticsearch IS. MY. DATABASE. I need something insanely efficient for key/value lookups, but that persists to disk. I need something easy to use on all platforms. It should also be non-experimental. It should be a time-tested system. Oh... right: Redis.

The marketing says that Redis is a "cache". Whatever that means. It's the job of marketing to either lie about products to trick people into buying stuff or to downplay stuff for the sake of a niche market. In reality, Redis is a key/value database. It's highly efficient and works everywhere. It's perfect. Let's start making the awesome...

I'm all about doing things from first principles (people who can't do this laugh at people who can and accuse them of "not invented here syndrome"; jealousy expresses itself in many ways), but here I'm going to use the StackExchange.Redis package. It seems to be pretty standard and it works pretty well. I'm running it in a few places. Create some VS2015 (or whatever) project and add the NuGet package. Or, go find it and download it. But... meh... that sounds like work. Use NuGet. Now we're going to reference that DLL:

$setupTracking = {
    Add-Type -Path 'E:\_GIT\awesomeness\packages\StackExchange.Redis.1.0.488\lib\net45\StackExchange.Redis.dll'
    $cs = '10.1.60.2'
    $config = [StackExchange.Redis.ConfigurationOptions]::Parse($cs)
    $connection = [StackExchange.Redis.ConnectionMultiplexer]::Connect($config)
    $connection.GetDatabase()
}

Here I'm adding the assembly, creating my connection string, creating a connection, then getting the database.

Let's call this and set some type of relative global:

$redis = &$setupTracking
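
If you want a quick sanity check that the connection is alive, Ping returns the round-trip latency (just a sketch):

# round-trip latency to the Redis server
$redis.Ping()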

We need to go over a few things in Redis first:

Redis communicates over TCP. You send commands to it and you get stuff back. The commands are assembler-looking codes like:

  • HGET
  • FLUSHALL
  • KEYS
  • GET
  • SET
  • INCR

When you use INCR, you are incrementing a counter. So...

INCR taco

That sets taco to 1.

INCR taco

Now it's 2.

We can get the value...

GET taco

The return value will be 2.

By the way, this is how you setup realtime counters on your website. You don't have to choose between database locking and eventual consistency. Use Redis.

Then there's the idea of a hash. You know, a dictionary-looking thingy.

So,

HSET chicken elephant "dude"

This sets elephant on the chicken hash to "dude".

HGET chicken elephant

This gets "dude". Shocking... I know.

HGETALL chicken

This dumps the entire chicken hash.

Weird names demonstrate that the name has nothing to do with the system, and they force you to think, which helps you remember it long-term.

To list all the keys, do something like this:

KEYS *

When I say "all", I mean "all". Both the keys created by INCR and the keys created by HSET will show. This is a typical wildcard. You can do stuff like KEYS *name* just fine.

Naming note: Do whatever you want, but it's common to use names like "MASTERSCOPE:SCOPE#VARIABLE". My system already has a well-defined internal naming system of Area!Environment, so in what follows we'll use "content!staging#counter" and "content!staging#Hlookup".

OK, that's enough to get started. Here's the plan: Because the integer IDs mean absolutely nothing to me, I'm going to treat them as an implementation detail; more technically, as a surrogate key. My key is selector. I want to update via selector, not some internal ID that means nothing to me.

To do this, I'll basically just emulate what Elasticsearch 1.4 did: specify what property I want as my key.

To this end, I need to add a new $lookupId function, plus update both $add and $delete:

$lookupId = {
    param($index, $selector)

    if($redis) {
        $id = [int]$redis.HashGet("$index#Hlookup", $selector)
    }
    if(!$id) {
        $id = 0
    }
    $id
}

$add = {
    param($index, $type, $json, $obj, $key)
    if($obj) {
        $json = &$serialize $obj
        if($key) {
            $keyValue = $obj[$key]
        }
    }
    
    if($redis -and $keyValue) {
        $id = &$lookupId $index $keyValue
        Write-Host "`$id is $id"
        if($id -eq 0) {
            $id = [int]$redis.StringIncrement("$index#counter")
            if($verbose) {
                Write-Host "Linking $keyValue to $id"
            }
            &$post "$index/$type/$id" $json
            [void]$redis.HashSet("$index#Hlookup", $keyValue, $id)
        }
        else {
            &$put "$index/$type/$id" $json
        }

    }
    else {
        &$post "$index/$type" $json
    }
}

$delete = {
    param($index)
    &$call "Delete" $index

    if($redis) {
        [void]$redis.KeyDelete("$index#counter")
        [void]$redis.KeyDelete("$index#Hlookup")
    }
}

When stuff doesn't exist, you get some type of blank entity. I've never seen a null while using the StackExchange.Redis package, so that's something to celebrate. The values that StackExchange.Redis methods work with are RedisKey and RedisValue. There's not much to learn there though, since there are operators for many different conversions. You can work with strings just fine without needing to know about RedisKey and RedisValue.

So, if I'm sending it a key, grab the key value from the object I sent in. If there is a key value and Redis is enabled and active, see if that key value is the ID of an existing item. That's a Redis lookup. Not there? OK, must be new: use Redis to generate a new ID and send that to Elasticsearch (POST $index/$type/$id). The ID was already there? That means the selector was already assigned a unique, sequential ID by Redis; use that for the update.

For now, POST works fine for an Elasticsearch update as well. Regardless, I'd recommend using PUT for update even though POST works. You never know when they'll enforce it.

Let's run a quick test:

$selectorArray = &$generate 'content!staging' 'entry' 2 -key 'selector'

($selectorArray).foreach({
    $selector = $_
    Write-Host ("ID for $selector is {0}" -f (&$lookupId 'content!staging' $selector))
})

Output:

ID for visage/is is 4
ID for if/blood is 5

I'm going to hop over to Chrome to see how my data looks:

http://10.1.60.3:9200/content!staging/_search?q=*:*

It's there...

{
    "_index": "content!staging",
    "_type": "entry",
    "_id": "4",
    "_score": 1,
    "_source": {
        "selector": "visage/is",
        "title": "Bound Fault Pray Or",
        "created": "2015-11-07",
        "content": "very long content omitted",
        "modified": "2015-11-07T22:24:23.0283870Z"
    }
}

Cool, ID is 4.

What about updating?

Let's try it...

$obj = @{
    selector = "visage/is"
    title = 'new title, same document'
    content = 'smaller content'
    created = [DateTime]::Now.ToString("yyyy-MM-dd")
    modified = [DateTime]::Now.ToUniversalTime().ToString("o")
}
&$add -index 'content!staging' -type 'entry' -obj $obj -key 'selector' > $null

Output:

{
    "_index": "content!staging",
    "_type": "entry",
    "_id": "4",
    "_score": 1,
    "_source": {
        "selector": "visage/is",
        "title": "new title, same document",
        "created": "2015-11-07",
        "content": "smaller content",
        "modified": "2015-11-07T23:11:58.4607963Z"
    }
}

Sweet. Now I can update via my own key (selector) and not have to ever touch Elasticsearch surrogate keys (_id).

Ensuring SSL via HSTS

My wife and I use a certain vacation website to take epic vacations where we pay mostly for the hotel and get event tickets (e.g. Disney World tickets) free. I love the site-- but sometimes I'm paranoid using it. Why? Here's why:

[screenshot: the site loaded over plain HTTP; no lock in the address bar]

No pretty lock.

You say: but were you on the checkout screen or account areas?

No.

You say: Then who cares? You're not doing anything private.

Imagine this: You're on epic-travel-site.com and you're ready to create an account. So, you click on create account. You now see the pretty SSL lock. You also see where you type in your personal information and password. You put in the info and hit submit. All is well.

Well, not so much: what you didn't notice was that the page with the "Create Account" button was intercepted and modified so the "Create Account" link actually took you to epic-travel-site.loginmanagementservices.com instead of account.epic-travel-site.com.

Even then, you'll never know if that's wrong: perhaps they use loginmanagementservices.com for account management. Many sites I go to use Microsoft, Google, or Facebook for account management. It could be legit, or you could have just sent your password to a site owned by an information terrorist punk with a closet full of Guy Fawkes masks.

301

SSL isn't just about end-to-end encryption, it's also about end-to-end protection. You need SSL at the start.

This is easy to enforce on every webserver. On IIS it's a simple rewrite (chill, getting to Linux in a bit):

<system.webServer>
  <rewrite>
    <rules>
      <rule name="Redirect to https">
        <match url="(.*)" />
        <conditions>
          <add input="{HTTPS}" pattern="Off" />
        </conditions>
        <action type="Redirect" url="https://{HTTP_HOST}/{R:1}" />
      </rule>
    </rules>
  </rewrite>
</system.webServer>

Note: rewriting and routing are not the same. Rewriting is a very low-level procedure. Routing is application-level.

Using this, when you access HTTP, you'll get sent over to HTTPS. This is a 301 redirect.

[screenshot: browser dev tools showing the 301 redirect from HTTP to HTTPS]
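
If you'd rather verify the redirect from PowerShell than squint at a network panel, here's a quick sketch using plain .NET (example.org is just a placeholder for your own site):

# request the HTTP version without following redirects
$request = [System.Net.WebRequest]::Create('http://example.org/')
$request.AllowAutoRedirect = $false
$response = $request.GetResponse()
[int]$response.StatusCode                 # expect 301
$response.Headers['Location']             # expect the https:// version
$response.Close()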

That's great, but that still gives the Guy Fawkes-fanboys an opportunity to give you a horrible redirect; it's pretty easy to swap out.

307

So, let's upgrade our security: HTTP Strict Transport Security ("HSTS").

This entire thing is mostly TL;DR. For more detailed information about this and other more hardcore security techniques, watch Troy Hunt's Introduction to Browser Security Headers course at Pluralsight.

The essence of this is simple: the server sends a Strict-Transport-Security header with a max-age in seconds. This tells your browser to ignore your requests for HTTP and go straight to HTTPS instead.
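
You can check whether a site is actually sending the header with a one-liner (again, example.org is a placeholder; you'll get nothing back if the header isn't there):

# hit the HTTPS version and pull out the HSTS header, if any
(Invoke-WebRequest -Uri 'https://example.org/').Headers['Strict-Transport-Security']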

What browsers can use HSTS today? Go look:

[screenshot: HSTS browser support table from caniuse.com]

See also: http://caniuse.com/#search=hsts

If your browser doesn't support HSTS, you need to keep up with updates. Get with it. But, whatever: if your ancient browser doesn't support HSTS, nothing will break; you just won't receive any benefit.

What's this Strict-Transport-Security header do? Try to access the HTTP version and see:

[screenshot: Chrome network panel showing a 307 Internal Redirect for the HTTP request]

It's not going to bother with the 301 roundtrip; instead, the browser will just do an "Internal Redirect", which Chrome reports as a 307.

To put it simply: a 301 is HTTP then HTTPS and a 307 is only ever HTTPS. The existence of HTTP is not even acknowledged.

For clients, this is a major security boost. For servers, this is both a security and performance boost: you are no longer handling all those HTTP redirects! Some politician/salesman would say: "It's like getting free bandwidth!"

Testing HSTS (Chrome)

Lesson: for each web site, start with a 2 day max age and slowly grow.

While removing HSTS for others can be a hassle, removing HSTS during testing and development for yourself is simple:

In Chrome, you go to chrome://net-internals/#hsts.

You'll see delete and query. Throw your domain into query to see it. Throw it into delete to delete it.

Testing HSTS (Firefox)

For the 2-3 people left who think Firefox still matters,

First, know that you aren't going to see a 307; that's an internal redirect anyway, hardly a real response. In Firefox, you see:

[screenshot: the same request as it appears in Firefox]

Second, to test this stuff, you'll need Firebug and to make sure that Persist is enabled (with this you can see redirects, not just data for the currently loaded page; use Clear to clear the console):

[screenshot: Firebug with Persist enabled]

Third, while Fiddler's disable-cache is great for most days, Firefox throws a fit over invalid certs. So, disable cache in Firebug instead:

[screenshot: disabling the cache in Firebug]

Now you'll be able to test HSTS in Firefox. Once you verify that you can see the header and the redirect, you can also be certain when it's removed.

To remove HSTS locally, look for the SiteSecurityServiceState.txt file in your Firefox profile. Where's THAT? I'm not about to remember where it is. I'm saving that brain space for important things. On Windows, I just run a quick Agent Ransack search (a tool which should be part of your standard install!). You could quickly find it with PowerShell as well:

ls $env:appdata\Mozilla -r site*

On Linux, it's a freebie:

find / -name SiteSecurityServiceState.txt

Once found, you apparently have to exit Firefox to get it to flush to the file. Then, you can edit your domain out of the file.

TOFU

Now to upgrade the security again...

That's great, but you still have to trust the website on first access (often called the TOFU problem: trust on first use; remember: no sane person likes tofu). You accessed HTTP first. That may have been hacked. The well may have been poisoned. It's the same hack as in the account management example; it's just moved a little earlier.

The solution is actually quite simple: have the fact that you want HTTPS-only baked into the web browsers. Yeah, it's possible.

You do this by upgrading your security again, then submitting your site to https://hstspreload.appspot.com/.

The upgrade requirements are as follows:

  • Have an SSL cert (duh!)
  • Use the SSL cert for the domain and all subdomains (e.g. account., www., etc...)
  • Upgrade the HSTS header to have a max-age of 126 days
  • Upgrade the HSTS header to enforce HSTS for all subdomains
  • Upgrade the HSTS header to enable preloading

That ultimately boils down to this header:

[screenshot of the final header: Strict-Transport-Security: max-age=10886400; includeSubDomains; preload]

That's it.

It says 126 days, but https://www.ssllabs.com/ssltest/ gives you a warning for anything under 180. Just do 180.
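
If you want the actual seconds value for the header, let PowerShell do the arithmetic:

# days -> seconds for the max-age directive
[int][TimeSpan]::FromDays(126).TotalSeconds   # 10886400
[int][TimeSpan]::FromDays(180).TotalSeconds   # 15552000

That 10886400 is the same value you'll see in the ASP.NET filters below.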

Just submit your site to https://hstspreload.appspot.com/ and you'll be baked into the browser (well, browsers depending on how much it's shared). It will tell you about any potential errors.

Nobody will see this until updates come through. This is one reason updates are important.

How can you prevent just anyone from submitting you? You can't. By adding preload, you stated that you wanted this. The hstspreload website will make sure you want this before it bothers doing anything with it.

ASP.NET WebApi / ASP.NET MVC

How do you add this to your website? You add it the same way you add any header. If you are using ASP.NET MVC or ASP.NET WebApi, you just create a filter.

ASP.NET WebApi:

public class HstsActionFilterAttribute : System.Web.Http.Filters.ActionFilterAttribute
{
    public override void OnActionExecuted(System.Web.Http.Filters.HttpActionExecutedContext actionExecutedContext)
    {
        actionExecutedContext.Response.Headers.Add("Strict-Transport-Security", "max-age=10886400");
    }
}

ASP.NET MVC:

public class HstsActionFilterAttribute : System.Web.Mvc.ActionFilterAttribute
{
    public override void OnResultExecuted(System.Web.Mvc.ResultExecutedContext filterContext)
    {
        filterContext.HttpContext.Response.AppendHeader("Strict-Transport-Security", "max-age=10886400");
        base.OnResultExecuted(filterContext);
    }
}

What do you do if you enforce HTTPS then find out later that the client decided that being hacked is cool and wants to remove HSTS and SSL?

Well, first you get them to sign a release stating that you warned them about security and they threw it back in your face.

After that, set max-age to 0 and hope every single person who ever accessed your site comes back to get the new header. After that, remove the header and SSL. In reality: that's not going to happen. The people who didn't get the max-age=0 header will be locked out until the max-age expires.

NWebSec

Filters for ASP.NET MVC and ASP.NET WebApi are great, but the best type of coding is coding-by-subtraction. Support less. Relax more. To this end, you'll want to whip out the NWebSec NuGet package.

Once you add this package, your web.config file will be modified for the initial setup. You just need to add your own configuration.

Here's a typical config I use for my ASP.NET WebAPI servers:

<nwebsec>
  <httpHeaderSecurityModule xmlns="http://nwebsec.com/HttpHeaderSecurityModuleConfig.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="NWebsecConfig/HttpHeaderSecurityModuleConfig.xsd">
    <redirectValidation enabled="false">
      <add allowedDestination="https://mysamplewebsite.azurewebsites.net/" />
    </redirectValidation>
    <securityHttpHeaders>
      <x-Frame-Options policy="Deny" />
      <strict-Transport-Security max-age="126" httpsOnly="true" includeSubdomains="true" preload="true" />
      <x-Content-Type-Options enabled="true" />
      <content-Security-Policy enabled="true">
        <default-src self="true" />
        <script-src self="true" /> 
        <style-src none="true" />
        <img-src none="true" />
        <object-src none="true" />
        <media-src none="true" />
        <frame-src none="true" />
        <font-src none="true" />
        <connect-src none="true" />
        <frame-ancestors none="true" />
        <report-uri>
          <add report-uri="https://report-uri.io/report/6f42f369dd72ec153d55b775ad48aad7/reportOnly" />
        </report-uri>
      </content-Security-Policy>
    </securityHttpHeaders>
  </httpHeaderSecurityModule>
</nwebsec>

The important line for this discussion is the strict-Transport-Security element.

NWebSec works by days, not seconds. Just keep that in mind.

The content-Security-Policy is also critical for solid security, but that's a discussion for a different day. Just keep in mind that the previous example was for an API server. ASP.NET MVC would require you to open access to images, scripts, and other browser-related entities.

Nginx

Let's say you don't want to add this to your website. Wait, why not? Here's one reason: you don't have access to SSL on the box, but you do somewhere else. Azure comes to mind: you have a free-tier website, but want SSL with a custom domain. You'll have to use nginx for SSL termination with Varnish (a caching server) to offload a lot of traffic from Azure.

Think it through: Varnish is in front of your website providing caching. What about putting something in front of it to provide SSL? Easy. That's called SSL termination. Before I got completely sick of ASP.NET and rewrote everything in Python/Django, I did this for most of my ASP.NET web sites.

In your Nginx config, you can do something like this:

[screenshot: nginx config adding the Strict-Transport-Security header (add_header) in the SSL server block]

As usual, Linux is more elegant than .NET/Windows. Just a fact of life...

Production

When you implement SSL, don't celebrate and advertise the greatness of your security milestone. You only made it to the 1990s. That's nothing to brag about. You need to hit HSTS before you hit anything close to modern security. If you are doing any law-enforcement or finance work, you need to implement HPKP before you hit the minimum required security to earn your responsible-adult badge.

Even then, if you ever want to ruin your life, use the standard development SDLC for these features: do it in development, give it to QA, throw it on a staging environment, then push to production. You will ruin your entire company. Why? Because you have to slowly implement these features in production.

Here's one route to take: run HSTS through your SDLC (starting with staging) with a 2 second max-age. Yes, it's pointless, but it will tell you if it's working at all. Give it a week. Jump to 5 seconds. Give it two weeks. Jump to 30 seconds. At this point, possibly give it a few months. Slowly increase, slowly add browser-upgrade-recommendations to your web site, and slowly give clients warnings (HSTS is passive and won't ever kick anyone out; but you want them to be secure, so inform them!)

Eventually, you'll get to the 126-day (or, better, 180-day) max-age where you can add the preload option and request that your site be baked into the browsers themselves.