Generating MongoDB Sample Data

Last year I wrote about generating better random strings.

Lorem Ipsum is the devil. It messes with us who are students of Latin; Cicero is hard enough without people throwing randomized Cicero in our faces. It's better to use something that isn't part of a linguistic insurgency. Use my Hamlet generator instead.

Anyway...

Because MongoDB is a standard component of any modern architecture these days, we need the ability to generate, not simply strings, but full objects for our test databases.

The following MongoDB script will do just that. Change the value of the run function-call to set the number of objects to throw at MongoDB.

You run this with the MongoDB shell:

./mongo < hamlet.js

Note: The third-party tool Robomongo, while awesome for day-to-day usage, will not work for this. It doesn't play nicely with initializeUnorderedBulkOp, which you need for bulk data import. It's like the BULK INSERT command in SQL.

You can use the following with abridged data or this with the full Hamlet lexicon.

var raw = "o my offence is rank it smells to heaven hath the primal eldest curse upont a brothers murder pray can i not though inclination be as sharp will stronger guilt defeats strong intent and like man double business bound stand in pause where shall first begin both neglect what if this cursed hand were thicker than itself with blood there rain enough sweet heavens wash white snow whereto serves mercy but confront visage of whats prayer two-fold force forestalled ere we come fall or pardond being down then ill look up fault past form serve turn forgive me foul that cannot since am still possessd those effects for which did crown mine own ambition queen may one retain corrupted currents world offences gilded shove by justice oft tis seen wicked prize buys out law so above no shuffling action lies his true nature ourselves compelld even teeth forehead our faults give evidence rests try repentance can yet when repent wretched state bosom black death limed soul struggling free art more engaged help angels make assay bow stubborn knees heart strings steel soft sinews newborn babe all well";
var data = raw.split(" ");

function hamlet(count) {
    return data[Math.floor(Math.random() * data.length)] + (count == 1 ? "" : " " + hamlet(count - 1));
}

function randrange(min, max) {
    if(!max) { max = min; min = 1;}
    return Math.floor(Math.random() * (max - min + 1)) + min;
}

function createArray(count, generator) {
    var list = [];
    for(var n=0; n<count; n++) {
        list.push(generator());
    }
    return list;
}

function pad(number){
    return ("0" + number).substr(-2);
}

function createItem() {
    var item = {
        "_id": "9780" + randrange(100000000, 999999999),
        "title": hamlet(randrange(4, 8)),
        "authors": createArray(randrange(4), function() { return hamlet(2) }),
        "metadata": {
            "pages": NumberInt(randrange(1, 400)),
            "genre": createArray(randrange(2), function() { return hamlet(1) }),
            "summary": hamlet(randrange(100, 400)),
        },
        "published": new Date(randrange(1960, 2016) + "-" + pad(randrange(12)) + "-" + pad(randrange(28)))
    };

    if (randrange(4) == 1) {
        item.editor = hamlet(1);
    }

    return item;
}

function run(count) {
    var bulk = db.book.initializeUnorderedBulkOp();
    for (var n = 0; n < count; n++) {
        bulk.insert(createItem());
    }
    bulk.execute();
}

run(100000)
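
If you'd rather build the same kind of documents outside the shell, here is a rough Python sketch of the generator functions. It only builds dictionaries; handing them to a driver is left out. The names hamlet, randrange, and create_item mirror the script above, the word list is shortened, and the Python port itself is an assumption, not something the shell script needs.

```python
import random
from datetime import datetime

# Abridged word list; any whitespace-separated text works here.
RAW = "o my offence is rank it smells to heaven hath the primal eldest curse"
WORDS = RAW.split()

def hamlet(count):
    """Return `count` random words joined by single spaces."""
    return " ".join(random.choice(WORDS) for _ in range(count))

def randrange(low, high=None):
    """Inclusive random integer; randrange(n) means 1..n, as in the shell script."""
    if high is None:
        low, high = 1, low
    return random.randint(low, high)

def create_item():
    """Build one book-like document with the same shape as createItem()."""
    item = {
        "_id": "9780" + str(randrange(100000000, 999999999)),
        "title": hamlet(randrange(4, 8)),
        "authors": [hamlet(2) for _ in range(randrange(4))],
        "metadata": {
            "pages": randrange(1, 400),
            "genre": [hamlet(1) for _ in range(randrange(2))],
            "summary": hamlet(randrange(100, 400)),
        },
        # Days stop at 28 so every month/day combination is a valid date.
        "published": datetime(randrange(1960, 2016), randrange(12), randrange(28)),
    }
    if randrange(4) == 1:  # roughly one item in four gets an editor
        item["editor"] = hamlet(1)
    return item
```

Calling create_item() in a loop gives you a list of documents you can inspect before loading them anywhere.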

Developing Azure Modular ARM Templates

Cloud architectures are nearly ubiquitous. Managers are letting go of their FUD and embracing a secure model that can extend their reach globally. IT guys, who don't lose any sleep over the fact that their company's finance data is on the same physical wire as their public data, because the data is separated by VLANs, are realizing that VNets on Azure function on the same principle. Developers are embracing a cross-platform utopia where Python and .NET can live together as citizens in a harmonious cloud solution. OK... maybe I'm dreaming about that last one, but the cloud is widely used.

With Azure 2.0 (aka Azure ARM), we finally have a declarative model for managing our resources (database, storage account, network card, VM, load balancer, etc.): we throw nouns at Azure and let it verb them into existence. The JSON templates give us a beautiful 100% GUI-free environment to restore the sanity stolen from us by years of dreadfully clicking buttons. Yet, there's gotta be a better way of dealing with our ARM templates than scrolling up and down all the time. Well, there is...

Below is a link to a template that defines all kinds of awesome:

Baseline ARM Template

Take this magical spell and throw it at Azure and you'll get a full infrastructure of many Elasticsearch nodes, all talking to each other, each with their own endpoint, and a traffic manager to unify the endpoints to make sure everyone in the US gets a fast search connection. There's also the multiple VNets, mesh VPN, and the administrative VM and all that stuff.

Yet, this isn't even remotely how I work with my templates. This is:

ARM Components

Synopsis

Before moving on, note that there are a lot of related concepts going on here. It's important that I give you a quick synopsis of what follows:

  • Modularly splitting ARM templates into manageable, mergeable, reusable JSON files
  • Deploying ARM templates in phases
  • Proposal for symlinking for reusable architectures
  • Recording production deployments
  • Managing deployment arguments
  • Automating support files

Let's dive in...

Modular Resources

Notice that the above screenshot does not show a monolith. Instead, I manage individual resources, not the entire template at once. This lets me find, add, remove, enable, disable, and merge things quickly.

Note that each folder represents "resource provider/resource type/resource.json". The root is where you would put the optional sections variables.json, parameters.json, and outputs.json. In this example, I have a PS1 file there just because it supports this particular template.

My deployment PowerShell script combines the appropriate JSON files together to create the final azuredeploy-generated.json file.
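
A rough Python sketch of the merge step (not the actual PowerShell script) might look like this. It assumes that parameters.json, variables.json, and outputs.json in the root each hold the contents of their section, and that every JSON file in a subfolder holds exactly one resource object. The function name merge_template is made up for illustration.

```python
import json
from pathlib import Path

SCHEMA = "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#"

def merge_template(root):
    """Combine per-resource JSON files under `root` into one template dict."""
    root = Path(root)
    template = {"$schema": SCHEMA, "contentVersion": "1.0.0.0", "resources": []}

    # Optional top-level sections live directly in the root folder.
    for section in ("parameters", "variables", "outputs"):
        path = root / (section + ".json")
        if path.is_file():
            template[section] = json.loads(path.read_text())

    # Every JSON file in a subfolder is treated as a single resource.
    for path in sorted(root.rglob("*.json")):
        if path.parent != root:
            template["resources"].append(json.loads(path.read_text()))
    return template
```

Writing the result out is then a single json.dump(template, f, indent=4) into azuredeploy-generated.json.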

I originally started with grunt to handle the merging. grunt-contrib-concat + grunt-json-format worked for a while, but my Gruntfile.js became rather long, and the entire process was wildly unreliable anyway. Besides, it was just one extra moving part that I didn't need. I was already deploying with PowerShell. So, might as well just do that...

You can get my PowerShell Azure modular JSON magical script at the end of this article.

There's a lot to discuss here, but let's review some core benefits...

Core Benefits

Aside from the obvious benefit of modularity to help you sleep at night, there are at least two other core benefits:

First is the ability to add and remove resources via files, but a much greater benefit is the ability to enable or disable resources. In my merge script, I exclude any file that starts with an underscore. This acts as a simple way to comment out a resource.

Second is the ability to version and merge individual resources in Git (I'm assuming you're living in 2016 or beyond and therefore are using Git, not that one old subversive version control thing or Terrible Foundation Server). The ability to diff and merge resources, not entire JSON monoliths, is great.

Phased Deployment

When something is refactored, often fringe benefits naturally appear. In this case, modular JSON resources allow for programmatically enabling and disabling resources. More specifically, I'd like to mention a concept I integrate into my deployment model: phased deployment.

When deploying a series of VMs and VNets, it's important to make sure your dependencies are set up correctly. That's fairly simple: just make sure dependsOn is set up right in each resource. Azure will take that information into account to see what to deploy in parallel.

That's epic, but I don't really want to wait around forever if part of my dependency tree is a network gateway. Those things take forever to deploy. Not only that, but I've some phases that are simply done in PowerShell.

Go back and look at the screenshot we started with. Notice that some of the resources start with 1., 2., etc.... So, starting a JSON resource with "#." states at what phase that resource will deploy. In my deployment script I'll state what phase I'm currently deploying. I might specify that I only want to deploy phase 1. This will do everything less than phase 1. If I like what I see, I'll deploy phase 2.
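
As a sketch of how such a filter could work, here is a small Python function that decides whether a resource file belongs in the current deployment. Two things are assumptions rather than facts from the real script: files without a phase prefix are included in every phase, and phases are cumulative (deploying phase 2 also includes phase 1 files). The name should_deploy is hypothetical.

```python
import re

def should_deploy(filename, current_phase):
    """Decide whether a resource file takes part in this deployment."""
    if filename.startswith("_"):
        return False  # underscore prefix means the resource is "commented out"
    match = re.match(r"(\d+)\.", filename)
    if match is None:
        return True  # no phase prefix: assumed to deploy in every phase
    return int(match.group(1)) <= current_phase  # assumed: phases are cumulative
```

With that in place, should_deploy("2.gateway.json", 1) is False, so the slow gateway stays out of a phase 1 run.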

In my example, phase 2 is my network gateway phase. After I've aged a bit, I'll come back to run some PowerShell to create a VPN mesh (not something I'd try to declare in JSON). Then, I'll deploy phase 3 to set up my VMs.

Crazy SymLink Idea

This section acts more as an extended sidebar than part of the main idea.

Most benefits of this modular approach are obvious. What might not be obvious is the following:

You can symlink to resources for reuse. For any local Hyper-V Windows VM I spin up, I usually have a Linux VM to go along with it. For my day-to-day stuff, I have a Linux VM that I use for general development and never turn off. I keep all my templates/Git repos on it.

On any *nix-based system, you can create symbolic links to expose the same file with multiple file names (similar to how myriad Git "filenames" can point to the same blob based on a common SHA1 hash).

Don't drift off simply because you think it's some crazy fringe idea.

For this discussion, this can mean the following:

./storage/storageAccounts/storage-copyIndex.json
./network/publicIPAddresses/pip-copyIndex.json
./network/networkInterfaces/nic-copyIndex.json
./network/networkSecurityGroups/nsg-copyIndex.json
./network/virtualNetworks/vnet-copyIndex.json

These resources could be some epic, pristine awesomeness that you want to reuse somewhere. Now, use the following Bash script:

#!/bin/bash

if [ -z "$1" ]; then
    echo "usage: link_common.sh type"
    exit 1
fi

TYPE=$1

mkdir -p `pwd`/$TYPE/template/resources/storage/storageAccounts
mkdir -p `pwd`/$TYPE/template/resources/network/{publicIPAddresses,networkInterfaces,networkSecurityGroups,virtualNetworks}

ln -sf `pwd`/_common/storage/storageAccounts/storage-copyIndex.json `pwd`/$TYPE/template/resources/storage/storageAccounts/storage-copyIndex.json
ln -sf `pwd`/_common/network/publicIPAddresses/pip-copyIndex.json `pwd`/$TYPE/template/resources/network/publicIPAddresses/pip-copyIndex.json
ln -sf `pwd`/_common/network/networkInterfaces/nic-copyIndex.json `pwd`/$TYPE/template/resources/network/networkInterfaces/nic-copyIndex.json
ln -sf `pwd`/_common/network/networkSecurityGroups/nsg-copyIndex.json `pwd`/$TYPE/template/resources/network/networkSecurityGroups/nsg-copyIndex.json
ln -sf `pwd`/_common/network/virtualNetworks/vnet-copyIndex.json `pwd`/$TYPE/template/resources/network/virtualNetworks/vnet-copyIndex.json

Run this:

chmod +x ./link_common.sh
./link_common.sh myimpressivearchitecture

This won't create duplicate files, but it will create files that point to the same content. Change one => Change all.
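
To see the "change one, change all" behavior in isolation, here is a small Python sketch that works in a throwaway temp folder. It assumes a *nix-style system where creating symlinks needs no special rights; the folder and file names are made up for the demo.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    common = os.path.join(base, "_common")
    project = os.path.join(base, "project")
    os.makedirs(common)
    os.makedirs(project)

    source = os.path.join(common, "vnet-copyIndex.json")
    link = os.path.join(project, "vnet-copyIndex.json")
    with open(source, "w") as f:
        f.write('{"version": 1}')

    os.symlink(source, link)  # rough equivalent of `ln -s source link`

    # Edit the shared file once...
    with open(source, "w") as f:
        f.write('{"version": 2}')

    # ...and the linked name reflects the change.
    with open(link) as f:
        linked_content = f.read()
    is_link = os.path.islink(link)

print(linked_content)  # {"version": 2}
```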

Doing this, you might want to make the source-of-truth files read-only. There are a few ways to do this, but the simplest is to give root ownership of the common stuff, then give yourself file-read and directory-list rights.

sudo chown -R root:$USER _common
sudo chmod -R 755 _common 

LINUX NOTE: directory-list rights are set with the directory execute bit

If you need to edit something, you'll have to do it as root (e.g. sudo). This will protect you from doing stupid stuff.

Linux symlinks look like normal files and folders to Windows. There's nothing to worry about there.

This symlinking concept will help you link to already established architectures. You can add/remove symlinks as you need to add/remove resources. This is an established practice in the Linux world. It's very common to create a folder for ./sites-available and ./sites-enabled. You never delete from ./sites-enabled, you simply create links to enable or disable.

Hmm, OK, yes, that is a crazy fringe idea. I don't even do it. Just something you can try on Linux, or on Windows with some sysinternals tools.

Deployment

When you're watching an introductory video or following a hello world example of ARM templates, throwing variables at a template is great, but I'd never do this in production.

In production, you're going to archive each script that is thrown at the server. You might even have a Git repo for each and every server. You're going to stamp everything with files and archive everything you did together. Because this is how you work anyway, it's best to keep that as an axiom and let everything else mold to it.

To jump to the punchline, after I deploy a template twice (perhaps once with gateways disabled, and once with them enabled, to verify in phases), here's what my ./deploy folder looks like:

./09232016-072446.1/arguments-generated.json
./09232016-072446.1/azuredeploy-generated.json
./09232016-072446.1/success.txt
./09242016-051529.2/arguments-generated.json
./09242016-051529.2/azuredeploy-generated.json
./09242016-051529.2/success.txt

Each deployment archives the generated files with the timestamp. Not a whole lot to talk about there.

Let's back up a little bit and talk about dealing with arguments and that arguments-generated.json listed above.

If I'm doing phased deployment, the phase will be suffixed to the deploy folder name (e.g. 09242016-051529.1).
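
The folder names in the listing look like a month-day-year date, a time, and the phase. Here is a minimal Python sketch that produces names of that shape. The format string is inferred from the listing, not taken from the real script, and deploy_folder_name is a made-up name.

```python
from datetime import datetime

def deploy_folder_name(when, phase=None):
    """Build a name like 09232016-072446.1 (MMDDYYYY-HHMMSS plus optional phase)."""
    stamp = when.strftime("%m%d%Y-%H%M%S")
    return stamp if phase is None else "%s.%d" % (stamp, phase)

print(deploy_folder_name(datetime(2016, 9, 23, 7, 24, 46), 1))  # 09232016-072446.1
```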

Deployment Arguments

Instead of setting up parameters in the traditional ARM manner, I opt to generate an arguments file. So, my model is to not only generate the "azuredeploy.json", but also the "azuredeploy-parameters.json". Once these are generated, they can be stamped with a timestamp, then archived with the status.

Sure, zip them and throw them on a blob store if you want. Meh. I find it a bit overkill and old school. If anything, I'll throw my templates at my Elasticsearch cluster so I can view the archives that way.

While my azuredeploy-generated.json is generated from myriad JSON files, my arguments-generated.json is generated from my ./template/arguments.json file.

Here's my ./template/arguments.json file:

{
    "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "admin-username": {
            "value": "{{admin-username}}"
        },
        "script-base": {
            "value": "{{blobpath}}/"
        },
        "ssh-public-key": {
            "value": "{{ssh-public-key}}"
        }
    }
}

My deployment script will add in the variables to generate the final arguments file.

$arguments = @{
    "blobpath" = $blobPath
    "admin-username" = "dbetz"
    "ssh-public-key" = (cat $sshPublicKeyPath -raw)
}
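
The substitution itself can be quite small. Here is a Python sketch of one way to fill the {{name}} placeholders, assuming every value is a string and that values should be JSON-escaped (a public key read from disk may end in a newline). The function name render_arguments is hypothetical; this is not the PowerShell script.

```python
import json
import re

def render_arguments(template_text, arguments):
    """Replace each {{name}} placeholder with its JSON-escaped value."""
    def substitute(match):
        value = arguments[match.group(1)]  # KeyError if a placeholder has no value
        # json.dumps escapes quotes, backslashes, and newlines; drop its outer quotes.
        return json.dumps(value)[1:-1]
    return re.sub(r"\{\{([\w-]+)\}\}", substitute, template_text)
```

Failing loudly on a missing key is a deliberate choice here: a half-filled arguments file is worse than no file.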

Aside from the benefits of automating the public key creation for Linux, there's that blobpath argument. That's important. In fact, dynamic arguments like this might not even make sense until you see my support file model.

Support Files

If you are going to upload assets/scripts/whatever to your server during deployment, you need to get them to a place they are accessible. One way to do this is to commit to Git every 12 seconds. Another way is to simply use blob storage.

Here's the idea:

You have the following folder structure:

./template
./support

You saw ./template in VS Code above. In this example, ./support looks like this:

support/install.sh
support/create_data_generation_setup.sh
support/generate/hamlet.py (see https://netfxharmonics.com/n/2015/03/brstrings)

These are files that I need to get on the server. Use Git if you want, but Azure can handle this directly:

$key = (Get-AzureRmStorageAccountKey -ResourceGroupName $deploymentrg -Name $deploymentaccount)[0].value
$ctx = New-AzureStorageContext -StorageAccountName $deploymentaccount -StorageAccountKey $key
$blobPath = Join-Path $templatename $ts
$supportPath = (Join-Path $projectFolder "support")
(ls -File -Recurse $supportPath).foreach({
    $relativePath = $_.fullname.substring($supportPath.length + 1)
    $blob = Join-Path $blobPath $relativePath
    Write-Host "Uploading $blob"
    Set-AzureStorageBlobContent -File $_.fullname -Container 'support' -Blob $blob -BlobType Block -Context $ctx -Force > $null
})

This PowerShell code walks my ./support folder and replicates the structure to blob storage.

You ask: "what blob storage?"

Response: I keep a resource group named deploy01 around with a storage account named files (with 8 random letters to make it unique). I reuse this account for all my Azure deployments. You might duplicate this per client. Upon deployment, blobs are loaded with the fully qualified file path including the template that I'm using and my deployment timestamp.

The result is that by the time the ARM template is thrown at Azure, the following URL has been generated and the files are in place to be used:

https://files0908bf7n.blob.core.windows.net/support/elasticsearch-secure-nodes/09232016-052804 

For each deployment, I'm going to have a different set of files in blob storage.

In this case, the following blobs were uploaded:

elasticsearch-secure-nodes/09232016-072446/generate/hamlet.py
elasticsearch-secure-nodes/09232016-072446/install.sh
elasticsearch-secure-nodes/09232016-072446/create_data_generation_setup.sh
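
The naming rule behind those blobs is "template name, then timestamp, then the path relative to ./support". Here is a Python sketch of just that mapping, with no upload involved. The function name blob_names is made up, and it always emits forward slashes, which is an assumption about what you want in blob names.

```python
from pathlib import Path, PurePosixPath

def blob_names(support_dir, template_name, timestamp):
    """Map each file under support_dir to '<template>/<timestamp>/<relative path>'."""
    support_dir = Path(support_dir)
    prefix = PurePosixPath(template_name) / timestamp
    names = {}
    for path in sorted(support_dir.rglob("*")):
        if path.is_file():
            relative = path.relative_to(support_dir)
            # Rebuild the relative path with forward slashes on any platform.
            names[str(path)] = str(prefix / PurePosixPath(*relative.parts))
    return names
```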

SECURITY NOTE: For anything sensitive, disable public access, create a SAS token policy, and use that policy to generate a SAS token URL. Give this a few hours to live so your entire template can successfully complete. Remember, gateways take a while to create. Once again: this is why I do phased deployments.

When the arguments-generated.json is used, the script-base parameter is populated like this:

"script-base": {
    "value": "https://files0c0a8f6c.blob.core.windows.net/support/elasticsearch-secure-nodes/09232016-072446"
},

You can then use this parameter to do things like this in your VM extensions:

"fileUris": [
    "[concat(parameters('script-base'), '/install.sh')]"
],
"commandToExecute": "[concat('sh install.sh ', length(variables('locations')), ' ''', parameters('script-base'), ''' ', variables('names')[copyindex()])]"

Notice that https://files0908bf7n.blob.core.windows.net/support/elasticsearch-secure-nodes/09232016-072446/install.sh is the script to be called, but https://files0908bf7n.blob.core.windows.net/support/elasticsearch-secure-nodes/09232016-072446 is also sent in as a parameter. This will tell the script itself where to pull the other files. Actually, in this case, that endpoint is passed a few levels deep.

In my script, when I'm doing phased deployment, I can set uploadSupportFilesAtPhase to whatever phase I want to upload support files. I generally don't do this at phase 1, because, for me, that phase is everything up to the VM or gateway. The support files are for the VMs, so there's no need to play around with them while doing idempotent updates to phase 1.

Visual Studio Code

I've a lot of different editors that I use. Yeah, sure, there's Visual Studio, whatever. For me, it's .NET only. It's far too bulky for most anything else. For ARM templates, it's absolutely terrible. I feel like I'm playing with VB6 with its GUI-driven resource seeking.

While I use EditPlus or Notepad2 (scintilla) for most everything, this specific scenario calls for Visual Studio Code (Atom). It allows you to open a folder directly without the need for pointless SLN files and lets you view the entire hierarchy at once. It also lets you quickly CTRL-C/CTRL-V a JSON file to create a new one (File->New can die). F2 also works for rename. Not much else you need in life.

Splitting a Monolith

Going from an existing monolithic template is simple. Just write a quick tool to open the JSON and dump it into various files. Below is a subpar script I wrote in PowerShell to make this happen:

$templateBase = '\\10.1.40.1\dbetz\azure\armtemplates'
$template = 'python-uwsgi-nginx'
$templateFile = Join-Path $templateBase "$template\azuredeploy.json"
$json = cat $templateFile -raw
$partFolder = 'E:\Drive\Code\Azure\Templates\_parts'
$counters = @{ "type"=0 }

((ConvertFrom-Json $json).resources).foreach({
    $index = $_.type.indexof('/')
    $resourceProvider = $_.type.substring(0, $index).split('.')[1].tolower()
    $resourceType = $_.type.substring($index+ 1, $_.type.length - $index - 1)
    
    $folder = Join-Path $partFolder $resourceProvider
    if(!(Test-Path $folder)) {
        mkdir $folder > $null
    }

    $netResourceType = $resourceType
    while($resourceType.contains('/')) {    
        $index = $resourceType.indexof('/')
        $parentResourceType = $resourceType.substring(0, $index)
        $resourceType = $resourceType.substring($index+ 1, $resourceType.length - $index - 1)
        $netResourceType = $resourceType
        $folder = Join-Path $folder $parentResourceType
        if(!(Test-Path $folder)) {
            mkdir $folder > $null
        }
    }
    $folder = Join-Path $folder $netResourceType
    if(!(Test-Path $folder)) {
        mkdir $folder > $null
    }
    
    $counters[$_.type] = $counters[$_.type] + 1
    $file = $folder + "\" + $netResourceType + $counters[$_.type] + '.json'
    Write-Host "saving to $file"
    (ConvertTo-Json -Depth 100 $_ -Verbose).Replace('\u0027', '''') | sc $file
})

Here's a Python tool I wrote that does the same thing, but the JSON formatting is much better: https://jampadcdn01.azureedge.net/netfx/2016/09/modulararm/armtemplatesplit.py

This is compatible with Python 3 and legacy Python (2.7+).
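
The linked tool isn't reproduced here. As a rough illustration of the one interesting step in either version, mapping a resource type to its folder, here is a small Python sketch. It assumes types of the form "Namespace.Provider/type[/subtype]"; the function name part_path is made up.

```python
def part_path(resource_type):
    """Map an ARM resource type to a 'provider/type[/subtype...]' folder path."""
    provider, _, type_path = resource_type.partition("/")
    provider_folder = provider.split(".")[1].lower()  # 'Microsoft.Network' -> 'network'
    return "/".join([provider_folder] + type_path.split("/"))

print(part_path("Microsoft.Network/virtualNetworks"))  # network/virtualNetworks
```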

Deploy script

Here's my current armdeploy.ps1 deploy script:

Deploy ARM Template (ps1)

LFCS Exam Tips

If you're going for the Linux on Azure certification, you'll be taking the LFCS exam. This is a practicum, so you're going to be going through a series of requirements that you'll have to implement.

Ignore the naive folk who say that this is a real exam, "unlike those that simply require you to memorize a bunch of stuff". The problem today is NOT with people having too much book knowledge, but nowhere near enough. Good for you for going for a practicum exam. Now study for the cross-distro LPIC exams so you don't get tunnel vision in your own little world. Those exams will expand your horizons into areas you may not have known about. Remember: it's easy to fool yourself into thinking that you have skill when you do the same thing every day. Perhaps your LFCS exam will simply match your day job and you'll think you're simply hardcore. Go study the books and get your humility with the written exams. You know, the ones where you can't simply type man every time you have a problem.

Here are some tips:

noclobber

In whatever account (e.g. some user or root) you'll be using, set the following:

set -o noclobber

This will make it so that if you accidentally use > instead of >>, you won't automatically fail. It will simply prevent overwriting some critical file.

For example, if you're trying to send a new line to /etc/fstab, you may want to do the following:

echo `blkid | grep sdb1`    /mnt/taco    xfs    defaults    0 0 >> /etc/fstab

Yeah, you'll need to edit the UUID format after that, but regardless: the >> means append. If you accidentally use >, you're dead. You've already failed the exam. The system needs to boot. No fstab => no boot.

/root/notes

The requirements you'll be given, like all requirements, call for careful thought and interpretation. Your thinking may be clearer later. This is why giving estimates on the job RIGHT THEN AND THERE is such a terrible idea. In this exam, you've got 2 hours. It's best not to sit there and dwell. Go on to other things and come back.

In this process, you'll want to store your mental process, but you can't write anything down and there's also no in-exam notepad. So, I like to throw notes in some random file.

echo review ulimits again >> /root/notes

Again, if you accidentally used > instead of >>, you'd overwrite your notes.

The forward/backward buttons in the exams are truly horrible. It's like trying to find a scene in a VHS. There's no seek, you have to scroll all the way through the questions to get back to what you want to see. Just write the question in your notes.

Using command comments

If you're in the middle of a command and you realize "uhhh.... I have no idea how to do this next part", you may want to hit the manpages.

BUT! What about the line you're currently on? Just throw # in front of it and hit enter. It will go into your history so you can pick it up later to finish.

Example:

$ #chmod 02660 /etc/s

You can't remember the rest of the path, so you have to look it up. CTRL-A to the beginning of the line and put #. Hitting enter will put it into history without an error.

Consider using sed

Your initial thought may be "hmm vi!" That's usually a good bet for any question, but you better know sed. With sed you can do something like this:

sed "/^#/d" /etc/frank/data.txt 

This will delete all lines that start with #, but it will only send the output to YOU, it won't actually update the file.

If you like what you see, do this:

sed -i.original "/^#/d" /etc/frank/data.txt

This will update the file, but save the original to data.txt.original

Remember the absolute basics of awk

Always remember at least the following; this single command accounts for 80% of all my awk usage:

awk '{print $2}'

Example:

ls -l /etc/hosts* | awk '{ print $9 }'

Output:

/etc/hosts
/etc/hosts.allow
/etc/hosts.deny

Probably the worst example in the world, but the point is: you get the 9th column of the output. Pipe that to whatever and move on.

Need two columns?

ls -l /etc/hosts* | awk '{ print $5, $9 }'

Output:

338 /etc/hosts
370 /etc/hosts.allow
460 /etc/hosts.deny

Know vi

If you don't know vi, you aren't going to even take the LFCS exam. It's a moot point. The shortcuts, text editing capabilities, and absolute ubiquity of vi make it something you do not have the option to ignore.

Know find

Believe me, the following command pattern will save you in the exam and on the job:

find . -name "*.txt" -exec cp {} /tmp \;

The stuff after -exec runs once per file found. Instead of doing stuff one-at-a-time, you can use find to process a bunch of stuff at once. The {} represents the file, and the escaped semicolon (with a space before it) marks the end of the command. So, this copies each file found to /tmp.

Know head / tail (and all the other standard tools!)

If you're like me, when you're in a test, you could read "How many moons does Earth have?" and you'll quickly doubt yourself. Aside from the fact that Stephen Fry on QI sometimes says Earth has two moons, and other times says Earth has none, the point is that it's easy to forget everything you think is obvious.

So, if you need to do something with certain files in a list and can't remember for the life of you how to deal with files 90-130, perhaps you do this (ls -l prints a "total" line first, so file 90 sits on line 91 and file 130 on line 131):

ls -l | head -n131 | tail -n41

Then do some type of awk to get the file name and do whatever.

That's one thing that's easy about this exam: you can forget your own name and still stumble through it since only the end result is graded.

Know for

I love for. I use this constantly. For example:

for n in `seq 10`; do touch $n; done

That just made 10 files.

Need to create 10 5MB files?

for n in `seq 10`; do dd if=/dev/zero of=/tmp/$n.img bs=1M count=5; done

I love using that to test synchronization.

Know how to login as other users

If you're asked to do something with rights, you'll probably want to jump over to that user to test what you've done.

su - dbetz

As root, that will get me into the dbetz account. Once in there I can make sure the rights I supposedly assigned are applied properly.

Not only that, but if I'm playing with sudo, I can go from root to dbetz then try to sudo from there.

Know how to figure stuff out

Obviously, there's the man pages. I'm convinced these exist primarily so random complete jerks on the Internet can tell you to read the manual. They are often a great reference, but they are just that: a reference. Nobody sits down and reads them like a novel. Often, they are so cryptic, that you're stuck.

So, have alternate ways of figuring stuff out. For example, you might want to know where other docs are:

find / -name "*share*" | grep docs

That's a wildly hacky way to find some docs, but that's the point. Just start throwing searches out there.

You'll also want to remember the mere existence of various files. For example, /etc/bashrc and /etc/profile. Which calls which? Just open them and look. This isn't a written test where you have to actually know this stuff. The system itself makes it an open book exam. For the LPIC exams, you need to know this upfront.

Running with Nginx

Stacks don't exist. As soon as you change your database you're no longer LAMP or MEAN. Drop the term. Even then, the term only applies to technology; it doesn't describe you. If you are a "Windows guy", learn Linux. If you are a "LAMP" dude, you have to at least have some clue about .NET. Don't marry yourself to only AWS or Azure. Learn both. Use both. Some features of Azure make me drool, while others remind me of VB6 (update: Azure ARM is perfectly solid, modern, and awesome!) Some features of AWS make me feel like a kid in a candy store, while others make me wonder if they are actually April Fool's jokes.

Regardless of whatever you're into, you really should learn this epic tool called Nginx. I've been using it for a while and now have almost all my web sites touching it in some way.

So, what is it? The marketing says it's a "reverse proxy". While I used to like this term, it's become fodder for mockery. No, Nginx is a web server. It serves content for the web. Sometimes it gets the content from a file system, at other times it gets it from HTTP. Regardless, it's a web server.

In the old days of web servers, your web server would handle BOTH the processing of the content AND the HTTP serving. It did too much. As much as IIS7 was an improvement over IIS6 (no more ISAPI), it still suffers from this. It's trying both to run .NET and to serve the content.

Modern web servers handle things differently: UWSGI runs Python, PM2 runs Node, and Kestrel runs .NET Core. In front of this is Nginx handling the HTTP traffic and dealing with all the SSL certs. The days of having to deal with both IIS and Apache are gone. Python, Node, and .NET Core each know how to run their own code and Nginx knows HTTP. The concepts are separate, now the processes are separate.

Adding SSL and Authentication

I'm going to start off with a classic example: adding SSL and username / password authentication to an existing web API.

Elasticsearch is one of my favorite database systems; yet, it doesn't have native support for SSL or authorization. There's a tool called Shield for that, but it's overkill when I don't care about multiple users. Nginx came to the rescue. Below is my basic Nginx config. You should be able to look at the following config to get an idea of what's going on. Of course, I'll add some commentary.

server {
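    listen 10.1.60.3:443 ssl;

    # Placeholder paths (not from the original config); use your own cert and key.
    ssl_certificate     /etc/nginx/ssl/example.crt;
    ssl_certificate_key /etc/nginx/ssl/example.key;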

    auth_basic "ElasticSearch";
    auth_basic_user_file /etc/nginx/es-password;

    location / {
        proxy_pass http://127.0.0.1:9200;
        proxy_http_version 1.1;
        proxy_set_header Connection "Keep-Alive";
        proxy_set_header Proxy-Connection "Keep-Alive";
    }
}

In this example, I have a listener setup to listen on port 443. In the context of this listener, I'm setting configuration for /. I'm passing all traffic on to port 9200. This port is only bound locally, so HTTP isn't even publicly accessible. You can also see I'm setting some optional headers.

443 is SSL, so I have my SSL cert and SSL key configured (in my real config, there's a lot more SSL config; just stuff to configure the ciphers).

Finally, you can see that I've set up basic user authentication. Prior to creating this config, I used the Apache htpasswd utility to create the password file:

sudo htpasswd -c /etc/nginx/es-password searchuser

If you stare at the config enough, it will be demystified. Nginx is simply adding SSL and username/password auth to an existing working, open HTTP-only server.
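To make the auth half concrete, here's a minimal Python sketch of what a client has to send once Nginx is sitting in front of Elasticsearch. It only builds the `Authorization` header that `auth_basic` checks. The `searchuser` name comes from the `htpasswd` command above; the function name and the password are placeholders invented for illustration.

```python
import base64


def basic_auth_header(username, password):
    """Build the Authorization header value that Nginx's auth_basic expects."""
    # HTTP Basic auth is just "username:password", base64-encoded.
    token = base64.b64encode("{}:{}".format(username, password).encode("utf-8"))
    return "Basic " + token.decode("ascii")


if __name__ == "__main__":
    # "not-my-real-password" is a placeholder, not a value from the real setup.
    print({"Authorization": basic_auth_header("searchuser", "not-my-real-password")})
```

Base64 is encoding, not encryption, which is why the auth and the SSL listener belong in the same server block.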

SSL Redirect

Let's lighten up a bit with a simpler example...

There are myriad ways to redirect from HTTP to HTTPS. Nginx is my new favorite way:

server {
    listen 222.222.222.222:80;

    server_name mydomain.net;
    server_name www.mydomain.net;

    return 301 https://mydomain.net$request_uri;
}

Accessing localhost-only services

The other day I needed to download some files from my Google Drive to my Linux Server. rclone seemed to be an OK way to do that. During setup, it wanted me to go through the OpenID/OAuth stuff to give it access. Good stuff, but...

If your browser doesn't open automatically go to the following link: http://127.0.0.1:53682/auth
Log in and authorize rclone for access
Waiting for code...
Got code

Uhh... 127.0.0.1? Dude, that's a remote server. I tried to go there with the text-based Lynx browser, but, wow... THAT. WAS. HORRIBLE. Then I had a random realization: Nginx! Here's what I did real quick:

server {
    listen 111.111.111.111:8080;

    location / {
        proxy_pass http://127.0.0.1:53682;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header Host 127.0.0.1;
    }
}

Then I could access the server in my own browser using http://111.111.111.111:8080/auth.

BOOM! I got the Google authorization screen right away and everything came together.

Making services local only

This brings up an interesting point: what if you had a public service you didn't want to be public, but didn't have a way to do it-- or, perhaps, you just wanted to change the port?

In a situation where I had to cheat, I'd cheat by telling iptables (Linux firewall) to block that port, then use Nginx to open the new one.

For example:

iptables -A INPUT -i lo -j ACCEPT
iptables -A INPUT -p tcp --dport 8080 -j ACCEPT
iptables -A INPUT -j DROP

This says: allow loopback traffic and anything sent to TCP port 8080, but block everything else.
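If the rule ordering isn't obvious, here's a toy Python model of how a chain like this is evaluated: rules are checked top to bottom and the first match decides. This is only a sketch of the first-match idea, not how iptables is implemented. The packet keys (`in_interface`, `protocol`, `dport`) and the `eth0` interface name are made up for the example.

```python
def verdict(packet, rules, default="ACCEPT"):
    """Return the target of the first rule whose conditions all match the packet."""
    for conditions, target in rules:
        if all(packet.get(key) == value for key, value in conditions.items()):
            return target
    return default  # the chain policy applies when nothing matched


# The three rules from above, in the same order.
RULES = [
    ({"in_interface": "lo"}, "ACCEPT"),               # -i lo -j ACCEPT
    ({"protocol": "tcp", "dport": 8080}, "ACCEPT"),   # -p tcp --dport 8080 -j ACCEPT
    ({}, "DROP"),                                     # -j DROP (no conditions: matches everything)
]

if __name__ == "__main__":
    for port in (8080, 53682, 22):
        packet = {"in_interface": "eth0", "protocol": "tcp", "dport": port}
        print(port, verdict(packet, RULES))
```

Notice that port 22 falls through to the DROP as well. With only these three rules, SSH gets blocked too, so add an ACCEPT for anything you still need before the final DROP.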

If you do this, you need to save the rules using something like iptables-save > /etc/iptables/rules.v4. On Ubuntu, you can get this via apt-get install iptables-persistent.

Then, you can do something like the previous Nginx example to take traffic from a different port.

Better yet, use firewalld. iptables is as old and obsolete as Apache.

File Serving

My new architecture for my websites involves a few components: public Azure Blob Storage for my assets, ASP.NET WebAPI for all backend processing, and Python/Django for all my websites (plus Elasticsearch for queries, and Redis, which is always preloaded with a full mirror of my database). My netfxharmonics.com follows this exact architecture. I don't like my websites existing in the same world as anything that serves content to them. The architecture I've promoted for years finally has a name: microservices (thank goodness for a non-lame name! *cough* AJAX *cough*). I take a clean architectural approach: no assets on my website, no database access on my website, and no backend processing on my website. My websites only display content. Databases (thus all connection strings) are behind my WebAPI wall.

OK... I said no assets. That's not completely true, which brings us to the point: how do I serve robots.txt and favicon.ico if I don't allow local assets? Answer: Nginx.

location /robots.txt {
    alias /srv/robots.txt;
}

location /favicon.ico {
    alias /srv/netfx/netfxdjango/static/favicon.ico;
}

Azure

So, you've got a free/shared Azure Web App. You've got your free hosting, free subdomain, and even free SSL. Now you want your own domain and your own SSL. What do you do? Throw money at it? Uh... no. Well, assuming you were proactive and kept a Linux server around.

This is actually a true story of how I run some of my websites. You only get so much free bandwidth and computing with the free Azure Web Apps, so you have to be careful. The trick to being careful is Varnish.

The marketing for Varnish says it's a caching server. As with all marketing, they're trying to make something sound less cool than it really is (though that's never their goal). Varnish can be a load-balancer or something to handle fail-over as well. In this case, yeah, it's a caching server.

Basically: I tell Varnish to listen to port 8080 on localhost. It will take traffic and provide responses. If it needs something, it will go back to the source server to get the content. Most hits to the server will be handled by Varnish. Azure can breathe easy.

Because the Varnish config is rather verbose and because it's only tangentially related to this topic, I really don't want to dump a huge Varnish config here. So, I'll give snippets:

backend mydomain {
    .host = "mydomain.azurewebsites.net";
    .port = "80";
    .probe = {
        .interval = 300s;
        .timeout = 60s;
        .window = 5;
        .threshold = 3;
    }
    .connect_timeout = 50s;
    .first_byte_timeout = 100s;
    .between_bytes_timeout = 100s;
}

sub vcl_recv {
    #++ more here
    if (req.http.host == "123.123.123.123" || req.http.host == "www.mydomain.net" || req.http.host == "mydomain.net") {
        set req.http.host = "mydomain.azurewebsites.net";
        set req.backend = mydomain;
        return (lookup);
    }
    #++ more here
}

This won't make much sense without the Nginx piece:

server {
        listen 123.123.123.123:443 ssl;

        server_name mydomain.net;
        server_name www.mydomain.net;
        ssl_certificate /srv/cert/mydomain.net.crt;
        ssl_certificate_key /srv/cert/mydomain.net.key;

        location / {
            proxy_pass http://127.0.0.1:8080;
            proxy_set_header X-Real-IP  $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto https;
            proxy_set_header X-Forwarded-Port 443;
            proxy_set_header Host mydomain.azurewebsites.net;
        }
}

Here's what to look for in this:

proxy_set_header Host mydomain.azurewebsites.net

Nginx sets up a listener for SSL on the public IP. It will send requests to localhost:8080.

On the way, it will make sure the Host header says "mydomain.azurewebsites.net". This does two things:

* First, Varnish will be able to detect that and send it to the proper backend configuration (above it).

* Second, Azure will give you a website based on the `Host` header. That needs to be right. That one line is the difference between getting your correct website or getting the standard Azure website template.

In this example, Varnish is checking the host because Varnish is handling multiple IP addresses, multiple hosts, and caching for multiple Azure websites. If you have only one, then these Varnish checks are superfluous.
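To see why that one `Host` line matters, here's a small, self-contained Python sketch. It starts a throwaway local server that just echoes back the `Host` header it received (a stand-in for Azure, not the real thing), then connects to it by IP while presenting a different `Host`. The handler, function names, and the echo behavior are all invented for the demo.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HostEchoHandler(BaseHTTPRequestHandler):
    """Stand-in backend: replies with the Host header it received."""

    def do_GET(self):
        body = self.headers.get("Host", "").encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_with_host(address, port, host_header, path="/"):
    """Connect to address:port but present a different Host header."""
    conn = http.client.HTTPConnection(address, port, timeout=5)
    try:
        conn.request("GET", path, headers={"Host": host_header})
        return conn.getresponse().read().decode("utf-8")
    finally:
        conn.close()


def demo():
    server = HTTPServer(("127.0.0.1", 0), HostEchoHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch_with_host("127.0.0.1", server.server_port, "mydomain.azurewebsites.net")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

The backend never sees the address the client connected to, only the `Host` header. That's the same reason Azure picks the right site once Nginx rewrites it.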

Verb Filter

Back to Elasticsearch...

It uses various HTTP verbs to get the job done. You can POST, PUT, and DELETE to insert, update, or delete respectively, or you can use GET to do your searches. How about a security model where I only allow searches?

It might be a poor man's method, but it works:

server {
    listen 222.222.222.222:80;

    location / {
        limit_except GET {
            deny all;
        }
        proxy_pass http://127.0.0.1:9200;
        proxy_http_version 1.1;
        proxy_set_header Connection "Keep-Alive";
        proxy_set_header Proxy-Connection "Keep-Alive";
    }
}

Verb Filter (advanced)

When using Elasticsearch, you have the option of accessing your data directly without the need for a server-side anything. In fact, your AngularJS (or whatever) applications can get data directly from ES. How? It's just an HTTP endpoint.

But, what about updating data? Surely you need some type of .NET/Python bridge to handle security, right? Nah.

Check out the following location blocks:

location ~ /_count {
    proxy_pass http://elastic;
    proxy_http_version 1.1;
    proxy_set_header Connection "Keep-Alive";
    proxy_set_header Proxy-Connection "Keep-Alive";
}

location ~ /_search {
    proxy_pass http://elastic;
    proxy_http_version 1.1;
    proxy_set_header Connection "Keep-Alive";
    proxy_set_header Proxy-Connection "Keep-Alive";
}

location ~ /_ {
    limit_except OPTIONS {
        auth_basic "Restricted Access";
        auth_basic_user_file /srv/es-password;
    }

    proxy_pass http://elastic;
    proxy_http_version 1.1;
    proxy_set_header Connection "Keep-Alive";
    proxy_set_header Proxy-Connection "Keep-Alive";
}

location / {
    limit_except GET HEAD {
        auth_basic "Restricted Access";
        auth_basic_user_file /srv/es-password;
    }

    proxy_pass http://elastic;
    proxy_http_version 1.1;
    proxy_set_header Connection "Keep-Alive";
    proxy_set_header Proxy-Connection "Keep-Alive";
}

Here I'm saying: you can access anything with _count (this is how you get counts from ES), and anything with _search (this is how you query), but if you are accessing something else containing an underscore, you need to provide creds (unless it's an OPTIONS request, which allows CORS to work). Finally, if you're accessing / directly, you can send GET and HEAD, but you need creds to do a POST, PUT, or DELETE.
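If you'd rather read that as code, here's a rough Python model of the four location blocks. It assumes Nginx's usual behavior: regex locations are tried in the order they're written, the first match wins, and the plain `/` prefix block is the fallback. The function and variable names are made up, and this is a sketch of the decision logic only, not a replacement for testing the real config.

```python
import re

# Regex locations are tried in the order they appear; the first match wins.
# Each entry: (pattern, methods allowed without credentials; None = all methods).
REGEX_LOCATIONS = [
    (re.compile(r"/_count"), None),
    (re.compile(r"/_search"), None),
    (re.compile(r"/_"), {"OPTIONS"}),
]
# The plain "location /" prefix block is the fallback. In Nginx, allowing GET
# in limit_except also allows HEAD.
FALLBACK_OPEN_METHODS = {"GET", "HEAD"}


def needs_credentials(method, path):
    """Return True if the config above would ask for basic auth."""
    for pattern, open_methods in REGEX_LOCATIONS:
        if pattern.search(path):
            return open_methods is not None and method not in open_methods
    return method not in FALLBACK_OPEN_METHODS


if __name__ == "__main__":
    for method, path in [("GET", "/myindex/_search"), ("GET", "/_cat/indices"),
                         ("OPTIONS", "/_cat/indices"), ("DELETE", "/myindex/doc/1")]:
        print(method, path, needs_credentials(method, path))
```

One thing the model makes visible: the `_count` and `_search` blocks have no `limit_except`, so every verb is open on those paths, not just GET.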

You can add credential handling to your AngularJS/JavaScript application by sending creds via https://username:password@mydomain.net.

It works fine. Now you can throw away all your server-side code and stick with raw AngularJS (or whatever). If something requires a preprocessor, postprocessor, or server-side code at all (e.g. couldn't be developed in JSFiddle/Plunker directly), it's not web development (and you might not be a web developer). Here, you have solid, direct web development without the middle-man. Just the browser and the server infrastructure. It's SPA with your own IaaS setup.

Domain Unification

In the previous example, we have an Elasticsearch service. What about our website? Do we really want to deal with both domain.com and search.domain.com, and the resulting CORS nonsense? Do we really REALLY want to deal with multiple SSL certs?

No, we don't.

In this case, you can use Nginx to unify your infrastructure to use one domain.

Let's just update the / in the previous example:

location / {
    limit_except GET HEAD {
        auth_basic "Restricted Access";
        auth_basic_user_file /srv/es-password;
    }

    proxy_pass http://myotherwebendpoint;
    proxy_http_version 1.1;
    proxy_set_header Connection "Keep-Alive";
    proxy_set_header Proxy-Connection "Keep-Alive";
}

Now / gets its content from a different place than the other locations.

Let's really bring it home:

location /api {
    proxy_pass http://myserviceendpoint;
    proxy_http_version 1.1;
    proxy_set_header Connection "Keep-Alive";
    proxy_set_header Proxy-Connection "Keep-Alive";
}

Now /api points to your API service.

Now you only have to deal with domain.com while having three different services / servers internally.

Killing 1990s "www."

Nobody types "www.", it's never on business cards, nobody says it, and most people forgot it exists. Why? This isn't 1997. The most important part of getting a pretty URL is removing this nonsense. Nginx to the rescue:

server {
    listen 222.222.222.222:80;

    server_name mydomain.net;
    server_name www.mydomain.net;

    return 301 https://mydomain.net$request_uri;
}

server {
    listen 222.222.222.222:443 ssl http2;

    server_name www.mydomain.net;

    # ... ssl stuff here ...

    return 301 https://mydomain.net$request_uri;
}

server {
    listen 222.222.222.222:443 ssl http2;

    server_name mydomain.net;

    # ... handle here ...
}

All three server blocks listen on the same IP, but the first listens on port 80 to redirect to the actual domain (there's no such thing as a "naked domain"-- it's just the domain; "www." is a subdomain), the second listens for the "www." subdomain on the HTTPS port (in this case using HTTP2), and the third is where everyone is being directed.
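As a sanity check on who handles what, here's a tiny Python model of those three server blocks. It's a sketch under two assumptions: the third block answers with a plain 200 (the real one does whatever "handle here" means), and any host not listed is simply out of scope. The function name and return shape are made up.

```python
CANONICAL = "https://mydomain.net"


def respond(scheme, host, request_uri):
    """Return (status, location) the way the three server blocks would."""
    if scheme == "http" and host in ("mydomain.net", "www.mydomain.net"):
        return 301, CANONICAL + request_uri  # first block: everything on port 80
    if scheme == "https" and host == "www.mydomain.net":
        return 301, CANONICAL + request_uri  # second block: "www." on 443
    if scheme == "https" and host == "mydomain.net":
        return 200, None                     # third block: the real site (assumed 200)
    return None, None                        # not covered by this model


if __name__ == "__main__":
    print(respond("http", "www.mydomain.net", "/blog?page=2"))
    print(respond("https", "mydomain.net", "/blog?page=2"))
```

Because `$request_uri` carries the path and the query string, the redirect lands on the same page the visitor asked for.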

SSL

This example simply expands the previous one by showing the actual SSL implementation. Keep in mind that to use HTTP2, you have to have at least Nginx 1.9 (at the time of writing, this meant compiling it yourself-- not a big deal).

server {
    listen 222.222.222.222:80;

    server_name mydomain.net;
    server_name www.mydomain.net;

    return 301 https://mydomain.net$request_uri;
}

server {
    listen 222.222.222.222:443 ssl http2;

    server_name www.mydomain.net;

    ssl_certificate /srv/_cert/mydomain/mydomain.net.chained.crt;
    ssl_certificate_key /srv/_cert/mydomain/mydomain.net.key;

    ssl_protocols TLSv1 TLSv1.1 TLSv1.2;

    ssl_ciphers 'ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-DSS-AES128-GCM-SHA256:kEDH+AESGCM:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA:ECDHE-ECDSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:DHE-DSS-AES128-SHA256:DHE-RSA-AES256-SHA256:DHE-DSS-AES256-SHA:DHE-RSA-AES256-SHA:AES128-GCM-SHA256:AES256-GCM-SHA384:AES128-SHA256:AES256-SHA256:AES128-SHA:AES256-SHA:AES:CAMELLIA:DES-CBC3-SHA:!aNULL:!eNULL:!EXPORT:!DES:!RC4:!MD5:!PSK:!aECDH:!EDH-DSS-DES-CBC3-SHA:!EDH-RSA-DES-CBC3-SHA:!KRB5-DES-CBC3-SHA';

    ssl_prefer_server_ciphers on;

    ssl_dhparam /srv/_cert/dhparam.pem;

    return 301 https://mydomain.net$request_uri;
}

server {
    listen 222.222.222.222:443 ssl http2;

    server_name mydomain.net;

    ssl_certificate /srv/_cert/mydomain/mydomain.net.chained.crt;
    ssl_certificate_key /srv/_cert/mydomain/mydomain.net.key;

    ssl_protocols TLSv1 TLSv1.1 TLSv1.2;

    ssl_ciphers 'ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-DSS-AES128-GCM-SHA256:kEDH+AESGCM:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA:ECDHE-ECDSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:DHE-DSS-AES128-SHA256:DHE-RSA-AES256-SHA256:DHE-DSS-AES256-SHA:DHE-RSA-AES256-SHA:AES128-GCM-SHA256:AES256-GCM-SHA384:AES128-SHA256:AES256-SHA256:AES128-SHA:AES256-SHA:AES:CAMELLIA:DES-CBC3-SHA:!aNULL:!eNULL:!EXPORT:!DES:!RC4:!MD5:!PSK:!aECDH:!EDH-DSS-DES-CBC3-SHA:!EDH-RSA-DES-CBC3-SHA:!KRB5-DES-CBC3-SHA';

    ssl_prefer_server_ciphers on;

    ssl_dhparam /srv/_cert/dhparam.pem;

    location / {
        add_header Strict-Transport-Security max-age=15552000;
        add_header Content-Security-Policy "default-src 'none'; font-src fonts.gstatic.com; frame-src accounts.google.com apis.google.com platform.twitter.com; img-src syndication.twitter.com bible.logos.com www.google-analytics.com 'self'; script-src api.reftagger.com apis.google.com platform.twitter.com 'self' 'unsafe-eval' 'unsafe-inline' www.google.com www.google-analytics.com; style-src fonts.googleapis.com 'self' 'unsafe-inline' www.google.com ajax.googleapis.com; connect-src search.jampad.net jampadcdn.blob.core.windows.net mydomain.net";

        include         uwsgi_params;
        uwsgi_pass      unix:///srv/mydomain/mydomaindjango/content.sock;

        proxy_set_header X-Real-IP  $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto https;
        proxy_set_header X-Forwarded-Port 443;
        proxy_set_header Host mydomain.net;
    }
}

The certs that I use require the chain certs to get a solid A rating on ssllabs.com. This is just a matter of merging your cert with the chain cert (just Google it):

cat mydomain.net.crt ../positivessl.ca-bundle > mydomain.net.chained.crt
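The same merge can be done in a few lines of Python, which is handy if it lives in a deploy script. This is a minimal sketch, and the function name is made up. Its one extra over plain `cat` is that it adds a newline when a file doesn't end with one, because otherwise the END line of one cert gets glued to the BEGIN line of the next and the chained file won't parse.

```python
def chain_certs(cert_path, bundle_path, out_path):
    """Write the site cert followed by the CA bundle, like the cat command above."""
    with open(out_path, "wb") as out:
        for path in (cert_path, bundle_path):  # site cert must come first
            with open(path, "rb") as src:
                data = src.read()
            out.write(data)
            # cat would glue "-----END CERTIFICATE-----" to the next BEGIN line
            # if a file lacks a trailing newline; add one when it is missing.
            if not data.endswith(b"\n"):
                out.write(b"\n")


if __name__ == "__main__":
    chain_certs("mydomain.net.crt", "../positivessl.ca-bundle", "mydomain.net.chained.crt")
```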

Verb Routing

Speaking of verbs, you could whip out a pretty cool CQRS infrastructure by splitting GET from POST.

This is more of a play-along than a visual aid. You can actually try this one at home.

Here's a demo using a quick node server:

// Tiny demo backend: replies with the HTTP method and the port it was started on.
const http = require('http');
const port = parseInt(process.argv[2], 10);
const host = '127.0.0.1';
const server = http.createServer(function (req, res) {
    res.writeHead(200, {'Content-Type': 'text/html'});
    res.end(req.method + ' server ' + port);
});
server.listen(port, host);

Here's our nginx config:

server {
    listen 222.222.222.222:8192;

    location / {
        limit_except GET {
            proxy_pass http://127.0.0.1:6001;
        }
        proxy_pass http://127.0.0.1:6002;
    }
}

Use nginx -s reload to quickly reload the config without doing a full service restart.

Now, to spin up two of them:

node server.js 6001 &
node server.js 6002 &

& runs something as a background process

Now to call them (PowerShell and curl examples provided)...

(wget -method Post http://222.222.222.222:8192/).content

curl -XPOST http://222.222.222.222:8192/

Output:

POST server 6001

(wget -method Get http://222.222.222.222:8192/).content

curl -XGET http://222.222.222.222:8192/

Output:

GET server 6002

Cancel background tasks with fg then CTRL-C. Do this twice to kill both servers.

There we go: your inserts go to one location and your reads come from a different one.

Development Environments

Another great thing about Nginx is that it's not Apache ("a patchy" web server, as the joke goes). Aside from Apache simply trying to do far too much, it's an obsolete product from the 90s that needs to be dropped. It's also often very hard to set up. The security permissions in Apache, for example, make no sense and the documentation is horrible.

Setting up Apache in a dev environment almost never happens, but Nginx is seamless enough for it not to interfere with day-to-day development.

The point: don't be afraid to use Nginx in your development setup.

Raw Python HTTP Processing

Python HTTP processing (it's not "web development" unless there's a web-browser) is all about WSGI: the Web Server Gateway Interface. It's a pointless term, but the implementation is beautiful: it's a single interface that handles everything web-related for Python. The signature is as follows (with an example):

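def name_does_not_matter(environment, start_response):
    # start_response is a callable: pass it the status line and a list of header tuples
    start_response('200 OK', [('Content-Type', 'text/plain')])
    body = 'Your content type was {}'.format(environment.get('CONTENT_TYPE', ''))
    return [body.encode('utf-8')]  # WSGI expects an iterable of bytes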

This is even what Django does deep down.
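You don't need UWSGI or Nginx running to poke at a WSGI callable. Here's a minimal sketch that calls one directly using only the standard library, roughly the way a WSGI server would. The `app` and `call_app` names are made up for the example; `setup_testing_defaults` is a real helper from `wsgiref.util` that fills in the environment keys a server would normally supply.

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """A complete WSGI application: status and headers go through start_response."""
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    content_type = environ.get("CONTENT_TYPE", "")
    return ["Your content type was {}".format(content_type).encode("utf-8")]


def call_app(application, content_type):
    """Invoke a WSGI app directly, the way a server such as UWSGI would."""
    environ = {"CONTENT_TYPE": content_type}
    setup_testing_defaults(environ)  # fills in the other required CGI-style keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(application(environ, start_response))
    return captured["status"], body.decode("utf-8")


if __name__ == "__main__":
    print(call_app(app, "application/json"))
```

The two things to notice: the status and headers go out through the `start_response` callable, and the return value is an iterable of bytes, not a plain string.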

You can use a service like UWSGI to do the processing for this. Like other things in Linux, this tool does one thing, does it well, and relies on other tools for other things. In the case of hosting, Nginx is a solid way to handle the HTTP hosting for UWSGI.

In addition to the config for UWSGI (not shown-- not relevant!), you have the following Nginx config:

server {
    listen 222.222.222.222:80;

    location / {
        include            uwsgi_params;
        uwsgi_pass         unix:/srv/raw_python_awesomeness/content/content.sock;

        proxy_redirect     off;
        proxy_set_header   Host $host;
        proxy_set_header   X-Real-IP $remote_addr;
        proxy_set_header   X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header   X-Forwarded-Host $server_name;
    }
}

You could make UWSGI serve up something on localhost:8081 (or whatever port you want), but it's best to use sockets where you can.

You can see my WebAPI for Python project at https://github.com/davidbetz/pywebapi for a fuller example.

Bulk Download in Linux

Want to download a huge list of files on Linux? No problem...

Let's get a sample file list:

wget http://www.linuxfromscratch.org/lfs/view/stable/wget-list

Now, let's download them all:

sed 's/^/wget /e' wget-list

This says: execute wget for each line
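If GNU sed isn't around (the `e` flag is a GNU extension), the same loop is easy to sketch in Python with only the standard library. The function names here are made up, and falling back to `index.html` for URLs without a filename is an assumption that mirrors what wget does by default.

```python
import os
import urllib.request
from urllib.parse import urlsplit


def read_urls(lines):
    """Yield one URL per non-empty line, ignoring surrounding whitespace."""
    for line in lines:
        url = line.strip()
        if url:
            yield url


def filename_for(url):
    """Use the last path segment as the local filename, as wget does by default."""
    return os.path.basename(urlsplit(url).path) or "index.html"


def download_all(list_path, dest_dir="."):
    """Download every URL in the list file into dest_dir, one after another."""
    with open(list_path) as handle:
        for url in read_urls(handle):
            target = os.path.join(dest_dir, filename_for(url))
            urllib.request.urlretrieve(url, target)


if __name__ == "__main__":
    download_all("wget-list")
```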

Done.
