
Bulk Download in Linux

Want to download a huge list of files on Linux? No problem...

Let's get a sample file list:

wget http://www.linuxfromscratch.org/lfs/view/stable/wget-list

Now, let's download them all:

sed 's/^/wget /e' wget-list

This says: execute wget for each line
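
For what it's worth, GNU wget can also read the list directly; this is the same thing without the sed trick:

wget -i wget-list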

Done.

Learning Elasticsearch with PowerShell

I'm not big on marketing introductions. They are usually filled with non-technical pseudo-truths and gibberish worthy of the "As seen on TV" warning label. So, what is Elasticsearch ("ES")? Marketing says it's a search system. People who have used it say it's a hardcore Aristotelian database that makes for a fine primary datastore as well as for a fine search engine.

It's comparable to MongoDB in the sense that you throw JSON objects at it. One of the major differences with MongoDB is that Elastic is more explicit about its indexing. Every database does indexing and everything has a schema. Have a data-structure? You have an index. Have fields? At a minimum, you have an implicit schema. This is what makes an Aristotelian system Aristotelian.

See my video on Platonic and Aristotelian Data Philosophies for more information on why "NoSQL" is a modern marketing fiction similar to "AJAX".

Another major difference is that Elastic scales much better than MongoDB (see here for more details). This makes sense given the index-focus of Elastic and the fact that it's meant for large-scale searching.

You might find people saying that Elastic is schemaless. These people have neither read nor peeked at the docs. Elastic is very explicit about its indexes. Sometimes you'll hear that it's schemaless because it uses Lucene, which is schemaless. That's stupid. Lucene uses straight bytes and Elastic adds the schema on top of it. Your file system uses bytes and SQL Server adds a schema on top of it; the fact that the MDF files sit on a byte-oriented file system doesn't mean SQL Server has no schema. SQL Server has a schema. Elastic has a schema. Everything has a schema. It might be implicit, but it has a schema. Even if you never create one, there is a schema. Shutting the hood of your car doesn't mean the engine doesn't exist. You need to think beyond the capacities of a 1-year-old. Elastic is explicit about having a schema.

My game plan here will seem absolutely backward:

  • Abstract some lower-level ES functionality with PowerShell using search as an example.
  • Discuss ES theory, index setup, and data inserting.

Given that setup is a one-time thing and day-to-day stuff is... uh... daily, the day-to-day stuff comes first. The first-things-first fallacy can die.

As I demonstrate using ES from PowerShell, I'll give commentary on what I'm doing with ES. You should be able to learn both ES and some practical, advanced PowerShell.

There's a lot of really horrible PowerShell out there. I'm not part of the VB-style / Cmdlet / horribly-tedious-and-tiring-long-command-name PowerShell weirdness. If you insist on writing out Invoke-WebRequest instead of simply using wget, but use int and long instead of Int32 and Int64, then you have a ridiculous inconsistency to work out. Also, you're part of the reason PowerShell isn't more widespread. You are making it difficult on everyone. In the following code, we're going to use PowerShell that won't make you hate PowerShell; it will be pleasant and the commands will end up rolling off your fingers (something that won't happen with the horrible longhand command names).

Flaw Workaround

Before we do anything, we have to talk about one of the greatest epic fails in recent software history...

While the Elasticsearch architecture is truly epic, it has a known design flaw (the absolute worst form of a bug): it allows a POST body in a GET request. This makes development painful:

  • Fiddler throws a huge red box at you.
  • wget in PowerShell gives you an error.
  • Postman in Chrome doesn't even try.
  • HttpClient in .NET throws System.Net.ProtocolViolationException saying "Cannot send a content-body with this verb-type."

.NET is right. Elasticsearch is wrong. Breaking the rules for the sake of what you personally feel makes more sense makes you a vigilante. Forcing a square peg into a round hole just for the sake of making sure gets are done with GET makes you an extremist. Bodies belong in POST and PUT, not GET.

It's a pretty stupid problem given how clever their overall architecture is. There's an idiom for this in English: Homer nodded.

To get around this flaw, instead of actually allowing us to search with POST (like normal people would), we are forced to use a hack: the source query string parameter.

PowerShell Setup

When following along, you'll want to use the PowerShell ISE. Just type ISE in PowerShell.

PowerShell note: hit F5 in ISE to run a script

If you are going to run these in a ps1 file, make sure to run Set-ExecutionPolicy RemoteSigned as admin. Microsoft doesn't seem to like PowerShell at all. It's not the default in Windows 8/10. It's not the default on Windows Server. You can't run scripts by default. Someone needs to be fired. Run the aforementioned command to allow local scripts.

HTTP/S call

We're now ready to be awesome.

To start, let's create a call to ES. In the following code, I'm calling HTTPS with authorization. I'm not giving you the sissy Hello World; this is from production. While you're playing around, you can drop the HTTPS and the authorization. You figure out how. That's part of learning.

$base = 'search.domain.net' 

$call = {
    param($params)

    $uri = "https://$base"

    $headers = @{ 
        'Authorization' = 'Basic fVmBDcxgYWpndYXJj3RpY3NlkZzY3awcmxhcN2Rj'
    }

    $response = $null
    $response = wget -Uri "$uri/$params" -method Get -Headers $headers -ContentType 'application/json'
    $response.Content
}

PowerShell note: inside a double-quoted string, prefix a : that follows a variable with ` (e.g. "$base`:9200") or else you'll get a headache

So far, simple.

We can call &$call to call an HTTPS service with authorization.
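
For example, assuming the cluster is reachable, a quick sanity check looks like this:

&$call '_cluster/health?pretty'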

But, let's break this out a bit...

$call = {
    param($verb, $params)

    $uri = "https://$base"

    $headers = @{ 
        'Authorization' = 'Basic fVmBDcxgYWpndYXJj3RpY3NlkZzY3awcmxhcN2Rj'
    }

    $response = wget -Uri "$uri/$params" -method $verb -Headers $headers -ContentType 'application/json'
    $response.Content
}

$get = {
    param($params)
    &$call "Get" $params
}

$delete = {
    param($params)
    &$call "Delete" $params
}

Better. Now we can call various verb functions.
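
For example (the index name here is made up; substitute one of your own):

&$get '_cat/health?v'
&$delete 'some-old-index'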

To add PUT and POST, we need to account for the POST body. I'm also going to add some debug output to make life easier.

$call = {
    param($verb, $params, $body)

    $uri = "https://$base"

    $headers = @{ 
        'Authorization' = 'Basic fVmBDcxgYWpndYXJj3RpY3NlkZzY3awcmxhcN2Rj'
    }

    Write-Host "`nCalling [$uri/$params]" -f Green
    if($body) {
        Write-Host "BODY`n--------------------------------------------`n$body`n--------------------------------------------`n" -f Green
    }

    $response = wget -Uri "$uri/$params" -method $verb -Headers $headers -ContentType 'application/json' -Body $body
    $response.Content
}

$put = {
    param($params,  $body)
    &$call "Put" $params $body
}

$post = {
    param($params,  $body)
    &$call "Post" $params $body
}

In addition to having POST and PUT, we can also see what serialized data we are sending, and where.

ES Catalog Output

Now, let's use $call (or $get, etc) in something with some meaning:

$cat = {
    param($json)

    &$get "_cat/indices?v&pretty"
}

This will get the catalog of indexes.

Elasticsearch note: You can throw pretty anywhere to get formatted JSON.

Running &$cat gives me the following json:

[ {
  "health" : "yellow",
  "status" : "open",
  "index" : "site1!production",
  "pri" : "5",
  "rep" : "1",
  "docs.count" : "170",
  "docs.deleted" : "0",
  "store.size" : "2.4mb",
  "pri.store.size" : "2.4mb"
}, {
  "health" : "yellow",
  "status" : "open",
  "index" : "site2!production",
  "pri" : "5",
  "rep" : "1",
  "docs.count" : "141",
  "docs.deleted" : "0",
  "store.size" : "524.9kb",
  "pri.store.size" : "524.9kb"
} ]

But, we're in PowerShell; we can do better:

ConvertFrom-Json (&$cat) | ft

Output:

 health status index                 pri rep docs.count docs.deleted store.size pri.store.size
------ ------ -----                 --- --- ---------- ------------ ---------- --------------
yellow open   site1!staging         5   1   176        0            2.5mb      2.5mb         
yellow open   site2!staging         5   1   144        0            514.5kb    514.5kb    
yellow open   site1!production      5   1   170        0            2.4mb      2.4mb         
yellow open   site2!production      5   1   141        0            524.9kb    524.9kb     

Example note: !production and !staging have nothing to do with ES. It's something I do in ES, Redis, Mongo, SQL Server, and every other place data is stored, to separate deployments. Normally I would remove this detail from article samples, but the following examples use it to demonstrate filtering.

PowerShell note: Use F8 to run a selection or a single line. It might be worth clearing out your session variables if you want to play around with this.

Much nicer. Not only that, but we have the actual object we can use to filter on the client side. It's not just text.

(ConvertFrom-Json (&$cat)) `
    | where { $_.index -match '!production' }  `
    | select index, docs.count, store.size |  ft

Output:

index              docs.count store.size
-----              ---------- ----------
site1!production   170        2.4mb     
site2!production   141        532.9kb   

Getting Objects from ES

Let's move forward by adding our search function:

$search = {
    param($index, $json)

    &$get "$index/mydatatype/_search?pretty&source=$json"
}

Calling it...

&$search 'site2!production' '{
    "query": {
        "match_phrase": {
            "content": "struggling serves"
        }
    }
}'

Elasticsearch note: match_phrase will match the entire literal phrase "struggling serves"; match would have searched for "struggling" or "serves". Results will return with a score, sorted by that score; entries with both words would have a higher score than an entry with only one of them. Also, wildcard will allow stuff like struggl*.
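
A quick sketch of that wildcard variant, using the same $search helper (results obviously depend on your data):

&$search 'site2!production' '{
    "query": {
        "wildcard": {
            "content": "struggl*"
        }
    }
}'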

Meh, I'm not a big fan of this; it's the analog of SELECT * FROM [site2!production]. Let's limit the fields:

&$search 'site2!production' '{
    "query": {
        "match_phrase": {
            "content": "struggling serves"
        }
    },
    "fields": ["selector", "title"]
}'

This will return a bunch of JSON.

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 0.6106973,
    "hits" : [ {
      "_index" : "site2!staging",
      "_type" : "entry",
      "_id" : "AVGUw_v5EFj4l3MNvkeV",
      "_score" : 0.6106973,
      "fields" : {
        "selector" : [ "sweet/mine" ]
      }
    }, {
      "_index" : "site2!staging",
      "_type" : "entry",
      "_id" : "AVGU3PwoEFj4l3MNvk9D",
      "_score" : 0.4333064,
      "fields" : {
        "selector" : [ "of/ambition" ]
      }
    }, {
      "_index" : "site2!staging",
      "_type" : "entry",
      "_id" : "AVGU3QHeEFj4l3MNvk9G",
      "_score" : 0.4333064,
      "fields" : {
        "selector" : [ "or/if" ]
      }
    } ]
  }
}

We can improve on this.

First, we can convert the input to something nicer:

&$search 'site2!production' (ConvertTo-Json @{
    query = @{
        match_phrase = @{
            content = "struggling serves"
        }
    }
    fields = @('selector', 'title')
})

Here we're just creating a dynamic object and serializing it. No JIL or Newtonsoft converters required.

To make this a lot cleaner, here's a modified $search:

$search = {
    param($index, $json, $obj)
    if($obj) {
        $json = ConvertTo-Json -Depth 10 $obj
    }

   &$get "$index/mydatatype/_search?pretty&source=$json"
}

You need -Depth <Int32> because the default is 2. Nothing deeper than the default will serialize; it will simply show "System.Collections.Hashtable". In ES, you'll definitely have deep objects.
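
A quick way to see the problem with a throwaway object (made up for illustration):

# default -Depth of 2: the third level collapses to "System.Collections.Hashtable"
ConvertTo-Json @{ a = @{ b = @{ c = 1 } } }

# -Depth 10: c survives
ConvertTo-Json -Depth 10 @{ a = @{ b = @{ c = 1 } } }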

Now, I can call it like this:

&$search 'site2!production' -obj @{
    query = @{
        match_phrase = @{
            content = "struggling serves"
        }
    }
    fields = @('selector', 'title')
}

This works fine. Not only that, but the following still works:

&$search 'site2!production' '{
    "query": {
        "match_phrase": {
            "content": "struggling serves"
        }
    },
    "fields": ["selector", "title"]
}'

Now you don't have to fight with escaping strings; you can also still copy/paste JSON with no problem.

JSON to PowerShell Conversion Notes:

  • : becomes =
  • all ending commas go away
    • newlines denote new properties
  • @ before all new objects (e.g. {})
  • [] becomes @()
    • @() is PowerShell for array
  • " becomes ""
    • PowerShell escaping is double-doublequotes
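
Putting those rules together on a tiny example (the field names are made up):

JSON:        { "query": { "terms": { "tags": ["a", "b"] } } }
PowerShell:  @{ query = @{ terms = @{ tags = @("a", "b") } } }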

DO NOT FORGET THE @ BEFORE {. If you do, it will sit there forever as it tries to serialize nothing into nothing. After a few minutes, you'll get hundreds of thousands of JSON entries. Seriously. It tries to serialize every aspect of every .NET property forever. This is why -Depth defaults to 2.

Next, let's format the output:

(ConvertFrom-Json(&$search 'content!staging' 'entry' -obj @{
    query = @{
        match_phrase = @{
            content = "struggling serves"
        }
    }
    fields = @('selector', 'title')
})).hits.hits.fields | ft

Could probably just wrap this up:

$matchPhrase = {
    param($index, $type, $text, $fieldArray)
    (ConvertFrom-Json(&$search $index $type -obj @{
        query = @{
            match_phrase = @{
                content = $text
            }
        }
        fields = $fieldArray
    })).hits.hits.fields | ft
}

Just for completeness, here's $match. Nothing too shocking.

$match = {
    param($index, $type, $text, $fieldArray)
    (ConvertFrom-Json(&$search $index $type -obj @{
        query = @{
            match = @{
                content = $text
            }
        }
        fields = $fieldArray
    })).hits.hits.fields | ft
}

Finally, we have this:

&$match 'content!staging' 'entry' 'even' @('selector', 'title')

Output:

title                               selector           
-----                               --------           
{There Defeats Cursed Sinews}       {forestalled/knees}
{Foul Up Of World}                  {sweet/mine}       
{And As Rank Down}                  {crown/faults}     
{Inclination Confront Angels Stand} {or/if}            
{Turn For Effects World}            {of/ambition}      
{Repent Justice Is Defeats}         {white/bound}      
{Buys Itself Black I}               {form/prayer}

There we go: phenomenal cosmic power in an itty bitty living space

Beefier Examples and Objects

Here's an example of a search that's a bit beefier:

&$search  'bible!production' -obj @{
    query = @{
        filtered = @{
            query = @{
                match = @{
                    content = "river Egypt"
                }
            }
            filter = @{
                term = @{
                    "labels.key" = "isaiah"
                }
            }
        }
    }
    fields = @("passage")
    highlight = @{
        pre_tags = @("<span class=""search-result"">")
        post_tags = @("</span>")
        fields = @{
            content = @{
                fragment_size = 150
                number_of_fragments = 3
            }
        }
    }
}

Elasticsearch note: Filters are binary: you have it or you don't. Queries are analog: they have a score. In this example, I'm mixing a filter with a query. Here I'm searching the index for content containing "river" or "Egypt" where labels: { "key": "isaiah" }

Using this I'm able to filter my documents by label, where my labels are complex objects like this:

  "labels": [
    {
      "key": "isaiah",
      "implicit": false,
      "metadata": []
    }
  ]

I'm able to search by labels.key to do a hierarchical filter. This isn't an ES tutorial; rather, this is to explain why "labels.key" was in quotes in my PowerShell, but nothing else is.

Design note: The objects you send to ES should be optimized for ES. This nested type example is somewhat contrived to demonstrate nesting. You can definitely just throw your data at ES and it will figure out the schema on the fly, but that just means you're lazy. You're probably the type of person who used the forbidden [KnownType] attribute in WCF because you were too lazy to write DTOs. Horrible. Go away.

This beefier example also shows me using ES highlighting. In short, it allows me to tell ES that I want a content summary of a certain size with some keywords wrapped in some specified HTML tags.

This content will show in addition to the requested fields.

The main reason I mention highlighting here is this:

When you serialize the object, it will look weird:

"pre_tags":  [
   "\u003cspan class=\"search-result\"\u003e"
]

Chill. It's fine. I freaked out at first too. Turns out ES can handle unicode just fine.

Let's run with this highlighting idea a bit by simplifying it, parameterizing it, and deserializing the result (like we've done already):

$result = ConvertFrom-Json(&$search  $index -obj @{
    query = @{
        filtered = @{
            query = @{
                match = @{
                    content = $word
                }
            }
        }
    }
    fields = @("selector", "title")
    highlight = @{
        pre_tags = @("<span class=""search-result"">")
        post_tags = @("</span>")
        fields = @{
            content = @{
                fragment_size = 150
                number_of_fragments = 3
            }
        }
    }
})

Nothing new so far. Same thing, just assigning it to a variable...

The JSON passed back was this...

{
  "took" : 6,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 30,
    "max_score" : 0.093797974,
    "hits" : [ {
      "_index" : "content!staging",
      "_type" : "entry",
      "_id" : "AVGhy1octuYZuX6XP7zu",
      "_score" : 0.093797974,
      "fields" : {
        "title" : [ "Pause First Bosom The Oft" ],
        "selector" : [ "hath come/hand of" ]
      },
      "highlight" : {
        "content" : [ " as to fault <span class=\"search-result\">smells</span> ourselves am free wash not tho
se lies enough business eldest sharp first what corrupted blood which state knees wash our cursed oft", " give
 above shall curse be help faults offences this snow me pray shuffling confront ere forgive newborn a engaged 
<span class=\"search-result\">smells</span> rain state angels up form", " ambition eldest i guilt <span class=
\"search-result\">smells</span> forehead true snow thicker rain compelld intent bound my which currents our so
ul of limed angels white snow this buys" ]
      }
    },
    ...    

The data we want is hits.hits.fields and hits.hits.highlight.

So, we can get them, and play with the output...

$hits = $result.hits.hits
$formatted = $hits | select `
        @{ Name='selector'; Expression={$_.fields.selector} },
        @{ Name='title'; Expression={$_.fields.title} }, 
        @{ Name='highlight'; Expression={$_.highlight.content} }
$formatted

This is basically the following...

hits.Select(p=>new { selector = p.fields.selector, ...});

Output:

selector                   title                            highlight                                        
--------                   -----                            ---------                                        
hath come/hand of          Pause First Bosom The Oft        { as to fault <span class="search-result">smel...
all fall/well thicker      Enough Nature Heaven Help Smells { buys me sweet queen <span class="search-resu...
with tis/law oft           Me And Even Those Business       { if what form this engaged wretched heavens m...
hand forehead/engaged when Form Confront Prize Oft Defeats  {white begin two-fold faults than that strong ...
newborn will/not or        Were Gilded Help Did Nature      { above we evidence still to me no where law o...
cursed thicker/as free     Cursed Tis Corrupted Guilt Where { justice wicked neglect by <span class="searc...
currents hand/true of      Both Than Serves Were May        { serves engaged down ambition to man it is we...
two-fold can/eldest queen  Then Sweet Intent Help Turn      { heart double than stubborn enough the begin ...
force did/neglect whereto  Compelld There Strings Like Not  { it oft sharp those action enough art rests s...
babe whereto/whereto is    As Currents Prayer That Free     { defeats form stand above up <span class="sea...

This part is important: highlight is an array. Your search terms may show up more than once in a document. That's where the whole number_of_fragments = 3 came in. The highlight size comes from that fragment_size = 150. So, for each entry we have a selector (basically an ID), a title, and up to three highlights of up to 150 characters each.

Let's abstract this stuff before we go to the final data analysis step:

$loadHighlights = {
    param($index, $word)

    $result = ConvertFrom-Json(&$search  $index -obj @{
        query = @{
            filtered = @{
                query = @{
                    match = @{
                        content = $word
                    }
                }
            }
        }
        fields = @("selector", "title")
        highlight = @{
            pre_tags = @("<span class=""search-result"">")
            post_tags = @("</span>")
            fields = @{
                content = @{
                    fragment_size = 150
                    number_of_fragments = 3
                }
            }
        }
    })

    $hits = $result.hits.hits
    $hits | select `
            @{ Name='selector'; Expression={$_.fields.selector} },
            @{ Name='title'; Expression={$_.fields.title} }, 
            @{ Name='highlight'; Expression={$_.highlight.content} }
}

Now, we can run:

&$loadHighlights 'content!staging' 'smells'

Let's use this and analyze the results:

(&$loadHighlights 'content!staging' 'smells').foreach({
    Write-Host ("Selector: {0}" -f $_.selector)
    Write-Host ("Title: {0}`n" -f $_.title)
    $_.highlight | foreach {$i=0} {
        $i++
        Write-Host "$i $_"
    }
    Write-Host "`n`n"
})

Here's a small sampling of the results:

Selector: two-fold can/eldest queen
Title: Then Sweet Intent Help Turn

1  heart double than stubborn enough the begin and confront as <span class="search-result">smells</span> ourselves death stronger crown murder steel pray i though stubborn struggling come by
2  of forehead newborn mine above forgive limed offences bosom yet death come angels angels <span class="search-result">smells</span> i sinews past that murder his bosom being look death
3  <span class="search-result">smells</span> where strong ill action mine foul heavens turn so compelld our to struggling pause force stubborn look forgive death then death try corrupted



Selector: force did/neglect whereto
Title: Compelld There Strings Like Not

1  it oft sharp those action enough art rests shove stand cannot rain bosom bosom give tis repentance try upont possessd my state itself lies <span class="search-result">smells</span> the
2  brothers blood white shove no stubborn than in ere <span class="search-result">smells</span> newborn art repentance as like though newborn will form upont pause oft struggling forehead help
3  shuffling serve lies <span class="search-result">smells</span> stand well queen well visage and free his prayer lies that art ere a there law even by business confront offences may retain

We have the selector, the title, and the highlights with <span class="search-result">...</span> showing us where our terms were found.

Setup

Up to this point, I've assumed that you already have an ES setup. Setup is once, but playing around is continuous. So, I got the playing around out of the way first.

Now, I'll go back and talk about ES setup with PowerShell. This should GREATLY improve your ES development. Well, it's what helps me at least...

ES has a schema. It's not fully Platonic like SQL Server, nor is it fully Aristotelian like MongoDB. You can throw all kinds of things at ES and it will figure them out. This is what ES calls dynamic mapping. If you like the idea of digging through incredibly massive data dumps during debugging or passing impossibly huge datasets back and forth, then this might be the way to go (were you "that guy" who threw [KnownType] on your WCF objects? This is you. You have no self-respect.) On the other hand, if you are into light-weight objects, you're probably passing ES a nice tight JSON object anyway. In any case, want schema to be computed as you go? That's dynamic mapping. Want to define your schema and have ES ignore unknown properties? That's, well, disabling dynamic mapping.

Dynamic mapping ends up being similar to the lowest-common denominator ("LCD") schema like in Azure Table Storage: your schema might end up looking like a combination of all fields in all documents.

ES doesn't so much deal with "schema" in the abstract, but with concrete indexes and types.

No, that doesn't mean it's schemaless. That's insane. It means that the index and its types are the schema.

In any case, in ES, you create indexes. These are like your tables. Your indexes will have metadata and various properties much like SQL Server metadata and columns. Properties have types just like SQL Server columns. Unlike SQL Server, there's also a concept of a type. Indexes can have multiple types.

Per the Elasticsearch: Definitive Guide, the type is little more than a "_type" property internally; thus, types are almost like partition keys in Azure Table Storage. This means that when searching, you're searching across all types unless you specify the type as well. Again, this maps pretty closely to a partition key in Azure Table Storage.

Creating an index

Creating an index with a type is a matter of calling POST /<index> with your mapping JSON object. Our $createIndex function will be really simple:

$createIndex = {
    param($index, $json, $obj)
    if($obj) {
        $json = ConvertTo-Json -Depth 10 $obj
    }
    &$post $index $json
}

Things don't get interesting until we call it:

&$createIndex 'content!staging' -obj @{
    mappings = @{
        entry = @{
            properties = @{
                selector = @{
                    type = "string"
                }
                title = @{
                    type = "string"
                }
                content = @{
                    type = "string"
                }
                created = @{
                    type = "date"
                    format = "YYYY-MM-DD"  
                }
                modified = @{
                    type = "date"                  
                }
            }
        }
    }
}

This creates an index called content!staging with a type called entry with five properties: selector, title, content, created, and modified.

The created property is there to demonstrate the fact that you can throw formats on properties. Normally, dates are UTC, but here I'm specifying that I don't even care about times when it comes to the create date.

With this created, we can see how ES sees our data. We do this by calling GET /<index>/_mapping:

$mapping = {
    param($index)
   &$get "$index/_mapping?pretty"
}

Now to call it...

&$mapping 'content!staging'

Adding Data

Now to throw some data at this index...

Once again, the PowerShell function is simple:

$add = {
    param($index, $type, $json, $obj)
    if($obj) {
        $json = ConvertTo-Json -Depth 10 $obj
    }
    &$post "$index/$type" $json
}

To add data, I'm going to use a trick I wrote about elsewhere.

if (!([System.Management.Automation.PSTypeName]'_netfxharmonics.hamlet.Generator').Type) {
    Add-Type -Language CSharp -TypeDefinition '
        namespace _Netfxharmonics.Hamlet {
            public static class Generator {
                private static readonly string[] Words = "o my offence is rank it smells to heaven hath the primal eldest curse upont a brothers murder pray can i not though inclination be as sharp will stronger guilt defeats strong intent and like man double business bound stand in pause where shall first begin both neglect what if this cursed hand were thicker than itself with blood there rain enough sweet heavens wash white snow whereto serves mercy but confront visage of whats prayer two-fold force forestalled ere we come fall or pardond being down then ill look up fault past form serve turn forgive me foul that cannot since am still possessd those effects for which did crown mine own ambition queen may one retain corrupted currents world offences gilded shove by justice oft tis seen wicked prize buys out law so above no shuffling action lies his true nature ourselves compelld even teeth forehead our faults give evidence rests try repentance can yet when repent wretched state bosom black death limed soul struggling free art more engaged help angels make assay bow stubborn knees heart strings steel soft sinews newborn babe all well".Split('' '');
                private static readonly int Length = Words.Length;
                private static readonly System.Random Rand = new System.Random();

                public static string Run(int count, bool subsequent = false) {
                    return Words[Rand.Next(1, Length)] + (count == 1 ? "" : " " + Run(count - 1, true));
                }
            }
        }

    '
}

Let's use the generator with $add to load up some data:

$ti = (Get-Culture).TextInfo
(1..30).foreach({
    &$add -index 'content!staging' -type 'entry' -obj @{
        selector = "{0}/{1}" -f ([_netfxharmonics.hamlet.Generator]::Run(1)), ([_netfxharmonics.hamlet.Generator]::Run(1))
        title = $ti.ToTitleCase([_netfxharmonics.hamlet.Generator]::Run(4))
        content = [_netfxharmonics.hamlet.Generator]::Run(400) + '.'
        created = [DateTime]::Now.ToString("yyyy-MM-dd")
        modified = [DateTime]::Now.ToUniversalTime().ToString("o")
    }
})

This runs fast; crank the sucker up to 3000 with a larger content size if you want. Remove "Write-Host" from $call for more speed.

Your output will look something like this:

Calling [https://search.domain.net/content!staging/entry]
BODY
--------------------------------------------
{
    "selector":  "death/defeats",
    "title":  "Sinews Look Lies Rank",
    "created":  "2015-11-07",
    "content":  "Engaged shove evidence soul even stronger bosom bound form soul wicked oft compelld steel which turn prize yet stand prize.",
    "modified":  "2015-11-07T05:45:10.6943622Z"
}
--------------------------------------------

When I run one of the earlier searches...

&$match 'content!staging' 'entry' 'even'

...we get the following results:

selector        
--------        
{death/defeats}

Debugging

If stuff doesn't work, you need to figure out how to find out why; not simply find out why. So, brain-mode activated: wrong results? Are you getting them wrong or are they actually wrong? Can't insert? Did the data make it to the server at all? Did it make it there, but couldn't get inserted? Did it make it there, get inserted, but you were simply told that it didn't insert? Figure it out.

As far as simple helps, I'd recommend doing some type of dump:

$dump = {
    param($index, $type)
    &$get "$index/$type/_search?q=*:*&pretty"
}

This is a raw JSON dump. You might want to copy/paste somewhere for analysis, or play around in PowerShell:

(ConvertFrom-Json(&$dump 'content!staging' 'entry')).hits.hits

I'd recommend just using a text editor to look around instead of random faux PowerShell data-mining. Because JSON and XML are both absolutely, perfectly human readable, you'll see what you need quickly. Even then, there's no reason not to just type the actual link into your own browser:

http://10.1.60.3:9200/content!staging/entry/_search?q=*:*&pretty

I'd recommend the Pretty Beautiful Javascript extension for Chrome.

You can remove &pretty when using this Chrome extension.

Another thing I'd strongly recommend is having a JSON beautifier toggle for input JSON:

$pretty = $False

So you can do something like this:

$serialize = {
    param($obj)
    if(!$pretty) {
        $pretty = $false
    }
    if($pretty) {
        ConvertTo-Json -Depth 10 $obj;
    }
    else {
        ConvertTo-Json -Compress -Depth 10 $obj
    }
}

Instead of calling ConvertTo-Json in your other places, just call &$serialize.

$search = {
    param($index, $type, $json, $obj)
    if($obj) {
        $json = &$serialize $obj
    }
    &$get "$index/$type/_search?pretty&source=$json"
}

Remember, this is for input, not output. This is for the data going to the server.

You want this option because, once you turn pretty off, you can do this:

&$match 'content!staging' 'entry' 'struggling' @('selector', 'title')

To get this...

Calling [http://10.1.60.3:9200/content!staging/entry/_search?pretty&source={"fields":["selector","title"],"query":{"match":{"content":"struggling"}}}]

title                            selector         
-----                            --------         
{Those Knees Look State}         {own/heavens}    

Now you have a URL you can dump into your web browser. Also, you have a link to share with others.

Regarding logs, on Linux systems you can view error messages at the following location:

/var/log/elasticsearch/CLUSTERNAME.log

I like to keep a SSH connection open while watching the log closely:

tail -f /var/log/elasticsearch/david-content.log 

Your cluster name will be configured in the YAML config somewhere around /etc/elasticsearch/elasticsearch.yml.
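
The relevant line in that file looks something like this (the name itself is whatever you chose):

cluster.name: david-content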

Updating (and Redis integration)

Updating Elasticsearch objects ("documents") is interesting for two reasons, a good one and a weird one:

Good reason: documents are immutable. Updates involve marking the existing item as deleted and inserting a new document. This is exactly how SQL Server 2014 IMOLTP works. It's one secret of extreme efficiency. It's an excellent practice to follow.

Weird reason: you have to know the internal ID to update a document. It's highly efficient, which makes it, at worst, "weird", not "bad". If it allowed updates based on custom fields, you'd have a potential perf hit. Key lookups are the fastest.

Prior to Elasticsearch 2.x, you could add something like { "_id": { "path": "selector" } } to tell ES that you want to use your "selector" field as your ID. This was deprecated in version 1.5 and removed in 2.x (yes, those are two separate things). Today, _id is immutable. So, when you see docs saying you can do this, check the version. It will probably be something like version 1.4. Compare the docs for _id in version 1.4 with version 2.1 to see what I mean.

When you make a call like the following example, a cryptic ID is generated:

POST http://10.1.60.3:9200/content!staging/entry

But, you can specify the ID:

POST http://10.1.60.3:9200/content!staging/entry/5

This is great, but nobody anywhere cares about integer IDs. These surrogate keys have absolutely no meaning to your document. How in the world could you possibly know how to update something? You have to know the ID. Which means you saved it somewhere after you inserted it. You can't simply update based on your own known key (in our examples here, "selector"; though up to this point I never made them unique).

So, OK, fine... we have to save some type of surrogate-key-to-key map somewhere. Where could I possibly save them? Elasticsearch IS. MY. DATABASE. I need something insanely efficient for key/value lookups, but that persists to disk. I need something easy to use on all platforms. It should also be non-experimental. It should be a time-tested system. Oh... right: Redis.

The marketing says that Redis is a "cache". Whatever that means. It's the job of marketing to either lie about products to trick people into buying stuff or to downplay stuff for the sake of a niche market. In reality, Redis is a key/value database. It's highly efficient and works everywhere. It's perfect. Let's start making the awesome...

I'm all about doing things based on first principles (people who can't do this laugh at people who can do this and accuse them of "not invented here syndrome"; jealousy expresses itself in many ways), but here I'm going to use the StackExchange.Redis package. It seems to be pretty standard and it works pretty well. I'm running it in a few places. Create some VS2015 (or whatever) project and add the NuGet package. Or, go find it and download it. But... meh... that sounds like work. Use NuGet. Now we're going to reference that DLL...

$setupTracking = {
    Add-Type -Path 'E:\_GIT\awesomeness\packages\StackExchange.Redis.1.0.488\lib\net45\StackExchange.Redis.dll'
    $cs = '10.1.60.2'
    $config = [StackExchange.Redis.ConfigurationOptions]::Parse($cs)
    $connection = [StackExchange.Redis.ConnectionMultiplexer]::Connect($config)
    $connection.GetDatabase()
}

Here I'm adding the assembly, creating my connection string, creating a connection, then getting the database.

Let's call this and set some type of relative global:

$redis = &$setupTracking

We need to go over a few things in Redis first:

Redis communicates over TCP. You send commands to it and you get stuff back. The commands are assembler-looking codes like:

  • HGET
  • FLUSHALL
  • KEYS
  • GET
  • SET
  • INCR

When you use INCR, you are incrementing a counter. So...

INCR taco

That sets taco to 1.

INCR taco

Now it's 2.

We can get the value...

GET taco

The return value will be 2.

By the way, this is how you setup realtime counters on your website. You don't have to choose between database locking and eventual consistency. Use Redis.

Then there's the idea of a hash. You know, a dictionary-looking thingy.

So,

HSET chicken elephant "dude"

This sets elephant on the chicken hash to "dude".

HGET chicken elephant

This gets "dude". Shocking... I know.

HGETALL chicken

This dumps the entire chicken hash.

Weird names demonstrate that the name has nothing to do with the system, and they force you to think, so you remember it better long-term.
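
For reference, the same commands through the $redis object we created earlier look roughly like this:

[void]$redis.StringIncrement('taco')                  # INCR taco
$redis.StringGet('taco')                              # GET taco
[void]$redis.HashSet('chicken', 'elephant', 'dude')   # HSET chicken elephant "dude"
$redis.HashGet('chicken', 'elephant')                 # HGET chicken elephant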

To get all the keys, do something like this:

KEYS *

When I say "all", I mean "all". Both the values that INCR and the values from HSET will show. This is a typical wildcard. You can do stuff like KEYS *name* just fine.

Naming note: Do whatever you want, but it's common to use names like "MASTERSCOPE:SCOPE#VARIABLE". My system already has a well-defined internal naming system of Area!Environment, so in what follows we'll use "content!staging#counter" and "content!staging#Hlookup"

OK, that's enough to get started. Here's the plan: because the integer IDs mean absolutely nothing to me, I'm going to treat them as an implementation detail; more technically, as a surrogate key. My key is selector. I want to update via selector, not some internal ID that means nothing to me.

To do this, I'll basically just emulate what Elasticsearch 1.4 did: specify what property I want as my key.

To this end, I need to add a new $lookupId function, plus update both $add and $delete:

$lookupId = {
    param($index, $selector)

    if($redis) {
        $id = [int]$redis.HashGet("$index#Hlookup", $selector)
    }
    if(!$id) {
        $id = 0
    }
    $id
}

$add = {
    param($index, $type, $json, $obj, $key)
    if($obj) {
        $json = &$serialize $obj
        if($key) {
            $keyValue = $obj[$key]
        }
    }
    
    if($redis -and $keyValue) {
        $id = &$lookupId $index $keyValue
        Write-Host "`$id is $id"
        if($id -eq 0) {
            $id = [int]$redis.StringIncrement("$index#counter")
            if($verbose) {
                Write-Host "Linking $keyValue to $id"
            }
            &$post "$index/$type/$id" $json
            [void]$redis.HashSet("$index#Hlookup", $keyValue, $id)
        }
        else {
            &$put "$index/$type/$id" $json
        }

    }
    else {
        &$post "$index/$type" $json
    }
}

$delete = {
    param($index)
    &$call "Delete" $index

    if($redis) {
        [void]$redis.KeyDelete("$index#counter")
        [void]$redis.KeyDelete("$index#Hlookup")
    }
}

When stuff doesn't exist, you get some type of blank entity. I've never seen a null while using the StackExchange.Redis package, so that's something to celebrate. The values that StackExchange.Redis methods work with are RedisKey and RedisValue. There's not much to learn there though, since there are operators for many different conversions. You can work with strings just fine without needing to know about RedisKey and RedisValue.

So, if I'm sending it a key, get the key value from the object I sent in. If there is a key value and Redis is enabled and active, see if that key value is the ID of an existing item. That's a Redis lookup. Not there? OK, must be new; use Redis to generate a new ID and send that to Elasticsearch (POST $index/$type/$id). The ID was already there? That means the selector was already assigned a unique, sequential ID by Redis; use that for the update.

For now, POST works fine for an Elasticsearch update as well. Regardless, I'd recommend using PUT for update even though POST works. You never know when they'll enforce it.
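
The quick test below uses a $generate helper that isn't defined in this article; here's a minimal sketch, assuming it just wraps the insert loop from the Adding Data section and returns the selectors it created:

$generate = {
    param($index, $type, $count, $key)
    $ti = (Get-Culture).TextInfo
    (1..$count).foreach({
        $obj = @{
            selector = "{0}/{1}" -f ([_netfxharmonics.hamlet.Generator]::Run(1)), ([_netfxharmonics.hamlet.Generator]::Run(1))
            title = $ti.ToTitleCase([_netfxharmonics.hamlet.Generator]::Run(4))
            content = [_netfxharmonics.hamlet.Generator]::Run(400) + '.'
            created = [DateTime]::Now.ToString("yyyy-MM-dd")
            modified = [DateTime]::Now.ToUniversalTime().ToString("o")
        }
        &$add -index $index -type $type -obj $obj -key $key > $null
        $obj.selector
    })
}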

Let's run a quick test:

$selectorArray = &$generate 'content!staging' 'entry' 2 -key 'selector'

($selectorArray).foreach({
    $selector = $_
    Write-Host ("ID for $selector is {0}" -f (&$lookupId 'content!staging' $selector))
})

Output:

ID for visage/is is 4
ID for if/blood is 5

I'm going to hop over to Chrome to see how my data looks:

http://10.1.60.3:9200/content!staging/_search?q=*:*

It's there...

{
    "_index": "content!staging",
    "_type": "entry",
    "_id": "4",
    "_score": 1,
    "_source": {
        "selector": "visage/is",
        "title": "Bound Fault Pray Or",
        "created": "2015-11-07",
        "content": "very long content omitted",
        "modified": "2015-11-07T22:24:23.0283870Z"
    }
}

Cool, ID is 4.

What about updating?

Let's try it...

$obj = @{
    selector = "visage/is"
    title = 'new title, same document'
    content = 'smaller content'
    created = [DateTime]::Now.ToString("yyyy-MM-dd")
    modified = [DateTime]::Now.ToUniversalTime().ToString("o")
}
&$add -index 'content!staging' -type 'entry' -obj $obj -key 'selector' > $null

Output:

{
    "_index": "content!staging",
    "_type": "entry",
    "_id": "4",
    "_score": 1,
    "_source": {
        "selector": "visage/is",
        "title": "new title, same document",
        "created": "2015-11-07",
        "content": "smaller content",
        "modified": "2015-11-07T23:11:58.4607963Z"
    }
}

Sweet. Now I can update via my own key (selector) and not have to ever touch Elasticsearch surrogate keys (_id).

Ensuring SSL via HSTS

My wife and I use a certain vacation website to take epic vacations where we pay mostly for the hotel and get event tickets (e.g. Disney World tickets) free. I love the site-- but sometimes I'm paranoid using it. Why? Here's why:

[screenshot: the site's address bar showing plain http, no lock icon]

No pretty lock.

You say: but were you on the checkout screen or account areas?

No.

You say: Then who cares? You're not doing anything private.

Imagine this: You're on epic-travel-site.com and you're ready to create an account. So, you click on create account. You now see the pretty SSL lock. You also see where you type in your personal information and password. You put in the info and hit submit. All is well.

Well, not so much: what you didn't notice was that the page with the "Create Account" button was intercepted and modified, so the "Create Account" link actually took you to epic-travel-site.loginmanagementservices.com instead of account.epic-travel-site.com.

Even then, you'll never know if that's wrong: perhaps they use loginmanagementservices.com for account management. Many sites I go to use Microsoft, Google, or Facebook for account management. It could be legit, or you could have just sent your password to a site owned by an information terrorist punk with a closet full of Guy Fawkes masks.

301

SSL isn't just about end-to-end encryption, it's also about end-to-end protection. You need SSL at the start.

This is easy to enforce on every webserver. On IIS it's a simple rewrite (chill, getting to Linux in a bit):

<system.webServer>
  <rewrite>
    <rules>
      <rule name="Redirect to https">
        <match url="(.*)" />
        <conditions>
          <add input="{HTTPS}" pattern="Off" />
        </conditions>
        <action type="Redirect" url="https://{HTTP_HOST}/{R:1}" />
      </rule>
    </rules>
  </rewrite>
</system.webServer>

Note: rewriting and routing are not the same. Rewriting is a very low-level procedure. Routing is application-level.

Using this, when you access HTTP, you'll get sent over to HTTPS. This is a 301 redirect.

[screenshot: dev tools showing the 301 redirect from http to https]

That's great, but that still gives the Guy Fawkes-fanboys an opportunity to give you a horrible redirect; it's pretty easy to swap out.

307

So, let's upgrade our security: HTTP Strict Transport Security ("HSTS").

This entire thing is mostly TL;DR. For more detailed information about this and other more hardcore security techniques, watch Troy Hunt's Introduction to Browser Security Headers course at Pluralsight.

The essence of this is simple: the server sends a Strict-Transport-Security header with a max-age in seconds. This tells your browser to ignore your requests for HTTP and go straight to HTTPS instead.
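
For example, a two-day policy (172,800 seconds) is nothing more than this response header:

Strict-Transport-Security: max-age=172800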

What browsers can use HSTS today? Go look:

[screenshot: caniuse.com browser support table for HSTS]

See also: http://caniuse.com/#search=hsts

If your browser doesn't support HSTS, you need to keep up with updates. Get with it. But, whatever: if your ancient browser doesn't support HSTS, nothing will break; you just won't receive any benefit.

What's this Strict-Transport-Security header do? Try to access the HTTP version and see:

[screenshot: dev tools showing a 307 Internal Redirect for the http request]

It's not going to bother with the 301 roundtrip; instead, the browser will just do an "Internal Redirect", which shows up as a 307.

To put it simply: a 301 is HTTP then HTTPS and a 307 is only ever HTTPS. The existence of HTTP is not even acknowledged.

For clients, this is a major security boost. For servers, this is both a security and performance boost: you are no longer handling all those HTTP redirects! Some politician/salesman would say: "It's like getting free bandwidth!"

Testing HSTS (Chrome)

Lesson: for each web site, start with a 2 day max age and slowly grow.

While removing HSTS for others can be a hassle, removing HSTS during testing and development for yourself is simple:

In Chrome, you go to chrome://net-internals/#hsts.

You'll see delete and query. Throw your domain into query to see it. Throw it into delete to delete it.

Testing HSTS (Firefox)

For the 2-3 people left who think Firefox still matters,

First, know that you aren't going to see a 307. It's an internal redirect anyway; it's hardly a real response. In Firefox, you see,

[screenshot: Firefox network panel]

Second, to test this stuff, you'll need Firebug and to make sure that persist is enabled (with this you can see redirects, not just data for the currently loaded page; use clear to clear the console):

[screenshot: Firebug console with Persist enabled]

Third, while Fiddler's disable-cache is great for most days, Firefox throws a fit over the invalid certs. So, disable cache in Firebug:

[screenshot: Firebug's Disable Browser Cache option]

Now you'll be able to test HSTS in Firefox. Once you can verify that you can see the header and the redirect, you can have certainty of its removal.

To remove HSTS locally, look for the SiteSecurityServiceState.txt file in your Firefox profile. Where's THAT? I'm not about to remember where it is. I'm saving that brain space for important things. On Windows, I just run a quick Agent Ransack search (a tool which should be part of your standard install!). You could quickly find it with PowerShell as well:

ls $env:appdata\Mozilla -r site*

On Linux, it's a freebie:

find / -name SiteSecurityServiceState.txt

Once found, you apparently have to exit Firefox to get it to flush to the file. Then, you can edit your domain out of the file.

TOFU

Now to upgrade the security again...

That's great, but you still have to trust the website on first access (often called the TOFU problem: trust on first use; remember: no sane person likes tofu). You accessed HTTP first. That may have been hacked. The well may have been poisoned. It's the same hack as in the account management example; it's just moved a little earlier.

The solution is actually quite simple: have the fact that you want HTTPS-only baked into the web browsers themselves. Yeah, it's possible.

You do this by upgrading your security again, then submitting your site to https://hstspreload.appspot.com/.

The upgrade requirements are as follows:

  • Have an SSL cert (duh!)
  • Use the SSL cert for the domain and all subdomains (e.g. account., www., etc...)
  • Upgrade the HSTS header to have a max-age of 126 days
  • Upgrade the HSTS header to enforce HSTS for all subdomains
  • Upgrade the HSTS header to enable preloading

This ultimately boils down to the following header:

[screenshot of the final header: Strict-Transport-Security: max-age=10886400; includeSubDomains; preload]

That's it.

It says 126 days, but https://www.ssllabs.com/ssltest/ gives you a warning for anything under 180. Just do 180.

Just submit your site to https://hstspreload.appspot.com/ and you'll be baked into the browser (well, browsers depending on how much it's shared). It will tell you about any potential errors.

Nobody will see this until updates come through. This is one reason updates are important.

How can you prevent just anyone from submitting you? You can't. By adding preload, you stated that you wanted this. The hstspreload website will make sure you want this before it bothers doing anything with it.

ASP.NET WebApi / ASP.NET MVC

How do you add this to your website? You add it the same way you add any header. If you are using ASP.NET MVC or ASP.NET WebApi, you just create a filter.

ASP.NET WebApi:

public class HstsActionFilterAttribute : System.Web.Http.Filters.ActionFilterAttribute
{
    public override void OnActionExecuted(System.Web.Http.Filters.HttpActionExecutedContext actionExecutedContext)
    {
        actionExecutedContext.Response.Headers.Add("Strict-Transport-Security", "max-age=10886400");
    }
}

ASP.NET MVC:

public class HstsActionFilterAttribute : System.Web.Mvc.ActionFilterAttribute
{
    public override void OnResultExecuted(System.Web.Mvc.ResultExecutedContext filterContext)
    {
        filterContext.HttpContext.Response.AppendHeader("Strict-Transport-Security", "max-age=10886400");
        base.OnResultExecuted(filterContext);
    }
}
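
Then register the filter globally; assuming the stock App_Start wiring, that's something like this:

// ASP.NET WebApi (WebApiConfig.Register)
config.Filters.Add(new HstsActionFilterAttribute());

// ASP.NET MVC (FilterConfig.RegisterGlobalFilters)
filters.Add(new HstsActionFilterAttribute());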

What do you do if you enforce HTTPS then find out later that the client decided that being hacked is cool and wants to remove HSTS and SSL?

Well, first you get them to sign a release stating that you warned them about security and they threw it back in your face.

After that, set max-age to 0 and hope every single person who ever accessed your site comes back to get the new header. After that, remove the header and SSL. In reality: that's not going to happen. The people who didn't get the max-age=0 header will be locked out until the max-age expires.

NWebSec

Filters for ASP.NET MVC and ASP.NET WebApi are great, but the best type of coding is coding-by-subtraction. Support less. Relax more. To this end, you'll want to whip out the NWebSec NuGet package.

Once you add this package, your web.config file will be modified for the initial setup. You just need to add your own configuration.

Here's a typical config I use for my ASP.NET WebAPI servers:

<nwebsec>
  <httpHeaderSecurityModule xmlns="http://nwebsec.com/HttpHeaderSecurityModuleConfig.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="NWebsecConfig/HttpHeaderSecurityModuleConfig.xsd">
    <redirectValidation enabled="false">
      <add allowedDestination="https://mysamplewebsite.azurewebsites.net/" />
    </redirectValidation>
    <securityHttpHeaders>
      <x-Frame-Options policy="Deny" />
      <strict-Transport-Security max-age="126" httpsOnly="true" includeSubdomains="true" preload="true" />
      <x-Content-Type-Options enabled="true" />
      <content-Security-Policy enabled="true">
        <default-src self="true" />
        <script-src self="true" /> 
        <style-src none="true" />
        <img-src none="true" />
        <object-src none="true" />
        <media-src none="true" />
        <frame-src none="true" />
        <font-src none="true" />
        <connect-src none="true" />
        <frame-ancestors none="true" />
        <report-uri>
          <add report-uri="https://report-uri.io/report/6f42f369dd72ec153d55b775ad48aad7/reportOnly" />
        </report-uri>
      </content-Security-Policy>
    </securityHttpHeaders>
  </httpHeaderSecurityModule>
</nwebsec>

The important line for this discussion is the strict-Transport-Security element.

NWebSec works by days, not seconds. Just keep that in mind.

The content-Security-Policy is also critical for solid security, but that's a discussion for a different day. Just keep in mind that the previous example was for an API server. ASP.NET MVC would require you to open access to images, scripts, and other browser-related entities.

Nginx

Let's say you don't want to add this to your website. Wait, why not? Here's one reason: you don't have access to SSL on the box, but you do somewhere else. Azure comes to mind: you have a free-tier website, but want SSL with a custom domain. You'll have to use nginx for SSL termination with Varnish (a caching server) to offload a lot of traffic from Azure.

Think it through: Varnish is in front of your website providing caching. What about putting something in front of it to provide SSL? Easy. That's called SSL termination. Before I got completely sick of ASP.NET and rewrote everything in Python/Django, I did this for most of my ASP.NET web sites.

In your Nginx config, you can do something like this:

[nginx config screenshot: an add_header Strict-Transport-Security directive in the server block]

As usual, Linux is more elegant than .NET/Windows. Just a fact of life...

Production

When you implement SSL, don't celebrate and advertise the greatness of your security milestone. You only made it to the 1990s. That's nothing to brag about. You need to hit HSTS before you hit anything close to modern security. If you are doing any law-enforcement or finance work, you need to implement HPKP before you hit the minimum required security to earn your responsible-adult badge.

Even then, if you ever want to ruin your life, use the standard development SDLC for these features: do it in development, give it to QA, throw it on a staging environment, then push to production. You will ruin your entire company. Why? Because you have to slowly implement these features in production.

Here's one route to take: run HSTS through your SDLC (starting with staging) with a 2 second max-age. Yes, it's pointless, but it will tell you if it's working at all. Give it a week. Jump to 5 seconds. Give it two weeks. Jump to 30 seconds. At this point, possibly give it a few months. Slowly increase, slowly add browser-upgrade-recommendations to your web site, and slowly give clients warnings (HSTS is passive and won't ever kick anyone out; but you want them to be secure, so inform them!)

Eventually, you'll get to the 126-day max-age where you can add the preload option and request that your site be baked into the browsers themselves.

Arch Linux Setup

"Back in my day..."

...Linux was 50MB. The entire server. Today? Eleventy-billion GB.

Well, that's the story with what I call Noob Linux (aka Ubuntu). I wrote off RedHat for years because of the insanity of the size (multiple CDs in the 90s). Today Ubuntu is the new RedHat in that regard.

Debian didn't buy into the bloat entirely; it's still nice. Yet, for me, there's a new kid on the block that was able to release my Debian death-grip: Arch Linux.

It's powerful. It's simple to use. It has a proper level of complexity to allow customization. The levels of porcelain and plumbing are quite balanced. Modern "Microsoft stack" developers get this. This is what C# is all about. This is what Azure is all about.

Now, how does one set up Arch Linux? Answer: manually. Once you get through this, you've effectively paid your dues.

I'll give some commentary along the way. I generally follow the motto: if you can't repeat it, you can't do it. "I did it once" means nothing (this is why skill > experience). Don't rely on luck or "HOLY CRAP IT WORKS. NOBODY TOUCH IT".

Prerequisites

You're going to be in the console. Linux uses a console. A GUI on Linux is not Linux. It's heresy. So, know these:

  • CTRL-A : beginning of line

  • CTRL-E : end of line

  • ESC then left-arrow : go back one word

  • ESC then right-arrow: go forward one word

  • If backspace gives you weird characters, hold CTRL while hitting backspace.

  • For passwords, use the numbers above your letters, not the ones on the keypad. The keypad might (dunno) only misbehave in a few places, but play it safe.

You are now ready to be awesome.

Core Installation

When you first mount your disc and start your VM (it's a good assumption), you'll get a screen...

Arch Linux boot

...then a prompt. Let's run with that...

Arch Linux boot

First, let's set up our disk. This is the same thing you're going to do in Windows, but Linux is all about more control:

What disk do you want to play with? Let's find out:

# fdisk -l

Arch Linux fdisk -l

Note: this is for MBR. For GUID partitions, use gdisk.

See the /dev/sda and /dev/sdb stuff? Those are your disks: a and b (and c and d, etc...).

Look at the sizes and deduce which one to use. You have to do the same on Windows. I've many disks. My boot SSD is much smaller than the others; that's how I know which disk to use.

Now, get into fdisk (it's a cli).

# fdisk /dev/sda

I'm going to demonstrate two different methods for the swap space. So skim both before playing with anything.

We want to create three partitions: a small boot partition (250 to 500M), a swap partition (4G here), and a root partition taking the rest of the disk.

Regarding the 4GB swap partition, don't do the typical Microsoft developer "lol guidelines are for noooooooooooooooobs ima do what feels goodz. nobody has eeeeeeeever done this before so ima be a pioneerz" nonsense. Reference the size recommendations to see if 4G is right for you. RTFM! (that's reference the freggin manual-- nobody should ever read a manual like it's a Tolstoy novel)

Here are the commands in series: n, +500M, n, +4G, t, 2, 82, n, w.

You're going to use a lot of the defaults (e.g. partition number and start sector). Reference the following image sample:

Arch Linux create swap

Honestly, it's better if you play with this until it makes sense. Holding your hand won't help you in the long run. Learning comes from mistakes. Make those mistakes... and create VM snapshots.

"Commentary":

  #(MBR; use gdisk for GUID partitions)
  (250 to 500M for boot, then RAM-sized swap, then /)
  n
    (new partition)
    (enter=defaults)
  t
    (change type)
    (82 swap)
  w
    (write/save)

Alternatively, you could use gdisk (for GUID partitions). Same idea.

Now... let's format these (ext3 is fine too):

# format partition 3    
mkfs.ext4 /dev/sda3

# format partition 1
mkfs.ext4 /dev/sda1

We are skipping partition 2 for now. That's a swap partition. That's a different paradigm entirely.

Next, we will mount these.

The big partition first...

mount /dev/sda3 /mnt

We mount this as /mnt, not /, because we are currently in a system that is using /. Windows splits the world up into separate drives via letters whereas Unix/Linux (hereafter just "Linux") systems use a unified naming system. In Windows you can change drive letters; in Linux you change mount points.

With this mounted, we will create a place for the boot partition to mount:

mkdir /mnt/boot

Then mount the boot partition:

mount /dev/sda1 /mnt/boot
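If you want to sanity-check the layout before moving on, lsblk is on the live ISO and shows the whole tree (sda3 should show /mnt, sda1 should show /mnt/boot):

lsblk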

What about the 4GB partition? That's the swap partition. We treat it differently. We create the swap area then use it.

mkswap /dev/sda2
swapon /dev/sda2

Having said this, there's another way to handle swap space, a more modern-world / VM / dev-environment-compatible method: don't create an entire swap partition.

The alternative to a swap partition is a swap file. In this paradigm, don't create a swap partition. Just create your 500M boot partition and your data partition taking the rest of the disk.

This works here because we are using ext4. Do not try this with btrfs.

Then, instead of creating a 4G monster swap partition, let's create a smaller 512M swap file:

fallocate -l 512M /swp

Let's make it so that only the file owner (in this case, the file creator: root) can read and write it:

chmod 600 /swp

Now we can do our other commands:

mkswap /swp
swapon /swp
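Either way (partition or file), you can confirm the kernel actually picked the swap up:

swapon --show
free -h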

See https://wiki.archlinux.org/index.php/Swap for more information.

We are now at the point where we install the base system. This is where it's really cool: you choose what to install. Personally, I've a few things I want to install right away so I don't have to deal with them later:

Here's my full command 2 3:

pacstrap -i /mnt base base-devel grub-bios openssh sudo

This command also tells you something important: whether you are on the Internet or not.

This is:

  • Base system
  • Basic dev tools
  • Kernely stuff
  • Boot manager
  • OpenSSH server
  • sudo (to run commands as root)

In case you're curious, here's what's inside base:

Arch Linux pacstrap base

We're almost done with the system setup.

We need to tell the system about our mount points. There's a tool for that.

genfstab -U -p /mnt >> /mnt/etc/fstab

The -U is important. Instead of mapping to /dev/sda1, you're telling the system to use a UUID. This removes the risk of things moving around. See SOF for more information.
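For reference, the generated entries end up looking roughly like this (the UUIDs below are made up and the mount options will vary):

# /dev/sda3
UUID=aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee  /      ext4  rw,relatime  0 1

# /dev/sda1
UUID=11111111-2222-3333-4444-555555555555  /boot  ext4  rw,relatime  0 2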

If you did the swap file method instead of the swap partition method, you also need to run the following:

echo "/swp none swap defaults 0 0" >> /mnt/etc/fstab

This will make sure that /swp is used on boot.

General Linux advice: during any setup, always ask yourself if it needs to be something that starts on boot. Different Linux distros will have different setups. As you'll see in a moment, Arch Linux follows the "systemctl enable X" method.

Now we need to do some stuff that presupposes your system is mounted at /. So, we need to change our root:

arch-chroot /mnt

Now /mnt is /.

Change your password:

passwd

Let's build the initial ramdisk image (you'll see why this matters in a moment):

mkinitcpio -p linux

Magic voodoo? No. That doesn't exist. That's fiction. Well, ok, a technology advanced enough... is magic... or something.

So, you've special drivers needed to use your disk. Where are the drivers? On... the... disk. Lovely. Well... computer architecture to the rescue:

Note: this is only for MBR. More modern systems use entirely different mechanics.

When you turn on a computer, at some point the BIOS is going to know what disk you want to boot from (you know, that setting you set in the F2/F10/del screen). When it gets to your hard drive, it will look in the first 512 bytes for some type of instruction of what to do. This 512-byte area is the Master Boot Record.

It's a fun activity to play with this 512-byte area. You can write a quick assembler tool to have your own mini-OS. Here's a fun tutorial... It uses 16-bit registers (e.g. AX, BX), so it's a fun flashback for many of us.
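If you just want to look at those 512 bytes without writing any assembler, you can dump them right from the live environment (read-only; just don't mix up if= and of=):

# hex-dump the first 512 bytes of the disk (the MBR)
dd if=/dev/sda bs=512 count=1 2>/dev/null | od -A x -t x1z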

Anyway... Linux has this little image that mkinitcpio creates that contains the drivers 'n stuff needed to boot your system. See reddit for more info.

OK, let's install the boot loader so the machine knows how to boot:

grub-install /dev/sda

Now to save the related config:

grub-mkconfig -o /boot/grub/grub.cfg

We're done. Let's exit chroot, unmount, and reboot. Remember to unmount your ISO / remove disc.

exit
umount -R /mnt
reboot

Phase 2 Installation

Done? No. Sure you can boot, but you're about to login as root. Dude... no.

There's more to do, but I hate the idea of doing it on a console. In Hyper-V, you're not at a place where you can copy/paste. So, let's turn on networking and SSH into the server.

First, get the DHCP client doing something:

dhcpcd

You'll probably see some scary looking "no carrier" nonsense. Ignore it.

Wait a few seconds. Then try ip addr. No IP yet? Wait longer and run ip addr again.

Now what? Well, we SSH into our server. But... we never made a user. Yeah, so, I cheat...

Edit your /etc/ssh/sshd_config. Add PermitRootLogin yes.

You can do this in one command with the following:

echo "
PermitRootLogin yes
" >> /etc/ssh/sshd_config

This will allow root to SSH into your server.

We'll remove this a little later. Chill out.

Anyway, now to start OpenSSH:

systemctl restart sshd

Just so we don't forget, let's set OpenSSH to auto start on boot:

systemctl enable sshd

You can now SSH into your server as root. Do it.

How? Answer: get an SSH client. The popular one is PuTTY. I can't stand PuTTY. I do keep it in my Windows folder for quick access from Win-R, but only as a backup. I personally prefer SecureCRT. I definitely find it worth the money.
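If the machine you're connecting from has the OpenSSH client (any Linux or macOS box, and newer Windows builds ship it too), you don't need a GUI client at all. The IP here is a placeholder; use whatever ip addr reported:

ssh root@10.1.211.10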

The rest of this is mostly copy/paste.

Let's create a user with the appropriate groups:

useradd -m -g users -G wheel,storage,power -s /bin/bash user01

Change password:

passwd user01

That command asks you for a password, so you're not going to be able to simply paste it with a bunch of other subsequent commands into SSH.

However, the rest of this can be copy/pasted in:

# sudo users
echo "%wheel ALL=(ALL) ALL" >> /etc/sudoers

# hostname
echo "arch" > /etc/hostname

# localization
echo LANG="en_US.UTF-8" >> /etc/locale.conf
echo LC_COLLATE="C" >> /etc/locale.conf
echo LC_TIME="en_US.UTF-8" >> /etc/locale.conf
echo "en_US.UTF-8 UTF-8" >> /etc/locale.gen
export LANG=en_US.UTF-8
locale-gen

# clock
ln -sf /usr/share/zoneinfo/America/Chicago /etc/localtime
hwclock --systohc --utc

# name servers
echo "nameserver 8.8.8.8
nameserver 8.8.4.4" > /etc/resolv.conf

# makeflags; 8 cores
echo "export MAKEFLAGS='-j 8'" >> /etc/bash.bashrc

The first command tells sudo to allow anyone in the wheel group to be able to run root commands. What's with the wheel name? SOF has an answer/theory for that one.

The only other command that probably requires commentary is the MAKEFLAGS line. This tells make how many parallel jobs (cores) to use when compiling. Many installations require compiling things with gcc, including modules installed with Python's pip. So, don't think you're off the hook.
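Don't guess at the core count; ask the machine. nproc is part of coreutils, so it's already installed. Here's one way to bake the real number in instead of hard-coding 8:

echo "export MAKEFLAGS='-j $(nproc)'" >> /etc/bash.bashrc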

Almost done. Exit SSH. Reconnect. Login as the new user.

If that works, you can disable root for SSH.

Don't skip this step.

I don't like to prefix all my commands with sudo, so I just run bash as root:

sudo bash

The following command will remove the ssh-as-root line:

sed -i '/PermitRootLogin yes/d' /etc/ssh/sshd_config

For more info on the awesome tool called sed, there's a great compilation called USEFUL ONE-LINE SCRIPTS FOR SED that you should definitely bookmark. Also, visit the parent site. It's mostly art nonsense, but the docs section has good things.

Then restart OpenSSH:

systemctl restart sshd

At this point I like to install a bunch of other stuff. We do this with pacman. This is like apt-get in the Debian world. Actually, it's more like a merger of apt-get, apt-cache, and other things.

Here's what's in my usual toolbox:

pacman -S mlocate wget gcc openssh make mercurial git bc curl vim dnsutils whois net-tools lsof --noconfirm 

Some of that is already installed (e.g. openssh), but nobody cares. The list is more for my memory than for pacman.

Networking (dhcp)

The main thing left to do is to get DHCP to run automatically on boot.

Let's see what our interfaces are:

ip link

You could also use the following to get your interfaces:

ls /sys/class/net

You'll find that various parts of the file system aren't really files on disk; they're kernel interfaces, very similar to the Windows WMI interfaces.

I've lo and eth0, yours might be enp3s0.
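For example, those "files" are just the kernel reporting live state (swap in your interface name if it isn't eth0):

cat /sys/class/net/eth0/address    # MAC address
cat /sys/class/net/eth0/operstate  # up / down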

Now to enable DHCP for that interface:

systemctl enable dhcpcd@eth0.service

That handles a specific interface. You could simply enable it for everything with the following instead, but let's be specific (you might want to play with static on a different interface):

systemctl enable dhcpcd.service

Networking (static)

Let's try static IPs now. It's actually a bit easier than you read in the docs:

echo "Description='A basic static ethernet connection'
Interface=eth0
Connection=ethernet
IP=static
Address=('10.1.211.10/16' '10.1.211.11/16' '10.1.211.12/16' '10.1.211.13/16')
Gateway='10.1.1.254'
" > /etc/netctl/home-static

netctl enable home-static
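enable only wires the profile up for future boots. To bring it up right now and confirm it took (assuming you kept the home-static name):

netctl start home-static
ip addr show eth0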

1
RedHat is the new Windows for many people; e.g. enterprise-class $$. CentOS (think of it as a free version of RedHat), though, is pretty sweet. I'd recommend CentOS for production servers.
2
You might not need base-devel. Most of it is already in base. Whatever.
3
You might see pacstrap -i /mnt base base-devel linux in some docs. The linux package is in the base package. If you ran the command with "base base-devel", then ran it with "linux", you'd see a 0MB net upgrade.

Stop calling everything REST

Everyone needs to read the following again:

I'll be concise. We're all busy. Also, I'm lazy.

Not at the highest level? It's not REST. Stop calling it REST. It's probably web CRUD and RPC.

Use of Microsoft WebAPI does not christen thee "REST". Swagger does not transmute an API to REST 1. Most importantly, calling it REST does not make it REST. Speech-act theory won't help you here. If it's not REST, calling it REST does nothing more than make you wrong.

Regarding Richardson Maturity Model and Roy Fielding's thesis (and blog), the steps are not levels of REST. These are steps to REST.

The journey is not REST. The destination is REST.

  • No hypermedia => No REST.

  • No HATEOAS => No REST.

Note the periods at the end of the previous two lines. They are key. PERIOD.

You might be RESTsome, but you aren't RESTful.

You might be RESTistic, but you aren't RESTastic. 2
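If "hypermedia" sounds abstract: it means the response itself carries the links for what the client can do next, instead of the client hard-coding URLs from out-of-band docs. A toy, HAL-style sketch (the resource and the link relations are made up):

{
  "orderId": 42,
  "status": "pending",
  "_links": {
    "self":    { "href": "/orders/42" },
    "cancel":  { "href": "/orders/42/cancel" },
    "payment": { "href": "/orders/42/payment" }
  }
}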

Important endnote: this is about what you call REST. This doesn't mean REST is the answer. No document written by a single individual is binding de jure or de facto. Checks and balances are required. There is no committee REST spec. In that regard, REST is fiction. Yet, I can't call a pizza a Smurf just because I feel like it.


1
In fact, using Swagger may actually prevent you from making a REST service.
2
In the sense that your image in the mirror is human-istic, but it doesn't have some human in it. It has NO human. It is NO human.
