
Web API: Standing on the Shoulders of Giants

For a long time now, I've been posting articles in what I "internally" call the "from first principles" series. Many, many years ago, I posted a very long article called JavaScript Graphics Development. Years later, I wrote an article on WCF. Last year, I wrote one on Elasticsearch.

A few months ago I started an ambitious attempt to exhaustively write on one of my favorite tools: ASP.NET Web API. However, I didn't get too far before realizing that there just wasn't enough to write. The vast majority of topics were prerequisites, not directly related to Web API proper. I realized that the tall stature of Web API came from the fact that it was standing on the shoulders of many giants. Of course, I've known this for years, but writing something forces you into a different perspective.

Self-Examination

What are these giants? Let's proceed by examining your Web API expertise:

Do you have a strong grasp of web resource API representation design? That is, are you good with the following?

/person

/person/1

/person/newest

/person?version=1.5

So, are you solid with this? If so, you have expertise in web resource API representation design. This LONG predated Web API. It has nothing to do with Web API. Please stop giving Microsoft credit for it.

Next, are you solid with HTTP verbs? That is, are you good with the following?

GET /person

PUT /person/1

DELETE /person/newest

GET /person?version=1.5

If you're solid with this, you have proficiency in dealing with HTTP verbs. This predates Web API by decades. Many of us were working with verb-based APIs directly long before Web API was even proposed.

No, you don't need to be an expert in PUT vs. POST. Don't feel bad for confusing them. You are, however, required to follow the rules. Children look to what's possible (e.g. a body in a GET request), adults look to what's right. When you write your initial unit tests, you might make a note about when PUT and POST are supposed to throw errors. Personally, I never remember, but I have to follow the industry rules.
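
To make the rules concrete, here's a minimal sketch of how they tend to show up in Web API controller code (the Person type and the in-memory store are made up for illustration, not from any real project): POST creates and returns 201 Created with a Location, while PUT replaces a resource at a known URI and stays idempotent.

using System.Collections.Concurrent;
using System.Web.Http;

public class Person { public int Id { get; set; } public string Name { get; set; } }

public class PersonController : ApiController
{
    // hypothetical in-memory store, only here to keep the sketch self-contained
    private static readonly ConcurrentDictionary<int, Person> _store = new ConcurrentDictionary<int, Person>();

    // POST /person : create; the server assigns the identity; not idempotent
    public IHttpActionResult Post(Person person)
    {
        person.Id = _store.Count + 1;
        _store[person.Id] = person;
        return Created("/person/" + person.Id, person);   // 201 plus a Location header
    }

    // PUT /person/1 : full replacement at a known URI; repeating the call changes nothing further
    public IHttpActionResult Put(int id, Person person)
    {
        if (!_store.ContainsKey(id)) return NotFound();   // don't silently create on PUT
        person.Id = id;
        _store[id] = person;
        return Ok(person);
    }
}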

Now, how are you with interacting with Web APIs directly (e.g. curl-only)?

That is, is this clear as day or are you confused by the lack of "models" or a GUI?

curl -i -XPOST -d 'grant_type=password&username=myusername&password=password' https://example.com/token

In reality, this has nothing to do with Web API. Yes, it's the most important way of testing and scripting Web API interaction (you can't write a full app for every HTTP call!), but it simply means you know HTTP and curl. It has nothing directly to do with Web API at all.

Yes, this is important: you cannot design Web APIs around mere "models" or serialization. As with WCF, there are no hierarchies or "types", there are only messages that adhere to external interfaces. Model/serialization thinking leads to Microsoft-centric designs, which defeats the entire point of the neutrality of the web (can we please stop with the Microsoft-centric PascalCasing of everything??).

Knowing this is knowing messages and payloads, not the implementation details of Web API.

Next, how are you with virtualizing web routes? That is, do you have a strong grasp of the difference between accessing /contact.aspx?dept=hr and /contact/hr?

If so, then you simply understand URL handling done in this millennium. This has nothing to do with Web API. Many of us have been virtualizing our URLs for the longest time. My [now long gone] Themelia platform had a concept of virtual web containers and path virtualization long before Web API and ASP.NET MVC. It's also par for the course in Django and every other web development system out there. Even ASP.NET WebForms could use the routing system shipped with ASP.NET MVC. This doesn't directly add to your Web API expertise.
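
To ground that, here's what the same idea looks like using Web API's own routing table (the route name and the ContactController it points at are hypothetical); MVC, Django, and the rest have their own spelling of exactly this:

using System.Web.Http;

public static class RouteConfigSketch
{
    public static void Register(HttpConfiguration config)
    {
        // /contact/hr instead of /contact.aspx?dept=hr:
        // the "hr" segment arrives as an ordinary parameter
        config.Routes.MapHttpRoute(
            name: "ContactByDept",
            routeTemplate: "contact/{dept}",
            defaults: new { controller = "Contact" });
    }
}

A ContactController with a Get(string dept) action would then receive "hr" like any other argument.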

Next, how are you with Web API authentication and authorization?

Confused? If so, then you may have caught my trap. You add your own OAuth2 / OpenID Connect functionality to Web API. That's part of the beauty of modularity. You either whip out some NuGet packages and do some coding or whip out IdentityServer (also on NuGet) and let it do the heavy lifting for you (recommended).

Finally, how are you with Web API middleware?

Kidding. That's a trick question. There's no such thing. It's OWIN middleware. It has absolutely nothing to do with Web API. Having expertise in middleware has nothing to do with expertise in Web API.
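
If you want to see how little Web API has to do with it, here's a bare OWIN middleware sketch (the header name is arbitrary); auth middleware like the IdentityServer mentioned above registers into this same pipeline:

using System.Threading.Tasks;
using Microsoft.Owin;
using Owin;

public class MiddlewareStartup
{
    public void Configuration(IAppBuilder app)
    {
        // plain OWIN middleware: inspect or modify the request, then pass it along
        app.Use(async (context, next) =>
        {
            context.Response.Headers.Set("X-Powered-By", "OWIN, not Web API");
            await next();
        });

        // OAuth2 / IdentityServer / Web API itself would each be one more registration here
    }
}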

Definition

So, what the heck is Web API and Web API expertise? It's not the routes, it's not the resources, it's not the verbs, it's not the security model, it's not the middleware. What is it?

Web API is fundamentally a quick-and-dirty mechanism to listen for resources and respond to verbs using a Controller-based architecture (mimicking ASP.NET MVC's controllers). It's this simplicity of Web API that makes it incredibly elegant and attractive to use. That said, claiming expertise in it is like claiming to have a Ph.D. in subtraction. This fictional person's Ph.D. isn't in subtraction, it's in Mathematics.
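
To underline just how small that core is, here's a minimal self-hosted sketch, assuming the Microsoft.AspNet.WebApi.Owin and Microsoft.Owin.Hosting packages (the port is arbitrary). Controllers like the PersonController sketched earlier do the actual responding:

using System;
using System.Web.Http;
using Microsoft.Owin.Hosting;
using Owin;

public class WebApiStartup
{
    public void Configuration(IAppBuilder app)
    {
        var config = new HttpConfiguration();

        // "listen for resources": one conventional route
        config.Routes.MapHttpRoute(
            name: "DefaultApi",
            routeTemplate: "{controller}/{id}",
            defaults: new { id = RouteParameter.Optional });

        // "respond to verbs": Web API is just one more piece of OWIN middleware
        app.UseWebApi(config);
    }
}

public static class Program
{
    public static void Main()
    {
        // self-host: no IIS required
        using (WebApp.Start<WebApiStartup>("http://localhost:9000"))
        {
            Console.WriteLine("Listening on http://localhost:9000 (press Enter to quit)");
            Console.ReadLine();
        }
    }
}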

Even the stuff we often associate with Web API like IoC, testability, and the flexible hosting models have nothing to do with Web API. Web API simply uses them. That's called having good architecture.

No, Web API is not a replacement for WCF. That's absolutely laughable. Good luck using your Web API controllers to implement WS-reliability and WS-transactions, interact with SAML federated security, or expose endpoints as raw TCP. Go ahead. As a long-time service-oriented architect and WCF expert, I'm often asked how Web API competes with it. Answer: it doesn't. However, it did make System.ServiceModel.Web die, but that thing never deserved to live.

The one aspect where you'll get more mileage on "I'm a Web API expert" is with responding-to-verbs. Have you written your own media formatter / content negotiation? Very cool-- you probably built something that should be shared on GitHub. We need more people writing these.
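
If you haven't written one, it's less work than it sounds. Here's a hedged sketch of a write-only CSV formatter (reusing the hypothetical Person type from the earlier sketch); content negotiation picks it whenever a client asks for text/csv:

using System;
using System.IO;
using System.Net.Http;
using System.Net.Http.Formatting;
using System.Net.Http.Headers;

public class CsvFormatter : BufferedMediaTypeFormatter
{
    public CsvFormatter()
    {
        // content negotiation matches this against the client's Accept header
        SupportedMediaTypes.Add(new MediaTypeHeaderValue("text/csv"));
    }

    public override bool CanReadType(Type type) { return false; }                   // write-only
    public override bool CanWriteType(Type type) { return type == typeof(Person); }

    public override void WriteToStream(Type type, object value, Stream writeStream, HttpContent content)
    {
        var person = (Person)value;
        using (var writer = new StreamWriter(writeStream))
        {
            writer.WriteLine("{0},{1}", person.Id, person.Name);
        }
    }
}

Registration is one line (config.Formatters.Add(new CsvFormatter());), and from that point the negotiation machinery, not your controller, decides when it runs.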

Now, have you worked with not only the typical JSON/XML formats, but OData as well? That's pretty cool, but OData is more of something that is added to Web API than properly part of it. See https://github.com/OData for details.

Aside from a few other very trivial things, and it being a fancy marketing buzzword, there's nothing in Web API with enough content to warrant making it on your resume.

You and Web API

Instead, you might be a full-scale service expert. You might actually have much more skill than you think. If you claim to be a Web API expert, take a day and try to learn Flask API. If you can pick it up quickly, you're probably more than an expert in one specific tool. You have a more solid, foundational grasp of the underlying structure. This is what's important. Being able to use a menu to create a Web API from a project template is not.

Web API is such a simple tool that if you understand the prerequisites (resources, verbs, OWIN, etc.) it should take you less than a day to master. I'm fully convinced that the core of it was written in a day (heck, I wrote the entire resource aspect of my pywebapi project in a day!). Microsoft has been masterfully abstracting various pieces of ASP.NET for years to make efficient development of tools like Web API possible.

No, my Python Web API isn't as feature rich as Microsoft Web API. The reason is simple and goes to the current point: I didn't have the shoulders to stand on. For example, I had to write my own middleware "framework" (accessible via pip install middleware). Python is far more feature rich than .NET, but these giants on which Web API stands are above and beyond bare .NET.

Web API is one of my top 5 favorite tools (others are Python, Nginx, Elasticsearch, and Git). Its simplicity and elegance do NOT replace my WCF services (see above note), but DO replace the myriad HTTP handlers that were supporting my single-page apps.

Not Optional

If you understand the giants on whose shoulders Web API stands, then you're 90% into Web API already. Every .NET developer needs to know those giants. OWIN, for example, is not an optional skill. Putting Web API on your resume for any reason other than as a marketing buzzword is truly silly. It's like putting LINQ or OWIN on your resume. LINQ is simply C#. You may as well put "the dynamic keyword" on your resume. OWIN is the current (2016) method for building web apps in .NET. Because of this, Web API is fundamentally a basic skill. It's as much a part of the framework as these other things. It takes its place on your resume only as a buzzword to make it through HR screening, but the technical skills are wrapped up in your .NET skill set as a whole. You need to know that anyway.

Finally, Web API is not one of the major specialist platforms within .NET (e.g. ASP.NET WebForms, WPF, WinForms). Given the host-it-anywhere-you-want model, it's a basic skill for people who spend their time in WPF as well. As a long-time exam technical editor, I've rejected a lot of questions that were unfair. If you try to put an ASP.NET question on a C# exam, I'll flag it for removal. The WPF developers won't have a clue, and will get that question wrong. You try to put a WinForms (think 2005!) question on the same exam? Again, I'll flag it. Put a Web API question on it? No problem any more. It's now a basic skill. The ability to handle HTTP traffic is a great addition to any architecture, solution, or application. It's not at all surprising to see management functionality within desktop applications that internally run Web API.

In summary,

  • Web API stands on the shoulders of so many giants that it itself seems large. In reality, it's a tiny, elegant tool that drives much of modern service development.
  • You probably have much more skill than you think. Your expertise probably goes far beyond the tiny tool known as Web API. You might have full SOA expertise.
  • Web API is not an optional skill or extra "pillar" technology like WPF, ASP.NET, or WinForms. Today, it and the concepts it relies upon are as basic as the C# language itself.

LFCS Exam Tips

If you're going for the Linux on Azure certification, you'll be taking the LFCS exam. This is a practicum, so you're going to be going through a series of requirements that you'll have to implement.

Ignore the naive folk who say that this is a real exam, "unlike those that simply require you to memorize a bunch of stuff". The problem today is NOT with people having too much book knowledge, but nowhere near enough. Good for you for going for a practicum exam. Now study for the cross-distro LPIC exams so you don't get tunnel vision in your own little world. Those exams will expand your horizons into areas you may not have known about. Remember: it's easy to fool yourself into thinking that you have skill when you do the same thing every day. Perhaps your LFCS exam will simply match your day job and you'll think you're simply hardcore. Go study the books and get your humility with the written exams. You know, the ones where you can't simply type man every time you have a problem.

Here are some tips:

noclobber

In whatever account (e.g. some user or root) you'll be using, set the following:

set -o noclobber

This will make it so that if you accidentally use > instead of >>, you won't automatically fail. It will simply prevent overwriting some critical file.

For example, if you're trying to send a new line to /etc/fstab, you may want to do the following:

echo `blkid | grep sdb1`    /mng/taco    xfs    defaults    0 0 >> /etc/fstab

Yeah, you'll need to edit the UUID format after that, but regardless: the >> means append. If you accidentally use >, you're dead. You've already failed the exam. The system needs to boot. No fstab => no boot.

/root/notes

The requirements you'll be given, like all requirements, require careful thought and interpretation. Your thought process may be clearer later. This is why giving estimates on the job RIGHT THEN AND THERE is what we technically call "full retard". In this exam, you've got 2 hours. It's best not to sit there and dwell. Go onto other things and come back.

In this process, you'll want to store your mental process, but you can't write anything down and there's also no in-exam notepad. So, I like to throw notes in some random file.

echo review ulimits again >> /root/notes

Again, if you accidentally used > instead of >>, you'd overwrite your notes.

The forward/backward buttons in the exams are truly horrible. It's like trying to find a scene in a VHS. There's no seek, you have to scroll all the way through the questions to get back to what you want to see. Just write the question in your notes.

Using command comments

If you're in the middle of a command and you realize "uhhh.... I have no idea how to do this next part", you may want to hit the manpages.

BUT! What about the line you're currently on? Just throw # in front of it and hit enter. It will go into your history so you can pick it up later to finish.

Example:

$ #chmod 02660 /etc/s

You can't remember the rest of the path, so you have to look it up. CTRL-A to the beginning of the line and put a # in front. Hitting enter will put it into history without an error.

Consider using sed

Your initial thought may be "hmm vi!" That's usually a good bet for any question, but you better know sed. With sed you can do something like this:

sed "/^#/d" /etc/frank/data.txt 

This will delete all lines that start with #, but it will only send the output to YOU, it won't actually update the file.

If you like what you see, do this:

sed -i.original "/^#/d" /etc/frank/data.txt

This will update the file, but save the original to data.txt.original

Remember the absolute basics of awk

Always remember at least the following, this single command accounts for the 80/20 of all my awk usage:

awk '{print $2}'

Example:

ls -l /etc/hosts* | awk '{ print $9 }'

Output:

/etc/hosts
/etc/hosts.allow
/etc/hosts.deny

Probably the worst example in the world, but the point is: you get the 9th column of the output. Pipe that to whatever and move on.

Need two columns?

ls -l /etc/hosts* | awk '{ print $5, $9 }'

Output:

338 /etc/hosts
370 /etc/hosts.allow
460 /etc/hosts.deny

Know vi

If you don't know vi, you aren't going to even take the LFCS exam. It's a moot point. The shortcuts, text editing capabilities, and absolute ubiquity of vi makes it something you do not have the option to ignore.

Know find

Believe me, the following command pattern will save you in the exam and on the job:

find . -name "*.txt" -exec cp {} /tmp \;

The stuff after -exec runs once per file found. Instead of doing stuff one at a time, you can use find to process a bunch of stuff at once. The {} represents the file. So, this copies each file found to /tmp.

Know head / tail (and all the other standard tools!)

If you're like me, when you're in a test, you could read "How many moons does Earth have?" and you'll quickly doubt yourself. Aside from the fact that Stephen Fry on QI sometimes says Earth has two moons, and other times says Earth has none, the point is that it's easy to forget everything you think is obvious.

So, if you need to do something with certain files in a list and can't remember for the life of you how to deal with files 90-130, perhaps you do this:

ls -l | head -n130 | tail -n40

Then do some type of awk to get the file name and do whatever.

That's one thing that's easy about this exam: you can forget your own name and still stumble through it since only the end result is graded.

Know for

I love for. I use this constantly. For example:

for n in `seq 10`; do touch $n; done

That just made 10 files.

Need to create 10 5MB files?

for n in `seq 10`; do dd if=/dev/zero of=/tmp/$n.img bs=1M count=5; done

I love using that to test synchronization.

Know how to login as other users

If you're asked to do something with rights, you'll probably want to jump over to that user to test what you've done.

su - dbetz

As root, that will get me into the dbetz account. Once in there I can make sure the rights I supposedly assigned are applied properly.

Not only that, but if I'm playing with sudo, I can go from root to dbetz then try to sudo from there.

Know how to figure stuff out

Obviously, there are the man pages. I'm convinced these exist primarily so random complete jerks on the Internet can tell you to read the manual. They are often a great reference, but they are just that: a reference. Nobody sits down and reads them like a novel. Often, they are so cryptic that you're stuck.

So, have alternate ways of figuring stuff out. For example, you might want to know where other docs are:

find / -name "*share*" | grep docs

That's a wildly hacky way to find some docs, but that's the point. Just start throwing searches out there.

You'll also want to remember the mere existence of various files. For example, /etc/bashrc and /etc/profile. Which calls which? Just open them and look. This isn't a written test where you have to actually know this stuff. The system itself makes it an open book exam. For the LPIC exams, you need to know this upfront.

Running with Nginx

Stacks don't exist. As soon as you change your database you're no longer LAMP or MEAN. Drop the term. Even then, the term only applies to technology; it doesn't describe you. If you are a "Windows guy", learn Linux. If you are a "LAMP" dude, you have to at least have some clue about .NET. Don't marry yourself to only AWS or Azure. Learn both. Use both. Some features of Azure make me drool, while others remind me of VB6. Some features of AWS make me feel like a kid in a candy store, while others make me wonder if they are actually April Fool's jokes.

Regardless of whatever you're into, you really should learn this epic tool called Nginx. I've been using it for a while and now have almost all my web sites touching it in some way.

So, what is it? The marketing says it's a "reverse proxy". Normally I mock marketing nonsense, but this description is actually pretty good. You could call Nginx a web server, but that misses the point. It can act as a web server, and I'm sure some millennial somewhere is saying "LOL!! IT SERVZ WEB CONTENTZ! LOL!!" because it can "serve files", but it's mainly a reverse proxy.

Adding SSL and Authentication

I'm going to start off with a classic example: adding SSL and username / password authentication to an existing web API.

Elasticsearch is one of my favorite database systems; yet, it doesn't have native support for SSL or authorization. There's a tool called Shield for that, but it's overkill when I don't care about multiple users. Nginx came to the rescue. Below is my basic Nginx config. You should be able to look at the following config to get an idea of what's going on. Of course, I'll add some commentary.

server {
    listen 10.1.60.3:443 ssl;
    # ssl_certificate / ssl_certificate_key (and the rest of the SSL settings) go here

    auth_basic "ElasticSearch";
    auth_basic_user_file /etc/nginx/es-password;

    location / {
        proxy_pass http://127.0.0.1:9200;
        proxy_http_version 1.1;
        proxy_set_header Connection "Keep-Alive";
        proxy_set_header Proxy-Connection "Keep-Alive";
    }
}

In this example, I have a listener set up to listen on port 443. In the context of this listener, I'm setting configuration for /. I'm passing all traffic on to port 9200. That port is only bound locally, so the plain HTTP endpoint isn't even publicly accessible. You can also see I'm setting some optional headers.

443 is SSL, so I have my SSL cert and SSL key configured (in my real config, there's a lot more SSL config; just stuff to configure the ciphers).

Finally, you can see that I've set up basic user authentication. Prior to creating this config I used the Apache htpasswd command to create a password file:

sudo htpasswd -c /etc/nginx/es-password searchuser

If you stare at the config enough, it will be demystified. Nginx is simply adding SSL and username/password auth to an existing working, open HTTP-only server.

SSL Redirect

Let's lighten up a bit with a simpler example...

There are myriad ways to redirect from HTTP to HTTPS. Nginx is my new favorite way:

server {
    listen 222.222.222.222:80;

    server_name mydomain.net;
    server_name www.mydomain.net;

    return 301 https://mydomain.net$request_uri;
}

Accessing localhost only services

The other day I needed to download some files from my Google Drive to my Linux Server. rclone seemed to be an OK way to do that. During setup, it wanted me to go through the OpenID/OAuth stuff to give it access. Good stuff, but...

If your browser doesn't open automatically go to the following link: http://127.0.0.1:53682/auth
Log in and authorize rclone for access
Waiting for code...
Got code

Uhh... 127.0.0.1? Dude, that's a remote server. I tried to go there with the text-based Lynx browser, but, wow... THAT. WAS. HORRIBLE. Then I had a random realization: Nginx! Here's what I did real quick:

server {
    listen 111.111.111.111:8080;

    location / {
        proxy_pass http://127.0.0.1:53682;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header Host 127.0.0.1;
    }
}

Then I could access the server in my own browser using http://111.111.111.111:8080/auth.

BOOM! I got the Google authorization screen right away and everything came together.

Making services local only

This brings up an interesting point: what if you had a public service you didn't want to be public, but didn't have a way to do it-- or, perhaps, you just wanted to change the port?

In a situation where I had to cheat, I'd cheat by telling iptables (Linux firewall) to block that port, then use Nginx to open the new one.

For example:

iptables -A INPUT -i lo -j ACCEPT
iptables -A INPUT -p tcp --dport 8080 -j ACCEPT
iptables -A INPUT -j DROP

This says: allow localhost and stuff to port 8080, but block everything else.

If you do this, you need to save the rules using something like iptables-save > /etc/iptables/rules.v4. On Ubuntu, you can get this via apt-get install iptables-persistent.

Then, you can do something like the previous Nginx example to take traffic from a different port.

File Serving

My new architecture for my websites involves a few components: public Azure Blob Storage for my assets, ASP.NET Web API for all backend processing, and Python/Django for all my websites (plus Elasticsearch for queries and Redis, which is always preloaded with a full mirror of my database). My netfxharmonics.com follows this exact architecture. I don't like my websites existing in the same world as anything that serves content to them. The architecture I've promoted for years finally has a name: microservices (thank goodness for a non-lame name! *cough* AJAX *cough*). I take a clean architectural approach: no assets on my website, no database access on my website, and no backend processing on my website. My websites only display content. Databases (thus all connection strings) are behind my Web API wall.

OK... I said no assets. That's not completely true, which brings us to the point: how do I serve robots.txt and favicon.ico if I don't allow local assets? Answer: Nginx.

location /robots.txt {
    alias /srv/robots.txt;
}

location /favicon.ico {
    alias /srv/netfx/netfxdjango/static/favicon.ico;
}

Azure

So, you've got a free/shared Azure Web App. You've got your free hosting, free subdomain, and even free SSL. Now you want your own domain and your own SSL. What do you do? Throw money at it? Uh... no. Well, assuming you were proactive and kept a Linux server around.

This is actually a true story of how I run some of my websites. You only get so much free bandwidth and computing with the free Azure Web Apps, so you have to be careful. The trick to being careful is Varnish.

The marketing for Varnish says it's a caching server. As with all marketing, they're trying to make something sound less cool than it really is (though that's never their goal). Varnish can be a load balancer or something to handle failover as well. In this case, yeah, it's a caching server.

Basically: I tell Varnish to listen to port 8080 on localhost. It will take traffic and provide responses. If it needs something, it will go back to the source server to get the content. Most hits to the server will be handled by Varnish. Azure breathes easy.

Because the Varnish config is rather verbose and because it's only tangentially related to this topic, I really don't want to dump a huge Varnish config here. So, I'll give snippets:

backend mydomain {
    .host = "mydomain.azurewebsites.net";
    .port = "80";
    .probe = {
         .interval = 300s;
         .timeout = 60s;
         .window = 5;
         .threshold = 3;
    }
  .connect_timeout = 50s;
  .first_byte_timeout = 100s;
  .between_bytes_timeout = 100s;
}

sub vcl_recv {
    #++ more here
    if (req.http.host == "123.123.123.123" || req.http.host == "www.mydomain.net" || req.http.host == "mydomain.net") {
        set req.http.host = "mydomain.azurewebsites.net";
        set req.backend = mydomain;
        return (lookup);
    }
    #++ more here
}

This won't make much sense without the Nginx piece:

server {
        listen 123.123.123.123:443 ssl;

        server_name mydomain.net;
        server_name www.mydomain.net;
        ssl_certificate /srv/cert/mydomain.net.crt;
        ssl_certificate_key /srv/cert/mydomain.net.key;

        location / {
            proxy_pass http://127.0.0.1:8080;
            proxy_set_header X-Real-IP  $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto https;
            proxy_set_header X-Forwarded-Port 443;
            proxy_set_header Host mydomain.azurewebsites.net;
        }
}

Here's what to look for in this:

proxy_set_header Host mydomain.azurewebsites.net

Nginx sets up a listener for SSL on the public IP. It will send requests to localhost:8080.

On the way, it will make sure the Host header says "mydomain.azurewebsites.net". This does two things:

* First, Varnish will be able to detect that and send it to the proper backend configuration (above it).

* Second, Azure will give you a website based on the `Host` header. That needs to be right. That one line is the difference between getting your correct website or getting the standard Azure website template.

In this example, Varnish is checking the host because Varnish is handling multiple IP addresses, multiple hosts, and caching for multiple Azure websites. If you have only one, then these Varnish checks are superfluous.

Verb Filter

Back to Elasticsearch...

It uses various HTTP verbs to get the job done. You can POST, PUT, and DELETE to insert, update, or delete respectively, or you can use GET to do your searches. How about a security model where I only allow searches?

It might be a poor man's method, but it works:

server {
    listen 222.222.222.222:80;

    location / {
        limit_except GET {
            deny all;
        }
        proxy_pass http://127.0.0.1:9200;
        proxy_http_version 1.1;
        proxy_set_header Connection "Keep-Alive";
        proxy_set_header Proxy-Connection "Keep-Alive";
    }
}

Verb Filter (advanced)

When using Elasticsearch, you have the option of accessing your data directly without the need for server-side anything. In fact, your AngularJS (or whatever) applications can get data directly from ES. How? It's just an HTTP endpoint.

But, what about updating data? Surely you need some type of .NET/Python bridge to handle security, right? Nah.

Checkout the following location blocks:

location ~ /_count {
    proxy_pass http://elastic;
    proxy_http_version 1.1;
    proxy_set_header Connection "Keep-Alive";
    proxy_set_header Proxy-Connection "Keep-Alive";
}

location ~ /_search {
    proxy_pass http://elastic;
    proxy_http_version 1.1;
    proxy_set_header Connection "Keep-Alive";
    proxy_set_header Proxy-Connection "Keep-Alive";
}

location ~ /_ {
    limit_except OPTIONS {
        auth_basic "Restricted Access";
        auth_basic_user_file /srv/es-password;
    }

    proxy_pass http://elastic;
    proxy_http_version 1.1;
    proxy_set_header Connection "Keep-Alive";
    proxy_set_header Proxy-Connection "Keep-Alive";
}

location / {
    limit_except GET HEAD {
        auth_basic "Restricted Access";
        auth_basic_user_file /srv/es-password;
    }

    proxy_pass http://elastic;
    proxy_http_version 1.1;
    proxy_set_header Connection "Keep-Alive";
    proxy_set_header Proxy-Connection "Keep-Alive";
}

Here I'm saying: you can access anything with _count (this is how you get counts from ES), and anything with _search (this is how you query), but if you are accessing something else containing an underscore, you need to provide creds (unless it's an OPTIONS request, which allows CORS to work). Finally, if you're accessing / directly, you can send GET and HEAD, but you need creds to do a POST, PUT, or DELETE.

You can add credential handling to your AngularJS/JavaScript application by sending creds via https://username:password@mydomain.net.

It works fine. Now you can throw away all your server-side code and stick with raw AngularJS (or whatever). If something requires a preprocessor, postprocessor, or server-side code at all (e.g. it couldn't be developed in jsfiddle/plunkr directly), it's not web development (and you might not be a web developer). Here, you have solid, direct web development without the middle-man. Just the browser and the server infrastructure. It's SPA with your own IaaS setup.

Killing 1990s "www."

Nobody types "www.", it's never on business cards, nobody says it, and most people forgot it exists. Why? This isn't 1997. The most important part of getting a pretty URL is removing this nonsense. Nginx to the rescue:

server {
    listen 222.222.222.222:80;

    server_name mydomain.net;
    server_name www.mydomain.net;

    return 301 https://mydomain.net$request_uri;
}

server {
    listen 222.222.222.222:443 ssl http2;

    server_name www.mydomain.net;

    # ... ssl stuff here ...

    return 301 https://mydomain.net$request_uri;
}

server {
    listen 222.222.222.222:443 ssl http2;

    server_name mydomain.net;

    # ... handle here ...
}

All three server blocks listen on the same IP, but the first listens on port 80 to redirect to the actual domain (there's no such thing as a "naked domain"-- it's just the domain; "www." is a subdomain), the second listens for the "www." subdomain on the HTTPS port (in this case using HTTP2), and the third is where everyone is being directed.

SSL

This example simply expands the previous one by showing the actual SSL implementation. Keep in mind that to use HTTP2, you have to have at least Nginx 1.9 (at the time of writing, this meant compiling it yourself-- not a big deal).

server {
    listen 222.222.222.222:80;

    server_name mydomain.net;
    server_name www.mydomain.net;

    return 301 https://mydomain.net$request_uri;
}

server {
    listen 222.222.222.222:443 ssl http2;

    server_name www.mydomain.net;

    ssl_certificate /srv/_cert/mydomain/mydomain.net.chained.crt;
    ssl_certificate_key /srv/_cert/mydomain/mydomain.net.key;

    ssl_protocols TLSv1 TLSv1.1 TLSv1.2;

    ssl_ciphers 'ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-DSS-AES128-GCM-SHA256:kEDH+AESGCM:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA:ECDHE-ECDSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:DHE-DSS-AES128-SHA256:DHE-RSA-AES256-SHA256:DHE-DSS-AES256-SHA:DHE-RSA-AES256-SHA:AES128-GCM-SHA256:AES256-GCM-SHA384:AES128-SHA256:AES256-SHA256:AES128-SHA:AES256-SHA:AES:CAMELLIA:DES-CBC3-SHA:!aNULL:!eNULL:!EXPORT:!DES:!RC4:!MD5:!PSK:!aECDH:!EDH-DSS-DES-CBC3-SHA:!EDH-RSA-DES-CBC3-SHA:!KRB5-DES-CBC3-SHA';

    ssl_prefer_server_ciphers on;

    ssl_dhparam /srv/_cert/dhparam.pem;

    return 301 https://mydomain.net$request_uri;
}

server {
    listen 222.222.222.222:443 ssl http2;

    server_name mydomain.net;

    ssl_certificate /srv/_cert/mydomain/mydomain.net.chained.crt;
    ssl_certificate_key /srv/_cert/mydomain/mydomain.net.key;

    ssl_protocols TLSv1 TLSv1.1 TLSv1.2;

    ssl_ciphers 'ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-DSS-AES128-GCM-SHA256:kEDH+AESGCM:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA:ECDHE-ECDSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:DHE-DSS-AES128-SHA256:DHE-RSA-AES256-SHA256:DHE-DSS-AES256-SHA:DHE-RSA-AES256-SHA:AES128-GCM-SHA256:AES256-GCM-SHA384:AES128-SHA256:AES256-SHA256:AES128-SHA:AES256-SHA:AES:CAMELLIA:DES-CBC3-SHA:!aNULL:!eNULL:!EXPORT:!DES:!RC4:!MD5:!PSK:!aECDH:!EDH-DSS-DES-CBC3-SHA:!EDH-RSA-DES-CBC3-SHA:!KRB5-DES-CBC3-SHA';

    ssl_prefer_server_ciphers on;

    ssl_dhparam /srv/_cert/dhparam.pem;

    location / {
        add_header Strict-Transport-Security max-age=15552000;
        add_header Content-Security-Policy "default-src 'none'; font-src fonts.gstatic.com; frame-src accounts.google.com apis.google.com platform.twitter.com; img-src syndication.twitter.com bible.logos.com www.google-analytics.com 'self'; script-src api.reftagger.com apis.google.com platform.twitter.com 'self' 'unsafe-eval' 'unsafe-inline' www.google.com www.google-analytics.com; style-src fonts.googleapis.com 'self' 'unsafe-inline' www.google.com ajax.googleapis.com; connect-src search.jampad.net jampadcdn.blob.core.windows.net mydomain.net";

        include         uwsgi_params;
        uwsgi_pass      unix:///srv/mydomain/mydomaindjango/content.sock;

        proxy_set_header X-Real-IP  $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto https;
        proxy_set_header X-Forwarded-Port 443;
        proxy_set_header Host mydomain.net;
    }
}

The certs that I use require the chain certs to get a solid A rating on ssllabs.com; this is just a matter of merging your cert with the chain cert (just Google it):

cat mydomain.net.crt ../positivessl.ca-bundle > mydomain.net.chained.crt

Verb Routing

Speaking of verbs, you could whip out a pretty cool CQRS infrastructure by splitting GET from POST.

This is more of a play-along than a visual aid. You can actually try this one at home.

Here's a demo using a quick node server:

var http = require('http');
var port = parseInt(process.argv[2], 10);
var server = http.createServer(function(req, res) {
    res.writeHead(200, {'Content-Type': 'text/html'});
    res.end(req.method + ' server ' + port);
});
var host = '127.0.0.1';
server.listen(port, host);

Here's our nginx config:

server {
    listen 222.222.222.222:8192;

    location / {
        limit_except GET {
            proxy_pass http://127.0.0.1:6001;
        }
        proxy_pass http://127.0.0.1:6002;
    }
}

use nginx -s reload to quickly reload config without doing a full service restart

Now, to spin up two of them:

node server.js 6001 &
node server.js 6002 &

& runs something as a background process

Now to call them (PowerShell and curl examples provided)...

(wget -method Post http://192.157.251.122:8192/).content

curl -XPOST http://192.157.251.122:8192/

Output:

POST server 6001

(wget -method Get http://192.157.251.122:8192/).content

curl -XGET http://192.157.251.122:8192/

Output:

GET server 6002

Cancel background tasks with fg then CTRL-C. Do this twice to kill both servers.

There we go: your inserts go to one location; you read from a different one.

Python / Django

Another great thing about Nginx is that it's not Apache. Aside from Apache simply trying to do far too much, it's an obsolete product from the 90s that needs to be dropped. Even then, using Apache for your Python/Django development isn't pleasant; that's what runserver is for.

If you're using runserver for development and Apache in production, you have two completely different hosting systems. Nginx allows you to have the same hosting setup in both places. The idea is this: in dev, you'll use runserver and in prod, you'll use uwsgi. But, isn't that still two different setups? Not really, because those are the things that run the Python content; Nginx is what serves the HTTP traffic in both locations. Thus, you not only have a unified hosting model, but you have Python processing done by a proper Python processor and HTTP traffic handled by Nginx.

Raw Python HTTP Processing

Python HTTP processing (it's not "web development" unless there's a web browser) is all about WSGI: the Web Server Gateway Interface. It's a pointless term, but the implementation is beautiful: it's a single interface that handles everything web-related for Python. The signature is as follows (with an example):

def name_does_not_matter(environment, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return ['Your content type was {}'.format(environment['CONTENT_TYPE']).encode('utf-8')]

This is even what Django does deep down.

You can use a service like UWSGI to do the processing for this. Like other things in Linux, this tool does one thing, does it well, and relies on other tools for other things. In the case of hosting, Nginx is a solid way to handle the HTTP hosting for UWSGI.

In addition to the config for UWSGI (not shown-- not relevant!), you have the following Nginx config:

server {
    listen 222.222.222.222:80;

    location / {
        include            uwsgi_params;
        uwsgi_pass         unix:/srv/raw_python_awesomeness/content/content.sock;

        proxy_redirect     off;
        proxy_set_header   Host $host;
        proxy_set_header   X-Real-IP $remote_addr;
        proxy_set_header   X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header   X-Forwarded-Host $server_name;
    }
}

You could make UWSGI serve up something on localhost:8081 (or whatever port you want), but it's best to use sockets where you can.

You can see my WebAPI for Python project at https://github.com/davidbetz/webapipy for a fuller example.

Bulk Download in Linux

Want to download a huge list of files on Linux? No problem...

Let's get a sample file list:

wget http://www.linuxfromscratch.org/lfs/view/stable/wget-list

Now, let's download them all:

sed 's/^/wget /e' wget-list

This says: execute wget for each line

Done.

Learning Elasticsearch with PowerShell

I'm not big on marketing introductions. They are usually filled with non-technical pseudo-truths and gibberish worthy of the "As seen on TV" warning label. So, what is Elasticsearch ("ES")? Marketing says it's a search system. People who have used it say it's a hardcore Aristotelian database that makes for a fine primary datastore as well as for a fine search engine.

It's comparable to MongoDB in the sense that you throw JSON objects at it. One of the major differences with MongoDB is that Elastic is more explicit about its indexing. Every database does indexing and everything has a schema. Have a data structure? You have an index. Have fields? At a minimum, you have an implicit schema. This is what makes an Aristotelian system Aristotelian.

See my video on Platonic and Aristotelian Data Philosophies for more information on why "NoSQL" is a modern marketing fiction similar to "AJAX".

Another major difference is that Elastic scales much better than MongoDB (see here for more details). This makes sense given the schema-focus of Elastic and the fact that it's meant for large-scale searching.

You might find people who say that Elastic is schemaless. These people have neither read nor peeked at the docs. Elastic is very explicit about its indexes. Sometimes you'll hear that it's schemaless because it uses Lucene, which is schemaless. That's stupid. Lucene uses straight bytes and Elastic adds the schema on top of it. Your file system uses bytes and SQL Server adds a schema on top of it. The fact that your file system uses bytes, not a schema, doesn't mean that SQL Server lacks a schema just because it stores MDF files on a byte-based file system. SQL Server has a schema. Elastic has a schema. Everything has a schema. It might be implicit, but it has a schema. Even if you never create one, there is a schema. Shutting the hood of your car doesn't mean the engine doesn't exist. You need to think beyond the capacities of a 1-year-old. Elastic is explicit about having a schema.

Yet another difference is in terms of access: because Mongo uses TCP and ES uses HTTP, for all practical purposes, Mongo requires a library, but an ES library is redundant. When something is as dynamic as ES, using strongly-typed objects in .NET makes .NET fodder for ridicule. .NET developers with an old-school mindset (read: an inability to paradigm shift) in particular have an unhealthy attachment to making everything they touch into some strongly-typed abstraction layer. THIS. IS. WHAT. IS. WRONG. WITH. MONGO. This is also everything right about the dynamic keyword in C#. It's good to see C# get with the modern world.

OK OK, Mongo is fine, but I just wish it had an HTTP interface which we could call directly. Having said that, the stereotype of a .NET dude is this: the dude is given a perfectly good HTTP endpoint, and he then proceeds to express his misguided cleverness by destroying the open nature of the endpoint by wrapping it in yet another API you have to learn. You can simply look at NuGet to see the massive number of pointless abstraction layers everyone feels the need to dump onto the world. To make matters worse, when you want to make a simple call, you're forced to defeat the entire point of the clean API by using the pointless abstraction layer. Fail. Abstraction layers are fun to write to learn an API, but... dude... keep them to yourself. Go simple. Go raw. Nobody wants abstraction layer complexity analogous to WebForms; there's a reason Web API is so popular now. Don't ruin it. This lesson is for learning, not to create yet another pointless abstraction layer. Live and love dynamic programming. Welcome to 2015.
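
To make that concrete, here's a minimal sketch of the no-abstraction-layer approach in C#: raw HttpClient, raw JSON, and dynamic via Json.NET. The URL, index name, and query are made up; point it at your own cluster:

using System;
using System.Net.Http;
using System.Threading.Tasks;
using Newtonsoft.Json;

public static class RawElasticsearchCall
{
    public static async Task Run()
    {
        using (var client = new HttpClient())
        {
            // plain HTTP: ES is just an endpoint, no client library required
            var json = await client.GetStringAsync(
                "http://localhost:9200/site2!production/_search?q=content:struggling&pretty");

            // dynamic: no DTOs, no strongly-typed wrapper to maintain
            dynamic result = JsonConvert.DeserializeObject(json);
            foreach (var hit in result.hits.hits)
            {
                Console.WriteLine("{0} ({1})", hit._id, hit._score);
            }
        }
    }
}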

This is most likely why REST is little more than a colloquialism. People who attempt to use hypermedia (a requirement for REST) either find it pointlessly complicated or impossible to wrap. REST is dead; we now use the term "REST" to mean any HTTP verb-driven, resource-based API. ES is not REST; it's "REST" only in this latter sense.

This leads to the point of this entire document: you will learn to use ES from first-principles; you will be able to write queries as wrapped or unwrapped as you want.

My game plan here will seem absolutely backward:

  • Abstract some lower-level ES functionality with PowerShell using search as an example.
  • Discuss ES theory, index setup, and data inserting.

Given that setup is a one-time thing and day-to-day stuff is... uh... daily, the day-to-day stuff comes first. The first-things-first fallacy can die.

As I demonstrate using ES from PowerShell, I will give commentary on what I'm doing with ES. You should be able to learn both ES and some practical, advanced PowerShell.

There's a lot of really horrible PowerShell out there. I'm not part of the VB-style / Cmdlet / horribly-tedious-and-tiring-long-command-name PowerShell weirdness. If you insist on writing out Invoke-WebRequest instead of simply using wget, but use int and long instead of Int32 and Int64, then you have a ridiculous inconsistency to work out. Also, you're part of the reason PowerShell isn't more widespread. You are making it difficult on everyone. In the following code, we're going to use PowerShell that won't make you hate PowerShell; it will be pleasant and the commands will end up rolling off your fingers (something that won't happen with the horrible longhand command names).

Flaw Workaround

Before we do anything, we have to talk about one of the greatest epic fails in recent software history...

While the Elasticsearch architecture is truly epic, it has a known design flaw (the absolute worst form of a bug): it allows a POST body in a GET request. This makes development painful:

  • Fiddler throws a huge red box at you.
  • wget in PowerShell gives you an error.
  • Postman in Chrome doesn't even try.
  • HttpClient in .NET throws System.Net.ProtocolViolationException saying "Cannot send a content-body with this verb-type."

.NET is right. Elasticsearch is wrong. Breaking the rules for the sake of what you personally feel makes more sense makes you a vigilante. Forcing a square peg into a round hole just for the sake of making sure gets are done with GET makes you an extremist. Bodies belong in POST and PUT, not GET.

It's a pretty stupid problem given how clever their overall architecture is. There's an idiom for this in English: Homer nodded.

To get around this flaw, instead of actually allowing us to search with POST (like normal people would), we are forced to use a hack: the source query string parameter.

PowerShell Setup

When following along, you'll want to use the PowerShell ISE. Just type ISE in PowerShell.

PowerShell note: hit F5 in ISE to run a script

If you are going to run these in a ps1 file, make sure to run Set-ExecutionPolicy RemoteSigned as admin. Microsoft doesn't seem to like PowerShell at all. It's not the default in Windows 8/10. It's not the default on Windows Server. You can't run scripts by default. Someone needs to be fired. Run the aforementioned command to allow local scripts.

HTTP/S call

We're now ready to be awesome.

To start, let's create a call to ES. In the following code, I'm calling HTTPS with authorization. I'm not giving you the sissy Hello World; this is from production. While you're playing around, you can remove HTTPS and authorization. You figure out how. That's part of learning.

$base = 'search.domain.net' 

$call = {
    param($params)

    $uri = "https://$base"

    $headers = @{ 
        'Authorization' = 'Basic fVmBDcxgYWpndYXJj3RpY3NlkZzY3awcmxhcN2Rj'
    }

    $response = $null
    $response = wget -Uri "$uri/$params" -method Get -Headers $headers -ContentType 'application/json'
    $response.Content
}

PowerShell note: prefix your : with ` or else you'll get a headache

So far, simple.

We can call &$call to call an HTTPS service with authorization.

But, let's break this out a bit...

$call = {
    param($verb, $params)

    $uri = "https://$base"

    $headers = @{ 
        'Authorization' = 'Basic fVmBDcxgYWpndYXJj3RpY3NlkZzY3awcmxhcN2Rj'
    }

    $response = wget -Uri "$uri/$params" -method $verb -Headers $headers -ContentType 'application/json'
    $response.Content
}

$get = {
    param($params)
    &$call "Get" $params
}

$delete = {
    param($params)
    &$call "Delete" $params
}

Better. Now we can call various verb functions.

To add PUT and POST, we need to account for the POST body. I'm also going to add some debug output to make life easier.

$call = {
    param($verb, $params, $body)

    $uri = "https://$base"

    $headers = @{ 
        'Authorization' = 'Basic fVmBDcxgYWpndYXJj3RpY3NlkZzY3awcmxhcN2Rj'
    }

    Write-Host "`nCalling [$uri/$params]" -f Green
    if($body) {
        Write-Host "BODY`n--------------------------------------------`n$body`n--------------------------------------------`n" -f Green
    }

    $response = wget -Uri "$uri/$params" -method $verb -Headers $headers -ContentType 'application/json' -Body $body
    $response.Content
}

$put = {
    param($params,  $body)
    &$call "Put" $params $body
}

$post = {
    param($params,  $body)
    &$call "Post" $params $body
}

In addition to having POST and PUT, we can also see what serialized data we are sending, and where.

ES Catalog Output

Now, let's use $call (or $get, etc) in something with some meaning:

$cat = {
    param($json)

    &$get "_cat/indices?v&pretty"
}

This will get the catalog of indexes.

Elasticsearch note: You can throw pretty anywhere to get formatted JSON.

Running &$cat gives me the following json:

[ {
  "health" : "yellow",
  "status" : "open",
  "index" : "site1!production",
  "pri" : "5",
  "rep" : "1",
  "docs.count" : "170",
  "docs.deleted" : "0",
  "store.size" : "2.4mb",
  "pri.store.size" : "2.4mb"
}, {
  "health" : "yellow",
  "status" : "open",
  "index" : "site2!production",
  "pri" : "5",
  "rep" : "1",
  "docs.count" : "141",
  "docs.deleted" : "0",
  "store.size" : "524.9kb",
  "pri.store.size" : "524.9kb"
} ]

But, we're in PowerShell; we can do better:

ConvertFrom-Json (&$cat) | ft

Output:

 health status index                 pri rep docs.count docs.deleted store.size pri.store.size
------ ------ -----                 --- --- ---------- ------------ ---------- --------------
yellow open   site1!staging         5   1   176        0            2.5mb      2.5mb         
yellow open   site2!staging         5   1   144        0            514.5kb    514.5kb    
yellow open   site1!production      5   1   170        0            2.4mb      2.4mb         
yellow open   site2!production      5   1   141        0            524.9kb    524.9kb     

Example note: !production and !staging have nothing to do with ES. It's something I do in ES, Redis, Mongo, SQL Server, and every other place the data will be stored to separate deployments. Normally I would remove this detail from article samples, but the following examples use this to demonstrate filtering.

PowerShell note: Use F8 to run a selection or single line. It might be worth removing your entire $virtualenv, if you want to play around with this.

Much nicer. Not only that, but we have the actual object we can use to filter on the client side. It's not just text.

(ConvertFrom-Json (&$cat)) `
    | where { $_.index -match '!production' }  `
    | select index, docs.count, store.size |  ft

Output:

index              docs.count store.size
-----              ---------- ----------
site1!production   170        2.4mb     
site2!production   141        532.9kb   

Getting Objects from ES

Let's move forward by adding our search function:

$search = {
    param($index, $json)

    &$get "$index/mydatatype/_search?pretty&source=$json"
}

Calling it...

&$search 'site2!production' '{
    "query": {
        "match_phrase": {
            "content": "struggling serves"
        }
    }
}'

Elasticsearch note: match_phrase will match the entire literal phrase "struggling serves"; match would have searched for "struggling" or "serves". Results will return with a score, sorted by that score; entries with both words would have a higher score than an entry with only one of them. Also, wildcard will allow stuff like struggl*.

Meh, I'm not a big fan of this analog of SELECT * FROM [site2!production]; let's limit the fields:

&$search 'site2!production' '{
    "query": {
        "match_phrase": {
            "content": "struggling serves"
        }
    },
    "fields": ["selector", "title"]
}'

This will return a bunch of JSON.

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 0.6106973,
    "hits" : [ {
      "_index" : "site2!staging",
      "_type" : "entry",
      "_id" : "AVGUw_v5EFj4l3MNvkeV",
      "_score" : 0.6106973,
      "fields" : {
        "selector" : [ "sweet/mine" ]
      }
    }, {
      "_index" : "site2!staging",
      "_type" : "entry",
      "_id" : "AVGU3PwoEFj4l3MNvk9D",
      "_score" : 0.4333064,
      "fields" : {
        "selector" : [ "of/ambition" ]
      }
    }, {
      "_index" : "site2!staging",
      "_type" : "entry",
      "_id" : "AVGU3QHeEFj4l3MNvk9G",
      "_score" : 0.4333064,
      "fields" : {
        "selector" : [ "or/if" ]
      }
    } ]
  }
}

We can improve on this.

First, we can convert the input to something nicer:

&$search 'site2!production' (ConvertTo-Json @{
    query = @{
        match_phrase = @{
            content = "struggling serves"
        }
    }
    fields = @('selector', 'title')
})

Here we're just creating a dynamic object and serializing it. No JIL or Newtonsoft converters required.

To make this a lot cleaner, here's a modified $search:

$search = {
    param($index, $json, $obj)
    if($obj) {
        $json = ConvertTo-Json -Depth 10 $obj
    }

   &$get "$index/mydatatype/_search?pretty&source=$json"
}

You need -Depth <Int32> because the default is 2. Nothing deeper than the default will serialize; it will simply show "System.Collections.Hashtable". In ES, you'll definitely have deep objects.

Now, I can call this with this:

&$search 'site2!production' -obj @{
    query = @{
        match_phrase = @{
            content = "struggling serves"
        }
    }
    fields = @('selector', 'title')
}

This works fine. Not only that, but the following code still works:

&$search 'site2!production' '{
    "query": {
        "match_phrase": {
            "content": "struggling serves"
        }
    },
    "fields": ["selector", "title"]
}'

Now you don't have to fight with escaping strings; you can also still copy/paste JSON with no problem.

JSON to PowerShell Conversion Notes:

  • : becomes =
  • all ending commas go away
    • newlines denote new properties
  • @ before all new objects (e.g. {})
  • [] becomes @()
    • @() is PowerShell for array
  • " becomes ""
    • PowerShell escaping is double-doublequotes

DO NOT FORGET THE @ BEFORE {. If you do, it will sit there forever as it tries to serialize nothing into nothing. After a few minutes, you'll get hundreds of thousands of JSON entries. Seriously. It tries to serialize every aspect of every .NET property forever. This is why -Depth defaults to 2.

Next, let's format the output:

(ConvertFrom-Json(&$search 'content!staging' 'entry' -obj @{
    query = @{
        match_phrase = @{
            content = "struggling serves"
        }
    }
    fields = @('selector', 'title')
})).hits.hits.fields | ft

Could probably just wrap this up:

$matchPhrase = {
    param($index, $type, $text, $fieldArray)
    (ConvertFrom-Json(&$search $index $type -obj @{
        query = @{
            match_phrase = @{
                content = $text
            }
        }
        fields = $fieldArray
    })).hits.hits.fields | ft
}

Just for completeness, here's $match. Nothing too shocking.

$match = {
    param($index, $type, $text, $fieldArray)
    (ConvertFrom-Json(&$search $index $type -obj @{
        query = @{
            match = @{
                content = $text
            }
        }
        fields = $fieldArray
    })).hits.hits.fields | ft
}

Finally, we have this:

&$match 'content!staging' 'entry' 'even' @('selector', 'title')

Output:

title                               selector           
-----                               --------           
{There Defeats Cursed Sinews}       {forestalled/knees}
{Foul Up Of World}                  {sweet/mine}       
{And As Rank Down}                  {crown/faults}     
{Inclination Confront Angels Stand} {or/if}            
{Turn For Effects World}            {of/ambition}      
{Repent Justice Is Defeats}         {white/bound}      
{Buys Itself Black I}               {form/prayer}

There we go: phenomenal cosmic power in an itty bitty living space.

Beefier Examples and Objects

Here's an example of a search that's a bit beefier:

&$search  'bible!production' -obj @{
    query = @{
        filtered = @{
            query = @{
                match = @{
                    content = "river Egypt"
                }
            }
            filter = @{
                term = @{
                    "labels.key" = "isaiah"
                }
            }
        }
    }
    fields = @("passage")
    highlight = @{
        pre_tags = @("<span class=""search-result"">")
        post_tags = @("</span>")
        fields = @{
            content = @{
                fragment_size = 150
                number_of_fragments = 3
            }
        }
    }
}

Elasticsearch note: Filters are binary: you have it or you don't. Queries are analog: they have a score. In this example, I'm mixing a filter with a query. Here I'm searching the index for content containing "river" or "Egypt" where labels: { "key": "isaiah" }.

Using this I'm able to filter my documents by label, where my labels are complex objects like this:

  "labels": [
    {
      "key": "isaiah",
      "implicit": false,
      "metadata": []
    }
  ]

I'm able to search by labels.key to do a hierarchical filter. This isn't an ES tutorial; rather, this is to explain why "labels.key" was in quotes in my PowerShell, but nothing else is.

Design note: The objects you send to ES should be optimized for ES. This nested type example is somewhat contrived to demonstrate nesting. You can definitely just throw your data at ES and it will figure out the schema on the fly, but that just means you're lazy. You're probably the type of person who used the forbidden [KnownType] attribute in WCF because you were too lazy to write DTOs. Horrible. Go away.

This beefier example also shows me using ES highlighting. In short, it allows me to tell ES that I want a content summary of a certain size with some keywords wrapped in some specified HTML tags.

This content will show in addition to the requested fields.

The main reason I mention highlighting here is this:

When you serialize the object, it will look weird:

"pre_tags":  [
   "\u003cspan class=\"search-result\"\u003e"
]

Chill. It's fine. I freaked out at first too. Turns out ES can handle unicode just fine.

Let's run with this highlighting idea a bit by simplifying it, parameterizing it, and deserializing the result (like we've done already):

$result = ConvertFrom-Json(&$search  $index -obj @{
    query = @{
        filtered = @{
            query = @{
                match = @{
                    content = $word
                }
            }
        }
    }
    fields = @("selector", "title")
    highlight = @{
        pre_tags = @("<span class=""search-result"">")
        post_tags = @("</span>")
        fields = @{
            content = @{
                fragment_size = 150
                number_of_fragments = 3
            }
        }
    }
})

Nothing new so far. Same thing, just assigning it to a variable...

The JSON passed back was this...

{
  "took" : 6,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 30,
    "max_score" : 0.093797974,
    "hits" : [ {
      "_index" : "content!staging",
      "_type" : "entry",
      "_id" : "AVGhy1octuYZuX6XP7zu",
      "_score" : 0.093797974,
      "fields" : {
        "title" : [ "Pause First Bosom The Oft" ],
        "selector" : [ "hath come/hand of" ]
      },
      "highlight" : {
        "content" : [ " as to fault <span class=\"search-result\">smells</span> ourselves am free wash not tho
se lies enough business eldest sharp first what corrupted blood which state knees wash our cursed oft", " give
 above shall curse be help faults offences this snow me pray shuffling confront ere forgive newborn a engaged 
<span class=\"search-result\">smells</span> rain state angels up form", " ambition eldest i guilt <span class=
\"search-result\">smells</span> forehead true snow thicker rain compelld intent bound my which currents our so
ul of limed angels white snow this buys" ]
      }
    },
    ...    

The data we want is in hits.hits.fields and hits.hits.highlight.

So, we can get them, and play with the output...

$hits = $result.hits.hits
$formatted = $hits | select `
        @{ Name='selector'; Expression={$_.fields.selector} },
        @{ Name='title'; Expression={$_.fields.title} }, 
        @{ Name='highlight'; Expression={$_.highlight.content} }
$formatted

This is basically the following...

hits.Select(p=>new { selector = p.fields.selector, ...});

Output:

selector                   title                            highlight                                        
--------                   -----                            ---------                                        
hath come/hand of          Pause First Bosom The Oft        { as to fault <span class="search-result">smel...
all fall/well thicker      Enough Nature Heaven Help Smells { buys me sweet queen <span class="search-resu...
with tis/law oft           Me And Even Those Business       { if what form this engaged wretched heavens m...
hand forehead/engaged when Form Confront Prize Oft Defeats  {white begin two-fold faults than that strong ...
newborn will/not or        Were Gilded Help Did Nature      { above we evidence still to me no where law o...
cursed thicker/as free     Cursed Tis Corrupted Guilt Where { justice wicked neglect by <span class="searc...
currents hand/true of      Both Than Serves Were May        { serves engaged down ambition to man it is we...
two-fold can/eldest queen  Then Sweet Intent Help Turn      { heart double than stubborn enough the begin ...
force did/neglect whereto  Compelld There Strings Like Not  { it oft sharp those action enough art rests s...
babe whereto/whereto is    As Currents Prayer That Free     { defeats form stand above up <span class="sea...

This part is important: highlight is an array. Your search terms may show up more than once in a document. That's where the whole number_of_fragments = 3 came in. The highlight size comes from that fragment_size = 150. So, for each entry, we have a selector (basically an ID), a title, and up to three highlights of up to 150 characters each.
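
You can see this by poking at a single row of $formatted from above (a quick sketch):

$formatted[0].highlight.Count    # up to 3, per number_of_fragments
$formatted[0].highlight[0]       # one fragment of roughly fragment_size characters, plus the injected <span> tags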

Let's abstract this stuff before we go to the final data analysis step:

$loadHighlights = {
    param($index, $word)

    $result = ConvertFrom-Json(&$search  $index -obj @{
        query = @{
            filtered = @{
                query = @{
                    match = @{
                        content = $word
                    }
                }
            }
        }
        fields = @("selector", "title")
        highlight = @{
            pre_tags = @("<span class=""search-result"">")
            post_tags = @("</span>")
            fields = @{
                content = @{
                    fragment_size = 150
                    number_of_fragments = 3
                }
            }
        }
    })

    $hits = $result.hits.hits
    $hits | select `
            @{ Name='selector'; Expression={$_.fields.selector} },
            @{ Name='title'; Expression={$_.fields.title} }, 
            @{ Name='highlight'; Expression={$_.highlight.content} }
}

Now, we can run:

&$loadHighlights 'content!staging' 'smells'

Let's use this and analyze the results:

(&$loadHighlights 'content!staging' 'smells').foreach({
    Write-Host ("Selector: {0}" -f $_.selector)
    Write-Host ("Title: {0}`n" -f $_.title)
    $_.highlight | foreach {$i=0} {
        $i++
        Write-Host "$i $_"
    }
    Write-Host "`n`n"
})

Here's a small sampling of the results:

Selector: two-fold can/eldest queen
Title: Then Sweet Intent Help Turn

1  heart double than stubborn enough the begin and confront as <span class="search-result">smells</span> ourse
lves death stronger crown murder steel pray i though stubborn struggling come by
2  of forehead newborn mine above forgive limed offences bosom yet death come angels angels <span class="searc
h-result">smells</span> i sinews past that murder his bosom being look death
3  <span class="search-result">smells</span> where strong ill action mine foul heavens turn so compelld our to
 struggling pause force stubborn look forgive death then death try corrupted



Selector: force did/neglect whereto
Title: Compelld There Strings Like Not

1  it oft sharp those action enough art rests shove stand cannot rain bosom bosom give tis repentance try upon
t possessd my state itself lies <span class="search-result">smells</span> the
2  brothers blood white shove no stubborn than in ere <span class="search-result">smells</span> newborn art re
pentance as like though newborn will form upont pause oft struggling forehead help
3  shuffling serve lies <span class="search-result">smells</span> stand well queen well visage and free his pr
ayer lies that art ere a there law even by business confront offences may retain

We have the selector, the title, and the highlights with <span class="search-result">...</span> showing us where our terms were found.

Setup

Up to this point, I've assumed that you already have an ES setup. Setup is once, but playing around is continuous. So, I got the playing around out of the way first.

Now, I'll go back and talk about ES setup with PowerShell. This should GREATLY improve your ES development. Well, it's what helps me at least...

ES has a schema. It's not fully Platonic like SQL Server, nor is it fully Aristotelian like MongoDB. You can throw all kinds of things at ES and it will figure them out. This is what ES calls dynamic mapping. If you like the idea of digging through incredibly massive data dumps during debugging or passing impossibly huge datasets back and forth, then this might be the way to go (were you "that guy" who threw [KnownType] on your WCF objects? This is you. You have no self-respect.) On the other hand, if you are into light-weight objects, you're probably passing ES a nice tight JSON object anyway. In any case, want schema to be computed as you go? That's dynamic mapping. Want to define your schema and have ES ignore unknown properties? That's, well, disabling dynamic mapping.

Dynamic mapping ends up being similar to the lowest-common denominator ("LCD") schema like in Azure Table Storage: your schema might end up looking like a combination of all fields in all documents.

ES doesn't so much deal with "schema" in the abstract as with concrete indexes and types.

No, that doesn't mean it's schemaless. That's insane. It means that the index and the types are the schema.

In any case, in ES, you create indexes. These are like your tables. Your indexes will have metadata and various properties much like SQL Server metadata and columns. Properties have types just like SQL Server columns. Unlike SQL Server, there's also a concept of a type. Indexes can have multiple types.

Per Elasticsearch: The Definitive Guide, the type is little more than a "_type" property internally, thus types are almost like property keys in Azure Table Storage. This means that when searching, you're searching across all types, unless you specify the type as well. Again, this maps pretty closely to a property key in Azure Table Storage.
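
Concretely, that's the difference between these two searches: the first hits only the entry type, the second hits every type in the index (q=*:* is the match-everything shortcut I'll use again in the Debugging section):

GET http://10.1.60.3:9200/content!staging/entry/_search?q=*:*

GET http://10.1.60.3:9200/content!staging/_search?q=*:*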

Creating an index

Creating an index with a type is a matter of calling POST /<index> with your mapping JSON object. Our $createIndex function will be really simple:

$createIndex = {
    param($index, $json, $obj)
    if($obj) {
        $json = ConvertTo-Json -Depth 10 $obj
    }
    &$post $index $json
}

Things don't get interesting until we call it:

&$createIndex 'content!staging' -obj @{
    mappings = @{
        entry = @{
            properties = @{
                selector = @{
                    type = "string"
                }
                title = @{
                    type = "string"
                }
                content = @{
                    type = "string"
                }
                created = @{
                    type = "date"
                    format = "YYYY-MM-DD"  
                }
                modified = @{
                    type = "date"                  
                }
            }
        }
    }
}

This creates an index called content!staging with a type called entry with five properties: selector, title, content, created, and modified.

The created property is there to demonstrate the fact that you can throw formats on properties. Normally, dates are UTC, but here I'm specifying that I don't even care about times when it comes to the create date.

With this created, we can see how ES sees our data. We do this by calling GET /<index>/_mapping:

$mapping = {
    param($index)
   &$get "$index/_mapping?pretty"
}

Now to call it...

&$mapping 'content!staging'
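
One more mapping option worth knowing, tying back to the dynamic mapping discussion above: you can set dynamic on the type to control what happens to properties you never declared. Here's a sketch of the same mapping with it added ("strict" rejects documents with unknown properties; $false silently ignores them); you'd create this under a new index name or delete the existing index first:

&$createIndex 'content!staging' -obj @{
    mappings = @{
        entry = @{
            dynamic = "strict"
            properties = @{
                selector = @{ type = "string" }
                title = @{ type = "string" }
                content = @{ type = "string" }
                created = @{ type = "date"; format = "YYYY-MM-DD" }
                modified = @{ type = "date" }
            }
        }
    }
}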

Adding Data

Now to throw some data at this index...

Once again, the PowerShell function is simple:

$add = {
    param($index, $type, $json, $obj)
    if($obj) {
        $json = ConvertTo-Json -Depth 10 $obj
    }
    &$post "$index/$type" $json
}

To add data, I'm going to use a trick I wrote about elsewhere:

if (!([System.Management.Automation.PSTypeName]'_netfxharmonics.hamlet.Generator').Type) {
    Add-Type -Language CSharp -TypeDefinition '
        namespace _Netfxharmonics.Hamlet {
            public static class Generator {
                private static readonly string[] Words = "o my offence is rank it smells to heaven hath the primal eldest curse upont a brothers murder pray can i not though inclination be as sharp will stronger guilt defeats strong intent and like man double business bound stand in pause where shall first begin both neglect what if this cursed hand were thicker than itself with blood there rain enough sweet heavens wash white snow whereto serves mercy but confront visage of whats prayer two-fold force forestalled ere we come fall or pardond being down then ill look up fault past form serve turn forgive me foul that cannot since am still possessd those effects for which did crown mine own ambition queen may one retain corrupted currents world offences gilded shove by justice oft tis seen wicked prize buys out law so above no shuffling action lies his true nature ourselves compelld even teeth forehead our faults give evidence rests try repentance can yet when repent wretched state bosom black death limed soul struggling free art more engaged help angels make assay bow stubborn knees heart strings steel soft sinews newborn babe all well".Split('' '');
                private static readonly int Length = Words.Length;
                private static readonly System.Random Rand = new System.Random();

                public static string Run(int count, bool subsequent = false) {
                    return Words[Rand.Next(1, Length)] + (count == 1 ? "" : " " + Run(count - 1, true));
                }
            }
        }

    '
}

Let's use the generator with $add to load up some data:

$ti = (Get-Culture).TextInfo
(1..30).foreach({
    &$add -index 'content!staging' -type 'entry' -obj @{
        selector = "{0}/{1}" -f ([_netfxharmonics.hamlet.Generator]::Run(1)), ([_netfxharmonics.hamlet.Generator]::Run(1))
        title = $ti.ToTitleCase([_netfxharmonics.hamlet.Generator]::Run(4))
        content = [_netfxharmonics.hamlet.Generator]::Run(400) + '.'
        created = [DateTime]::Now.ToString("yyyy-MM-dd")
        modified = [DateTime]::Now.ToUniversalTime().ToString("o")
    }
})

This runs fast; crank the sucker up to 3000 with a larger content size if you want. Remove "Write-Host" from $call for more speed.
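
If you're curious exactly how fast, you could wrap the loop in Measure-Command (a sketch; the 3000 and the content size are arbitrary):

Measure-Command {
    (1..3000).foreach({
        &$add -index 'content!staging' -type 'entry' -obj @{
            selector = "{0}/{1}" -f ([_netfxharmonics.hamlet.Generator]::Run(1)), ([_netfxharmonics.hamlet.Generator]::Run(1))
            title = $ti.ToTitleCase([_netfxharmonics.hamlet.Generator]::Run(4))
            content = [_netfxharmonics.hamlet.Generator]::Run(1000) + '.'
            created = [DateTime]::Now.ToString("yyyy-MM-dd")
            modified = [DateTime]::Now.ToUniversalTime().ToString("o")
        }
    })
}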

Your output will look something like this:

Calling [https://search.domain.net/content!staging/entry]
BODY
--------------------------------------------
{
    "selector":  "death/defeats",
    "title":  "Sinews Look Lies Rank",
    "created":  "2015-11-07",
    "content":  "Engaged shove evidence soul even stronger bosom bound form soul wicked oft compelld steel which turn prize yet stand prize.",
    "modified":  "2015-11-07T05:45:10.6943622Z"
}
--------------------------------------------

When I run one of the earlier searches...

&$match 'content!staging' 'entry' 'even'

...we get the following results:

selector        
--------        
{death/defeats}

Debugging

If stuff doesn't work, you need to figure out how to find out why; not simply find out why. So, brain-mode activated: wrong results? Are you getting them wrong or are they actually wrong? Can't insert? Did the data make it to the server at all? Did it make it there, but couldn't get inserted? Did it make it there, get inserted, but you were simply told that it didn't insert? Figure it out.
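
Before anything fancy, a couple of quick sanity checks go a long way (a sketch; Test-NetConnection needs a reasonably recent Windows, and _cluster/health is a standard ES endpoint):

Test-NetConnection 10.1.60.3 -Port 9200
Invoke-RestMethod 'http://10.1.60.3:9200/_cluster/health?pretty'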

As far as simple helpers go, I'd recommend doing some type of dump:

$dump = {
    param($index, $type)
    &$get "$index/$type/_search?q=*:*&pretty"
}

This is a raw JSON dump. You might want to copy/paste it somewhere for analysis, or play around in PowerShell:

(ConvertFrom-Json(&$dump 'content!staging' 'entry')).hits.hits

I'd recommend just using a text editor to look around instead of random faux PowerShell data-mining. Because JSON and XML are both absolutely, perfectly human readable, you'll see what you need quickly. Even then, there's no reason not to just type the actual link into your own browser:

http://10.1.60.3:9200/content!staging/entry/_search?q=*:*&pretty

I'd recommend the Pretty Beautiful Javascript extension for Chrome.

You can remove &pretty when using this Chrome extension.

Another thing I'd strongly recommend is having a JSON beautifier toggle for input JSON:

$pretty = $False

So you can do something like this:

$serialize = {
    param($obj)
    if($null -eq $pretty) {
        # default to compact output when the $pretty toggle isn't set
        $pretty = $false
    }
    if($pretty) {
        ConvertTo-Json -Depth 10 $obj
    }
    else {
        ConvertTo-Json -Compress -Depth 10 $obj
    }
}

Instead of calling ConvertTo-Json in your other places, just call &$serialize.

$search = {
    param($index, $type, $json, $obj)
    if($obj) {
        $json = &$serialize $obj
    }
    &$get "$index/$type/_search?pretty&source=$json"
}

Remember, this is for input, not output. This is for the data going to the server.

You want this option because, once you disable pretty printing, you can do this:

&$match 'content!staging' 'entry' 'struggling' @('selector', 'title')

To get this...

Calling [http://10.1.60.3:9200/content!staging/entry/_search?pretty&source={"fields":["selector","title"],"query":{"match":{"content":"struggling"}}}]

title                            selector         
-----                            --------         
{Those Knees Look State}         {own/heavens}    

Now you have a URL you can dump into your web browser. Also, you have a link to share with others.

Regarding logs, on Linux systems you can view error messages at the following location:

/var/log/elasticsearch/CLUSTERNAME.log

I like to keep an SSH connection open while watching the log closely:

tail -f /var/log/elasticsearch/david-content.log 

Your cluster name will be configured in the YAML config somewhere around /etc/elasticsearch/elasticsearch.yml.

Updating (and Redis integration)

Updating Elasticsearch objects ("documents") is interesting for two reasons, a good one and a weird one:

Good reason: documents are immutable. Updates involve marking the existing item as deleted and inserting a new document. This is exactly how SQL Server 2014 IMOLTP works. It's one secret of extreme efficiency. It's an excellent practice to follow.

Weird reason: you have to know the integer ID to update a document. It's highly efficient, which makes it, at worst, "weird", not "bad". If it allowed updates based on custom fields, you'd have a potential perf hit. Key lookups are the fastest.

Prior to Elasticsearch 2.x, you could add something like { "_id": { "path": "selector" } } to tell ES that you want to use your "selector" field as your ID. This was deprecated in version 1.5 and removed in 2.x (yes, those are two separate things). Today, _id is immutable. So, when you see docs saying you can do this, check the version. It will probably be something like version 1.4. Compare the docs for _id in version 1.4 with version 2.1 to see what I mean.

When you make a call like the following example, a cryptic ID is generated:

POST http://10.1.60.3:9200/content!staging/entry

But, you can specify the integer:

POST http://10.1.60.3:9200/content!staging/entry/5

This is great, but nobody anywhere cares about integer IDs. These surrogate keys have absolutely no meaning to your document. How in the world could you possibly know how to update something? You have to know the ID.

If you have your own useful identifier, then good for you, just use the following:

POST http://10.1.60.3:9200/content!staging/entry/tacoburger

Yet, you can't use any type of slash. Soooo.... forget it. Since I usually use ES to store content linked from URLs, this isn't going to fly. Besides, nobody wants to have to keep track of all the various encodings you have to do to make your data clean. So, we need to add some normalization to our IDs, both to make ES happy and to keep our usage simple.

So, OK, fine... I have to save some type of surrogate-key-to-key map somewhere. Where could I possibly save it? Elasticsearch IS. MY. DATABASE. I need something insanely efficient for key/value lookups, but that persists to disk. I need something easy to use on all platforms. It should also be non-experimental. It should be a time-tested system. Oh... right: Redis.

The marketing says that Redis is a "cache". Whatever that means. It's the job of marketing to either lie about products to trick people into buying stuff or to downplay stuff for the sake of a niche market. In reality, Redis is a key/value database. It's highly efficient and works everywhere. It's perfect. Let's start making the awesome...

I'm all about doing things based on first principles (people who can't do this laugh at people who can and accuse them of "not invented here syndrome"; jealousy expresses itself in many ways), but here I'm going to use the StackExchange.Redis package. It seems to be pretty standard and it works pretty well. I'm running it in a few places. Create a VS2015 (or whatever) project and add the NuGet package. Or go find it and download it. But... meh... that sounds like work. Use NuGet. Now we're going to reference that DLL...

$setupTracking = {
    Add-Type -Path 'E:\_GIT\awesomeness\packages\StackExchange.Redis.1.0.488\lib\net45\StackExchange.Redis.dll'
    $cs = '10.1.60.2'
    $config = [StackExchange.Redis.ConfigurationOptions]::Parse($cs)
    $connection = [StackExchange.Redis.ConnectionMultiplexer]::Connect($config)
    $connection.GetDatabase()
}

Here I'm adding the assembly, creating my connection string, creating a connection, then getting the database.

Let's call this and set some type of relative global:

$redis = &$setupTracking

We need to go over a few things in Redis first:

Redis communicates over TCP. You send commands to it and you get stuff back. The commands are assembler-looking codes like:

  • HGET
  • FLUSHALL
  • KEYS
  • GET
  • SET
  • INCR

When you use INCR, you are incrementing a counter. So...

INCR taco

That sets taco to 1.

INCR taco

Now it's 2.

We can get the value...

GET taco

The return value will be 2.

By the way, this is how you set up real-time counters on your website. You don't have to choose between database locking and eventual consistency. Use Redis.
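
The same flow through the StackExchange.Redis client we just wired up looks like this (a sketch against the $redis database object):

[void]$redis.StringIncrement("taco")    # taco is now 1
[void]$redis.StringIncrement("taco")    # taco is now 2
[int]$redis.StringGet("taco")           # 2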

Then there's the idea of a hash. You know, a dictionary-looking thingy.

So,

HSET chicken elephant "dude"

This sets elephant on the chicken hash to "dude".

HGET chicken elephant

This gets "dude". Shocking... I know.

HGETALL chicken

This dumps the entire chicken hash.
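
Again, the PowerShell equivalents through the client (a sketch):

[void]$redis.HashSet("chicken", "elephant", "dude")
$redis.HashGet("chicken", "elephant")    # dude
$redis.HashGetAll("chicken")             # the whole chicken hash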

Weird names demonstrate that the name has nothing to do with the system, and they force you to think, which helps you remember it long-term.

To get all the values, do something like this:

KEYS *

When I say "all", I mean "all". Both the values from INCR and the values from HSET will show. This is a typical wildcard. You can do stuff like KEYS *name* just fine.

Naming note: Do whatever you want, but it's common to use names like "MASTERSCOPE:SCOPE#VARIABLE". My system already has a well-defined internal naming system of Area!Environment, so in what follows we'll use "content!staging#counter" and "content!staging#Hlookup".

OK, that's enough to get started. Here's the plan: because the integer IDs mean absolutely nothing to me, I'm going to treat them as an implementation detail; more technically, as a surrogate key. My key is selector. I want to update via selector, not via some internal ID that means nothing to me.

To do this, I'll basically just emulate what Elasticsearch 1.4 did: specify what property I want as my key.

To this end, I need to add a new $lookupId function, plus update both $add and $delete:

$lookupId = {
    param($index, $selector)

    if($redis) {
        $id = [int]$redis.HashGet("$index#Hlookup", $selector)
    }
    if(!$id) {
        $id = 0
    }
    $id
}

$add = {
    param($index, $type, $json, $obj, $key)
    if($obj) {
        $json = &$serialize $obj
        if($key) {
            $keyValue = $obj[$key]
        }
    }
    
    if($redis -and $keyValue) {
        $id = &$lookupId $index $keyValue
        Write-Host "`$id is $id"
        if($id -eq 0) {
            $id = [int]$redis.StringIncrement("$index#counter")
            if($verbose) {
                Write-Host "Linking $keyValue to $id"
            }
            &$post "$index/$type/$id" $json
            [void]$redis.HashSet("$index#Hlookup", $keyValue, $id)
        }
        else {
            &$put "$index/$type/$id" $json
        }

    }
    else {
        &$post "$index/$type" $json
    }
}

$delete = {
    param($index)
    &$call "Delete" $index

    if($redis) {
        [void]$redis.KeyDelete("$index#counter")
        [void]$redis.KeyDelete("$index#Hlookup")
    }
}

When stuff doesn't exist, you get some type of blank entity. I've never seen a null while using the StackExchange.Redis package, so that's something to celebrate. The values that StackExchange.Redis methods work with are RedisKey and RedisValue. There's not much to learn there, though, since there are operators for many different conversions. You can work with strings just fine without needing to know about RedisKey and RedisValue.

So, if I'm sending it a key, I get the key value from the object I sent in. If there is a key value and Redis is enabled and active, I see whether that key value already maps to the ID of an existing item. That's a Redis lookup. Not there? OK, it must be new: use Redis to generate a new ID and send that to Elasticsearch (POST $index/$type/$id). The ID was already there? That means the selector was already assigned a unique, sequential ID by Redis; use that for the update.

For now, POST works fine for an Elasticsearch update as well. Regardless, I'd recommend using PUT for update even though POST works. You never know when they'll enforce it.

Let's run a quick test:

$selectorArray = &$generate 'content!staging' 'entry' 2 -key 'selector'

($selectorArray).foreach({
    $selector = $_
    Write-Host ("ID for $selector is {0}" -f (&$lookupId 'content!staging' $selector))
})

Output:

ID for visage/is is 4
ID for if/blood is 5

I'm going to hop over to Chrome to see how my data looks:

http://10.1.60.3:9200/content!staging/_search?q=*:*

It's there...

{
    "_index": "content!staging",
    "_type": "entry",
    "_id": "4",
    "_score": 1,
    "_source": {
        "selector": "visage/is",
        "title": "Bound Fault Pray Or",
        "created": "2015-11-07",
        "content": "very long content omitted",
        "modified": "2015-11-07T22:24:23.0283870Z"
    }
}

Cool, ID is 4.

What about updating?

Let's try it...

$obj = @{
    selector = "visage/is"
    title = 'new title, same document'
    content = 'smaller content'
    created = [DateTime]::Now.ToString("yyyy-MM-dd")
    modified = [DateTime]::Now.ToUniversalTime().ToString("o")
}
&$add -index 'content!staging' -type 'entry' -obj $obj -key 'selector' > $null

Output:

{
    "_index": "content!staging",
    "_type": "entry",
    "_id": "4",
    "_score": 1,
    "_source": {
        "selector": "visage/is",
        "title": "new title, same document",
        "created": "2015-11-07",
        "content": "smaller content",
        "modified": "2015-11-07T23:11:58.4607963Z"
    }
}

Sweet. Now I can update via my own key (selector) and not have to ever touch Elasticsearch surrogate keys (_id).
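
As a final check, the Redis side should still line up with what Elasticsearch shows (using the $lookupId helper from earlier):

&$lookupId 'content!staging' 'visage/is'    # 4, matching the _id above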
