---
name: kdb+
contributors:
    - ["Matt Doherty", "https://github.com/picodoc"]
    - ["Jonny Press", "https://github.com/jonnypress"]
filename: learnkdb.q
---

The q language and its database component kdb+ were developed by Arthur Whitney and released by Kx Systems in 2003. q is a descendant of APL and as such is very terse and a little strange looking for anyone from a "C heritage" language background. Its expressiveness and vector-oriented nature make it well suited to performing complex calculations on large amounts of data (while also encouraging some amount of [code golf](https://en.wikipedia.org/wiki/Code_golf)). The fundamental structure in the language is not the object but the list, and tables are built as collections of lists. This means that, unlike in most traditional RDBMSs, tables are column oriented. The language has both an in-memory and an on-disk database built in, giving a large amount of flexibility. kdb+ is most widely used in the world of finance to store, analyze, process and retrieve large time-series data sets.

The terms *q* and *kdb+* are usually used interchangeably: the two are not separable, so the distinction is not really useful.

To learn more about kdb+ you can join the [KX Community forums](https://learninghub.kx.com/forums/) or the [TorQ kdb+](https://groups.google.com/forum/#!forum/kdbtorq) group.

```q
/ Single line comments start with a forward-slash
/ These can also be used in-line, so long as at least one whitespace character
/ separates them from text to the left
/
  A forward-slash on a line by itself starts a multiline comment
  and a backward-slash on a line by itself terminates it
\

/ Run this file in an empty directory


////////////////////////////////////
// Basic Operators and Datatypes  //
////////////////////////////////////

/ We have integers, which are 8 bytes by default
3 / => 3

/ And floats, also 8 bytes as standard.
/ A trailing f distinguishes them from ints
3.0 / => 3f

/ 4 byte numerical types can also be specified with trailing chars
3i   / => 3i
3.0e / => 3e

/ Math is mostly what you would expect
1+1  / => 2
8-1  / => 7
10*2 / => 20
/ Except division, which uses percent (%) instead of forward-slash (/)
35%5 / => 7f (the result of division is always a float)

/ For integer division we have the keyword div
4 div 3 / => 1

/ Modulo also uses a keyword, since percent (%) is taken
4 mod 3 / => 1

/ And exponentiation...
2 xexp 4 / => 16f (xexp always returns a float)

/ ...and truncating...
floor 3.14159 / => 3

/ ...getting the absolute value...
abs -3.14159 / => 3.14159

/ ...and many other things
/ see http://code.kx.com/q/ref/ for more

/ q has no operator precedence, everything is evaluated right to left
/ so results like this might take some getting used to
2*1+1 / => 4
/ (no operator precedence tables to remember!)

/ Precedence can be modified with parentheses (restoring the 'normal' result)
(2*1)+1 / => 3

/ Assignment uses colon (:) instead of equals (=)
/ No need to declare variables before assignment
a:3
a / => 3

/ Variables can also be assigned in-line
/ this does not affect the value passed on
c:3+b:2+a:1 / (data "flows" from right to left)
a / => 1
b / => 3
c / => 6

/ In-place operations are also as you might expect
a+:2
a / => 3

/ There are no "true" or "false" keywords in q
/ boolean values are indicated by the bit value followed by b
1b / => true value
0b / => false value

/ Equality comparisons use equals (=) (since we don't need it for assignment)
1=1 / => 1b
2=1 / => 0b

/ Inequality uses <>
1<>1 / => 0b
2<>1 / => 1b

/ The other comparisons are as you might expect
1<2  / => 1b
1>2  / => 0b
2<=2 / => 1b
2>=2 / => 1b

/ Comparison is not strict with regard to types...
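/ A quick sketch of the same idea using the 4 byte types from above:
/ numeric values are compared by value, whatever their width
1i=1   / => 1b (int vs long)
2=2.0e / => 1b (long vs real)
1=1.5  / => 0b (the values themselves differ)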
42=42.0 / => 1b

/ ...unless we use the match operator (~)
/ which only returns true if entities are identical
42~42.0 / => 0b

/ The not operator returns true if the underlying value is zero
not 0b  / => 1b
not 1b  / => 0b
not 42  / => 0b
not 0.0 / => 1b

/ The max operator (|) reduces to logical "or" for bools
42|2.0 / => 42f
1b|0b  / => 1b

/ The min operator (&) reduces to logical "and" for bools
42&2.0 / => 2f
1b&0b  / => 0b

/ q provides two ways to store character data
/ Chars in q are stored in a single byte and use double-quotes (")
ch:"a"
/ Strings are simply lists of char (more on lists later)
str:"This is a string"
/ Escape characters work as normal
str:"This is a string with \"quotes\""

/ Char data can also be stored as symbols using backtick (`)
symbol:`sym
/ Symbols are NOT LISTS, they are an enumeration
/ the q process stores internally a vector of strings
/ symbols are enumerated against this vector
/ this can be more space and speed efficient as these are constant width

/ The string function converts to strings
string `symbol / => "symbol"
string 1.2345  / => "1.2345"

/ q has a time type...
t:01:00:00.000
/ date type...
d:2015.12.25
/ and a timestamp type (among other temporal types)
dt:2015.12.25D12:00:00.000000000

/ These support some arithmetic for easy manipulation
dt + t           / => 2015.12.25D13:00:00.000000000
t - 00:10:00.000 / => 00:50:00.000
/ and can be decomposed using dot notation
d.year / => 2015i
d.mm   / => 12i
d.dd   / => 25i
/ see http://code.kx.com/q4m3/2_Basic_Data_Types_Atoms/#25-temporal-data for more

/ q also has an infinity value so division by zero will not throw an error
1%0  / => 0w
-1%0 / => -0w

/ And null types for representing missing values
0N / => null long (the default integer type)
0n / => null float
/ see http://code.kx.com/q4m3/2_Basic_Data_Types_Atoms/#27-nulls for more

/ q has standard control structures
/ if is as you might expect (; separates the condition and instructions)
if[1=1;a:"hi"]
a / => "hi"
/ if-else uses $ (and unlike if, returns a value)
$[1=0;a:"hi";a:"bye"] / => "bye"
a / => "bye"
/ if-else can be extended to multiple clauses by adding args separated by ;
$[1=0;a:"hi";0=1;a:"bye";a:"hello again"]
a / => "hello again"


////////////////////////////////////
////      Data Structures       ////
////////////////////////////////////

/ q is not an object-oriented language
/ instead complexity is built through ordered lists
/ and mapping them into higher order structures: dictionaries and tables

/ Lists (or arrays if you prefer) are simple ordered collections
/ they are defined using parentheses () and semi-colons (;)
(1;2;3) / => 1 2 3
(-10.0;3.14159e;1b;`abc;"c")
/ => -10f
/ => 3.14159e
/ => 1b
/ => `abc
/ => "c"
/ (mixed type lists are displayed on multiple lines)
((1;2;3);(4;5;6);(7;8;9))
/ => 1 2 3
/ => 4 5 6
/ => 7 8 9

/ Lists of uniform type can also be defined more concisely
1 2 3 / => 1 2 3
`list`of`syms / => `list`of`syms
`list`of`syms ~ (`list;`of;`syms) / => 1b

/ List length
count (1;2;3) / => 3
count "I am a string" / => 13 (strings are lists of chars)

/ Empty lists are defined with parentheses
l:()
count l / => 0

/ Simple variables and single item lists are not equivalent; parentheses
/ syntax cannot create a single item list (they indicate precedence)
(1)~1 / => 1b
/ single item lists can be created using enlist
singleton:enlist 1
/ or appending to an empty list
singleton:(),1
1~(),1 / => 0b

/ Speaking of appending, comma (,) is used for this, not plus (+)
1 2 3,4 5 6 / => 1 2 3 4 5 6
"hello ","there" / => "hello there"

/ Indexing uses square brackets []
l:1 2 3 4
l[0] / => 1
l[1] / => 2
/ indexing out of bounds returns a null value rather than an error
l[5] / => 0N
/ and indexed assignment
l[0]:5
l / => 5 2 3 4

/ Lists can also be used for indexing and indexed assignment
l[1 3] / => 2 4
l[1 3]: 1 3
l / => 5 1 3 3

/ Lists can be untyped/mixed type
l:(1;2;`hi)
/ but once they are uniformly typed, q will enforce this
l[2]:3
l / => 1 2 3
l[2]:`hi / throws a type error
/ this makes sense in the context of lists as table columns (more later)

/ For a nested list we can index at depth
l:((1;2;3);(4;5;6);(7;8;9))
l[1;1] / => 5

/ We can elide the indexes to return entire rows or columns
l[;1] / => 2 5 8
l[1;] / => 4 5 6

/ All the functions mentioned in the previous section work on lists natively
1+(1;2;3) / => 2 3 4 (single variable and list)
(1;2;3) - (3;2;1) / => -2 0 2 (list and list)

/ And there are many more that are designed specifically for lists
avg 1 2 3 / => 2f
sum 1 2 3 / => 6
sums 1 2 3 / => 1 3 6 (running sum)
last 1 2 3 / => 3
1 rotate 1 2 3 / => 2 3 1
/ etc.
/ Using and combining these functions to manipulate lists is where much of the
/ power and expressiveness of the language comes from

/ Take (#), drop (_) and find (?)
/ are also useful when working with lists
l:1 2 3 4 5 6 7 8 9
l:1+til 9 / til is a useful shortcut for generating ranges
/ take the first 5 elements
5#l / => 1 2 3 4 5
/ drop the first 5
5_l / => 6 7 8 9
/ take the last 5
-5#l / => 5 6 7 8 9
/ drop the last 5
-5_l / => 1 2 3 4
/ find the first occurrence of 4
l?4 / => 3
l[3] / => 4

/ Dictionaries in q are a generalization of lists
/ they map a list to another list (of equal length)
/ the bang (!) symbol is used for defining a dictionary
d:(`a;`b;`c)!(1;2;3)
/ or more simply with concise list syntax
d:`a`b`c!1 2 3
/ the keyword key returns the first list
key d / => `a`b`c
/ and value the second
value d / => 1 2 3

/ Indexing is identical to lists
/ with the first list as a key instead of the position
d[`a] / => 1
d[`b] / => 2

/ As is assignment
d[`c]:4
d
/ => a| 1
/ => b| 2
/ => c| 4

/ Arithmetic and comparison work natively, just like lists
e:(`a;`b;`c)!(2;3;4)
d+e
/ => a| 3
/ => b| 5
/ => c| 8
d-2
/ => a| -1
/ => b| 0
/ => c| 2
d > (1;1;1)
/ => a| 0
/ => b| 1
/ => c| 1

/ And the take, drop and find operators are remarkably similar too
`a`b#d
/ => a| 1
/ => b| 2
`a`b _ d
/ => c| 4
d?2
/ => `b

/ Tables in q are basically a subset of dictionaries
/ a table is a dictionary where all values must be lists of the same length
/ as such tables in q are column oriented (unlike most RDBMS)
/ the flip keyword is used to convert a dictionary to a table
/ i.e. flip the indices
flip `c1`c2`c3!(1 2 3;4 5 6;7 8 9)
/ => c1 c2 c3
/ => --------
/ => 1  4  7
/ => 2  5  8
/ => 3  6  9
/ we can also define tables using this syntax
t:([]c1:1 2 3;c2:4 5 6;c3:7 8 9)
t
/ => c1 c2 c3
/ => --------
/ => 1  4  7
/ => 2  5  8
/ => 3  6  9

/ Tables can be indexed and manipulated in a similar way to dicts and lists
t[`c1] / => 1 2 3
/ table rows are returned as dictionaries
t[1]
/ => c1| 2
/ => c2| 5
/ => c3| 8

/ meta returns table type information
meta t
/ => c | t f a
/ => --| -----
/ => c1| j
/ => c2| j
/ => c3| j

/ now we see why type is enforced in lists (to protect column types)
t[1;`c1]:3
t[1;`c1]:3.0 / throws a type error

/ Most traditional databases have primary key columns
/ in q we have keyed tables, where one table containing key columns
/ is mapped to another table using bang (!)
k:([]id:1 2 3)
k!t
/ => id| c1 c2 c3
/ => --| --------
/ => 1 | 1  4  7
/ => 2 | 3  5  8
/ => 3 | 3  6  9

/ We can also use this shortcut for defining keyed tables
kt:([id:1 2 3]c1:1 2 3;c2:4 5 6;c3:7 8 9)

/ Records can then be retrieved based on this key
kt[1]
/ => c1| 1
/ => c2| 4
/ => c3| 7
kt[`id!1]
/ => c1| 1
/ => c2| 4
/ => c3| 7


////////////////////////////////////
////////     Functions      ////////
////////////////////////////////////

/ In q the function is similar to a mathematical map, mapping inputs to outputs
/ curly braces {} are used for function definition
/ and square brackets [] for calling functions (just like list indexing)
/ a very minimal function
f:{x+x}
f[2] / => 4

/ Functions can be anonymous and called at point of definition
{x+x}[2] / => 4

/ By default the last expression is returned
/ colon (:) can be used to specify return
{x+x}[2] / => 4
{:x+x}[2] / => 4
/ semi-colon (;) separates expressions
{r:x+x;:r}[2] / => 4

/ Function arguments can be specified explicitly (separated by ;)
{[arg1;arg2] arg1+arg2}[1;2] / => 3
/ or if omitted will default to x, y and z
{x+y+z}[1;2;3] / => 6

/ Built-in functions are no different, and can be called the same way
/ (with [])
+[1;2] / => 3
<[1;2] / => 1b

/ Functions are first class in q, so can be returned, stored in lists etc.
{:{x+y}}[] / => {x+y}
(1;"hi";{x+y})
/ => 1
/ => "hi"
/ => {x+y}

/ There is no overloading and no keyword arguments for custom q functions
/ however using a dictionary as a single argument can overcome this
/ and allows for optional arguments or differing functionality
d:`arg1`arg2`arg3!(1.0;2;"my function argument")
{x[`arg1]+x[`arg2]}[d] / => 3f

/ Functions in q see the global scope
a:1
{:a}[] / => 1

/ However local scope obscures this
a:1
{a:2;:a}[] / => 2
a / => 1

/ Functions cannot see nested scopes (only local and global)
{local:1;{:local}[]}[] / throws error as local is not defined in inner function

/ A function can have one or more of its arguments fixed (projection)
f:+[4]
f[4] / => 8
f[5] / => 9
f[6] / => 10


////////////////////////////////////
//////////      q-sql     //////////
////////////////////////////////////

/ q has its own syntax for manipulating tables, similar to standard SQL
/ This contains the usual suspects of select, insert, update etc.
/ and some new functionality not typically available
/ q-sql has two significant differences (other than syntax) to normal SQL:
/ - q tables have well defined record orders
/ - tables are stored as a collection of columns
/   (so vectorized column operations are fast)
/ a full description of q-sql is a little beyond the scope of this intro
/ so we will just cover enough of the basics to get you going

/ First define ourselves a table
t:([]name:`Arthur`Thomas`Polly;age:35 32 52;height:180 175 160;sex:`m`m`f)

/ equivalent of SELECT * FROM t
select from t / (must be lower case, and the wildcard is not necessary)
/ => name   age height sex
/ => ---------------------
/ => Arthur 35  180    m
/ => Thomas 32  175    m
/ => Polly  52  160    f

/ Select specific columns
select name,age from t
/ => name   age
/ => ----------
/ => Arthur 35
/ => Thomas 32
/ => Polly  52

/ And name them (equivalent of using AS in standard SQL)
select charactername:name, currentage:age from t
/ => charactername currentage
/ => ------------------------
/ => Arthur        35
/ => Thomas        32
/ => Polly         52

/ This SQL syntax is integrated with the q language
/ so q can be used seamlessly in SQL statements
select name, feet:floor height*0.032, inches:12*(height*0.032) mod 1 from t
/ => name   feet inches
/ => ------------------
/ => Arthur 5    9.12
/ => Thomas 5    7.2
/ => Polly  5    1.44

/ Including custom functions
select name, growth:{[h;a]h%a}[height;age] from t
/ => name   growth
/ => ---------------
/ => Arthur 5.142857
/ => Thomas 5.46875
/ => Polly  3.076923

/ The where clause can contain multiple statements separated by commas
select from t where age>33,height>175
/ => name   age height sex
/ => ---------------------
/ => Arthur 35  180    m

/ The where constraints are applied sequentially, each one only seeing the rows
/ that passed the previous ones, so with aggregates like min the result is not
/ the same as a single logical AND
select from t where age<40,height=min height
/ => name   age height sex
/ => ---------------------
/ => Thomas 32  175    m
select from t where (age<40)&(height=min height)
/ => name age height sex
/ => -------------------
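/ A small sketch of why the two results above differ (using the table t
/ defined above): in the comma form min height only sees the rows that
/ survived age<40, while in the & form it is taken over the whole table
select min height from t where age<40
/ => height
/ => ------
/ => 175
select min height from t
/ => height
/ => ------
/ => 160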
/ The by clause falls between select and from
/ and is equivalent to SQL's GROUP BY
select avg height by sex from t
/ => sex| height
/ => ---| ------
/ => f  | 160
/ => m  | 177.5

/ If no aggregation function is specified, last is assumed
select by sex from t
/ => sex| name   age height
/ => ---| -----------------
/ => f  | Polly  52  160
/ => m  | Thomas 32  175

/ Update has the same basic form as select
update sex:`male from t where sex=`m
/ => name   age height sex
/ => ----------------------
/ => Arthur 35  180    male
/ => Thomas 32  175    male
/ => Polly  52  160    f

/ As does delete
delete from t where sex=`m
/ => name  age height sex
/ => --------------------
/ => Polly 52  160    f

/ None of these SQL operations are carried out in place
t
/ => name   age height sex
/ => ---------------------
/ => Arthur 35  180    m
/ => Thomas 32  175    m
/ => Polly  52  160    f

/ Insert, however, is in place: it takes a table name and new data
`t insert (`John;25;178;`m) / => ,3
t
/ => name   age height sex
/ => ---------------------
/ => Arthur 35  180    m
/ => Thomas 32  175    m
/ => Polly  52  160    f
/ => John   25  178    m

/ Upsert is similar (but doesn't have to be in-place)
t upsert (`Chester;58;179;`m)
/ => name    age height sex
/ => ----------------------
/ => Arthur  35  180    m
/ => Thomas  32  175    m
/ => Polly   52  160    f
/ => John    25  178    m
/ => Chester 58  179    m

/ it will also upsert dicts or tables
t upsert `name`age`height`sex!(`Chester;58;179;`m)
t upsert ([]name:enlist `Chester;age:enlist 58;height:enlist 179;sex:enlist `m)
/ => name    age height sex
/ => ----------------------
/ => Arthur  35  180    m
/ => Thomas  32  175    m
/ => Polly   52  160    f
/ => John    25  178    m
/ => Chester 58  179    m

/ And if our table is keyed
kt:`name xkey t
/ upsert will replace records where required
kt upsert ([]name:`Thomas`Chester;age:33 58;height:175 179;sex:`f`m)
/ => name   | age height sex
/ => -------| --------------
/ => Arthur | 35  180    m
/ => Thomas | 33  175    f
/ => Polly  | 52  160    f
/ => John   | 25  178    m
/ => Chester| 58  179    m

/ There is no ORDER BY clause in q-sql, instead use xasc/xdesc
`name xasc t
/ => name   age height sex
/ => ---------------------
/ => Arthur 35  180    m
/ => John   25  178    m
/ => Polly  52  160    f
/ => Thomas 32  175    m

/ Most of the standard SQL joins are present in q-sql, plus a few new friends
/ see http://code.kx.com/q4m3/9_Queries_q-sql/#99-joins
/ the two most important (commonly used) are lj and aj

/ lj is basically the same as SQL LEFT JOIN
/ where the join is carried out on the key columns of the right (keyed) table
le:([sex:`m`f]lifeexpectancy:78 85)
t lj le
/ => name   age height sex lifeexpectancy
/ => ------------------------------------
/ => Arthur 35  180    m   78
/ => Thomas 32  175    m   78
/ => Polly  52  160    f   85
/ => John   25  178    m   78

/ aj is an as-of join. This is not a standard SQL join, and can be very powerful
/ The canonical example of this is joining financial trades and quotes tables
trades:([]time:10:01:01 10:01:03 10:01:04;sym:`msft`ibm`ge;qty:100 200 150)
quotes:([]time:10:01:00 10:01:01 10:01:01 10:01:03;
    sym:`ibm`msft`msft`ibm;px:100 99 101 98)
aj[`time`sym;trades;quotes]
/ => time     sym  qty px
/ => ---------------------
/ => 10:01:01 msft 100 101
/ => 10:01:03 ibm  200 98
/ => 10:01:04 ge   150
/ for each row in the trade table, the last (prevailing) quote (px) for that sym
/ is joined on.
/ see http://code.kx.com/q4m3/9_Queries_q-sql/#998-as-of-joins


////////////////////////////////////
/////     Extra/Advanced      //////
////////////////////////////////////

////// Adverbs //////
/ You may have noticed the total lack of loops to this point
/ This is not a mistake!
/ q is a vector language so explicit loops (for, while etc.) are not encouraged
/ where possible functionality should be vectorized
/ (i.e. operations on lists)
/ adverbs supplement this, modifying the behaviour of functions
/ and providing loop type functionality when required
/ (in q functions are sometimes referred to as verbs, hence adverbs)

/ the "each" adverb modifies a function to treat a list as individual variables
first each (1 2 3;4 5 6;7 8 9)
/ => 1 4 7

/ each-left (\:) and each-right (/:) modify a two-argument function
/ to treat one of the arguments as individual variables instead of a list
1 2 3 +\: 11 22 33
/ => 12 23 34
/ => 13 24 35
/ => 14 25 36
1 2 3 +/: 11 22 33
/ => 12 13 14
/ => 23 24 25
/ => 34 35 36

/ The true alternatives to loops in q are the adverbs scan (\) and over (/)
/ their behaviour differs based on the number of arguments the function they
/ are modifying receives. Here I'll summarise some of the most useful cases
/ a single argument function modified by scan given 2 args behaves like "do"
{x * 2}\[5;1] / => 1 2 4 8 16 32 (i.e. multiply by 2, 5 times)
{x * 2}/[5;1] / => 32 (using over only the final result is shown)

/ If the first argument is a function, we have the equivalent of "while"
{x * 2}\[{x<100};1] / => 1 2 4 8 16 32 64 128 (iterates until the condition returns 0b)
{x * 2}/[{x<100};1] / => 128 (again returns only the final result)

/ If the function takes two arguments, and we pass a list, we have "for"
/ where the result of the previous execution is passed back into the next loop
/ along with the next member of the list
{x + y}\[1 2 3 4 5] / => 1 3 6 10 15
/ (i.e. the running sum)
{x + y}/[1 2 3 4 5] / => 15 (only the final result)

/ There are other iterators and uses, this is only intended as a quick overview
/ http://code.kx.com/q4m3/6_Functions/#67-iterators

////// Scripts //////
/ q scripts can be loaded from a q session using the "\l" command
/ for example "\l learnkdb.q" will load this script
/ or from the command prompt passing the script as an argument
/ for example "q learnkdb.q"

////// On-disk data //////
/ Tables can be persisted to disk in several formats
/ the two most fundamental are serialized and splayed
t:([]a:1 2 3;b:1 2 3f)
`:serialized set t / saves the table as a single serialized file
`:splayed/ set t / saves the table splayed into a directory

/ the dir structure will now look something like:
/ db/
/ ├── serialized
/ └── splayed
/     ├── a
/     └── b

/ Loading this directory (as if it were a script, see above)
/ loads these tables into the q session
\l .
/ the serialized table will be loaded into memory
/ however the splayed table will only be mapped, not loaded
/ both tables can be queried using q-sql
select from serialized
/ => a b
/ => ---
/ => 1 1
/ => 2 2
/ => 3 3
select from splayed / (the columns are read from disk on request)
/ => a b
/ => ---
/ => 1 1
/ => 2 2
/ => 3 3
/ see http://code.kx.com/q4m3/14_Introduction_to_Kdb+/ for more

////// Frameworks //////
/ kdb+ is typically used for data capture and analysis.
/ This involves using an architecture with multiple processes
/ working together. kdb+ frameworks are available to streamline the setup
/ and configuration of this architecture and add additional functionality
/ such as disaster recovery, logging, access, load balancing etc.
/ https://github.com/DataIntellectTech/TorQ
```

## Want to know more?
* [*q for mortals* q language tutorial](http://code.kx.com/q4m3/)
* [*Introduction to Kdb+* on-disk data tutorial](http://code.kx.com/q4m3/14_Introduction_to_Kdb+/)
* [q language reference](https://code.kx.com/q/ref/)
* [TorQ production framework](https://github.com/DataIntellectTech/TorQ)