Current Contracts

My current contract is interesting work, and only spoiled by the accounting trend of pay everybody as late as possible. It looks as though I’ll up my rates to account and manage invoice factorization. It does no good to act as a credit source to a company which has limited status, and yet either has the resources to pay in time, or is miserly in the period before receivership. Either way such companies are not my friend, and no good to be associated with. Credit control …

Well a bit delayed, but payments are arriving, and I’m investing in some electronic product design.

The 3D Flavour Tensor in Analogue to the 4D of Einstein, for a 3D, 4D Curvature in Particle Physics

I like to keep updated about particle physics and LHC things, to quite an advanced level. My interest is in fields and their previous engineering value in radio waves and electronics in general. It makes sense to move to a tensor algebra in the 2+1 charge space, just as was done for the theory of gravitation. In some sense the conservation of acceleration becomes a conservation of net mapped curvature and it becomes funny via Noether’s Theorem.

CP violation as a horizon delta of radius of curvature from the “t” distance is perhaps relevant phrased as a moment of inertia in the 2+1, and its resultant geometric singular forms. This does create the idea of singular forms in the 2+1 space orbiting (or perhaps more correctly resonating) in tune with singularities in the 3+1 space. This interconnection entanglement, or something similar is perhaps connected to the “weak phase”.

So a 7D total space-time, with differing invariants in the 3D and 4D parts. The interesting thing from my prospective is the prediction of a heavy graviton, and conservation of acceleration. The idea that space itself holds its own shape without graviton interaction, and so conserves acceleration, while the heavy graviton can be a short range force which changes the curvature. The graviton then becomes a mediator of jerk and not acceleration. The graviton, being heavy would also travel slower than light. Gravity waves would then not necessarily need graviton exchange.

Quantization of theories has I think in many ways gone too far. I think the big breaks of the 21st century will be turning quantized bulk statistics into unquantized statistics, with quantization applied to only some aspects of theories. The implication is that dark matter is bent spacetime, without matter being present to emit gravitons. In this sense I predict it is not particulate.

So 7D and a differential phase space coordinate for each D (except time) gives a 13D reality. The following is an interesting equation I arrived at at one point for velocity solutions to uncertainty. I did not incorporate electromagnetism, but it’s interesting in the number of solutions, or superposition of velocity states as it were. The w being constant in the assumption, but purtubative expansion in it may be interesting. The units of the equation are conveniently force. A particle observing another particle would also be moving such, and the non linear summation for the lab rest frame of explanation might be quite interesting.

(v^2) v ‘ ‘ ‘−9v v ‘ v ‘ ‘+12(v ‘ ^3)+(1−v^2/c^2)v ‘ (wv)^2=0

With ‘ representing differential w.r.t. time notation. So v’ is acceleration and v” is the jerk. I think v”’ is called the jounce for those with a mind to learn all the Js. An interesting equation considering the whole concept of uncertain geometry started from an observation that relative mass was kind of an invariant, mass oscillation, although weird with RMS mass and RMS energy conservation, was perhaps a good way of parameterizing an uncertainty “force” proportional to the kinetic energy momentum product. As an addition it was more commutative as a tensor algebra. Some other work I calculated suggests dark energy is conservation of mass times log of normalized velocity, and dark matter could be conserved acceleration with gravity and the graviton operating to not bend space on density, but bend space through a short distance acting heavy graviton. Changes in gravity could thus travel slower than light, and an integral with a partial fourth power fraction could expand into conserved acceleration, energy, momentum and mass information velocity (dark energy) with perhaps another form of Higgs, and an uncertainty boson (spin 1) as well.

So really a 13D geometry. Each velocity state in the above mass independent free space equation above is an indication of a particle of differing mass. A particle count based on solutions. 6 quarks and all. An actual explanation for the three flavours of matter? So assuming an approximate linear superposable solution with 3 constants of integration, this gives 6 parameterized solutions from the first term via 3 constants and the square being rooted, The second tern involves just 2 of the constants for 2 possible offsets, and the third term involves just one of the constants, but 3 roots with two being in complex conjugation. The final term involves just one of the constants, but an approximation to the fourth power for 4 roots, and disappearing when the velocity is the speed of light, and so is likely a rest mass term.

So that would likely be a fermion list. A boson list would be in the boundaries at the discontinuities between those solutions, with the effective mass of the boson controlled by the expected life time between the states, and the state energy mismatch. Also of importance is how the equation translates to 4D, 3D spacetime, and the normalized rotational invariants of EM and other things. Angular momentum is conserved and constant (dimensionless in uncertain geometry),

Assuming the first 3 terms are very small compared to the last term, and v is not the speed of light. There would have to be some imaginary component to velocity, and this imaginary would be one of the degrees of freedom (leading to a total of 26). Is this imaginary velocity consistent with isospin?

Yang–Mills Existence and Mass Gap (Clay Problem)

If mass oscillation is proved to exist, then the mass gap can never be proved to be greater than zero as the mass must pass through zero for oscillation. This does exclude the possibility of complex mass oscillation, but this is just mass shrinkage (no eventual gap in the infinite time limit), or mass growth, and hence no minimum except in the big bang.

The 24 degrees of freedom on the relativistic compacted holographic 3D for the 26D string model, imply with elliptic functions, a 44 fold way. This is a decomposition into 26 sporadic elliptic patterns, and 18 generational spectra patterns. With the differential equation above providing 6*2*(2+1) combinations from the first three terms, and the 3 constants of integration locating in “colour space” through a different orthogonal basis. Would provide 24 apparent solution types, with 12 of them having a complex conjugation relation as a pair for 36. If this is the isospin solution, then the 12 fermionic solutions have all been found. That leaves the 12 bosonic solutions (the ones without a conjugate in the 3rd term generative), with only 5 (or if a photon is special 4) having been found so far. If the bosonic sector includes the dual rooting via the second term for spin polarity, then of the six (with the dual degenerates cancelled), two more are left to be found if light is special in the 4th term.

This would also leave 8 of the 44 way in a non existent capacity. I’d maybe focus on them being gluons, and consider the third still to be found as a second form of Higgs. OK.

Displacement Currents in Colour Space

Maybe an interesting wave induction effect is possible. I’m not sure what the transmitter should be made of. The ABC modulation may make it a bit “alternate” near the field emission. So not caused by bosons in the regular sense, more the “transition bosons” between particle states. The specific transitions between energy states may (although it’s not certain), pull the local ABC field in a resonant or engineered direction. The actual ABC solution of this reality has to have some reasoning for being stable for long enough. This does not imply though that no other ABC solutions act in parallel, or are not obtainable via some engineering means.

VS Code and Elm

I’ve been looking into doing JS trans-compiled languages recently. The usual suspects popped up. ClosureScript, Elm, TypeScript, and maybe a few others. This had the unexpected effect of needing the VS Code software as some of the plugins do not yet work with VS Community. I opened up some TypeScript I had wrote recently, and found the way “require” is used for loading is not recognised in VS Code. Strange, and it might cause problems with passing on code to others.

I looked into Elm which is a Haskell for JS. It looks quite good, and I’ve downloaded the kit. I’ll let you know if I start using it big. Closure is Scheme or LISP almost. It seems to have little editor support compared to Elm, and I’d prefer to use Elm over Closure. I already have some libraries downloaded for functional extensions to JS, and some .d.ts descriptors too for some. The main reason I’ve never used Haskell is the large GHC binary size. The idea of using JS as the VM is good. It does however dump about 6000 lines of JS code for a hello world. I haven’t tested if this is per module. I understand Elm can do very fast HTML rendering though, so something to look into.

There’s also Haxe of course, and plenty of plain vanilla JS functional programming modules, including some like RQ, for threading control. Some nice Monad libraries, and good browser support. I also like the TinyMCE. It’s quite a classic. For the toolkit, Bootstrap.js is the current best with all the needed features of a modern looking site.

Beware much ado about category theory, and things like the continuation monad can do all sequential processing … of course from the context of writing it in a sequential language … blah, blah, stored state, pretend there’s none, blah, blah, monad, blah, delay output by wrapper, blah. Ok, well it is true I’m 46, but you young coders out there should take some of the symbology with quite a big pinch of salt, and maybe have a more interesting look at things like the Y combinator. It’s kind of what Mathematica would call Hold[] but with more monad blah for what is really group theory.

TypeScript in Visual Studio Community 2017

Just loaded up the VS 2017 community release. The TS 2.1 features include strong types relating to null and undefined. A bit annoying, but none the less it does force some decent console logging of salient error potential. I’ve found that many apparent error conditions are being removed. If only I could find a way to stop the coercion in JS, such that it is. It is one of the things I never liked about JS. I’d prefer an explicit cast every time.

General Update

An update on the current progress of projects and general things here at KRT. I’ve set about checking out TypeScript for using in projects. It looks good, has some hidden pitfalls on finding .m.ts files for underscore for example, but in general looks good. I’m running it over some JS to get more of a feel. The audio VST project is moving slowly, at oscillators at the moment, with filters being done. I am looking into cache coherence algorithms and strategies to ease hardware design at the moment too. The 68k2 document mentioned in previous post is expanding with some of these ideas in having a “stall on value match” register, with a “touch since changed” bit in each cache line.

All good.

The Processor Design Document in Progress


Well I eventually managed to get a file using _.reduce() to compile without errors now. I’ll test it as soon as I’ve adapted in QUnit 2.0.1 so I can write my tests to the build as a pop up window, an perhaps back load a file to then be able to save the file from within the editor, and hence to become parser frame.


An excerpt from the 68k2 document as it’s progressing. An idea on UTF8 easy indexing and expansion.

“Reducing the size of this indexing array can recursively use the same technique, as long as movement between length encodings is not traversed for long sequences. This would require adding in a 2 length (11 bit form) and a 3 length (16 bit form) of common punctuation and spacing. Surrogate pair just postpones the issue and moves cache occupation to 25%, and not quite that for speed efficiency. This is why the simplified Chinese is common circa 2017, and surrogate processing has been abandoned in the Unicode specification, and replaced by characters in the surrogate representation space. Hand drawing the surrogates was likely the issue, and character parts (as individual parts) with double strike was considered a better rendering option.

UTF8 therefore has a possible 17 bit rendering for due to the extra bit freed by not needing a UTF32 representation. Should this be glyph space, or skip code index space, or a mix? 16 bit purity says skip code space. With common length (2 bit) and count (14 bit), allowing skips of between 16 kB and 48 kB through a document. The 4th combination of length? Perhaps the representation of the common punctuation without character length alterations. For 512 specials in the 2 length form and 65536 specials in the 3 length forms. In UTF16 there would be issues of decode, and uniqueness. This perhaps is best tackled by some render form meta characters in the original Unicode space. There is no way around it, and with skips maybe UTF8 would be faster.”

// tool.js 1.1.1
// (c) 2016-2017 Simon Jackson, K Ring Technologies Ltd
// MIT, like as he said. And underscored 😀

import * as _ from 'underscore';

// LZW-compress a string
// The bounce parameter if true adds extra entries for faster dictionary growth.
// Usually LZW dictionary grows sub linear on input chars, and it is of note
// that after a BWT, the phrase contains a good MTF estimate and so maybe fine
// to append each of its chars to many dictionary entries. In this way the
// growth of entries becomes "almost" linear. The dictionary memory foot print
// becomes quadratic. Short to medium inputs become even smaller. Long input
// lengths may become slightly larger on not using dictionary entries integrated
// over input length, but will most likely be slightly smaller.

// DO NOT USE bounce (=false) IF NO BWT BEFORE.
// Under these conditions many unused dictionary entries will be wasted on long
// highly redundant inputs. It is a feature for pre BWT packed PONs.
function encodeLZW(data: string, bounce: boolean): string {
var dict = {};
data = encodeSUTF(data);
var out = [];
var currChar;
var phrase = data[0];
var codeL = 0;
var code = 256;
for (var i=1; i<data.length; i++) {
if (dict['_' + phrase + currChar] != null) {
phrase += currChar;
else {
out.push(codeL = phrase.length > 1 ? dict['_'+phrase] : phrase.charCodeAt(0));
if(code < 65536) {//limit
dict['_' + phrase + currChar] = code;
if(bounce && codeL != code - 2) {//code -- and one before would be last symbol out
_.each(phrase.split(''), function (chr) {
if(code < 65536) {
while(dict['_' + phrase + chr]) phrase += chr;
dict['_' + phrase + chr] = code;
out.push(phrase.length > 1 ? dict['_'+phrase] : phrase.charCodeAt(0));
for (var i=0; i<out.length; i++) {
out[i] = String.fromCharCode(out[i]);
return out.join();

function encodeSUTF(s: string): string {
s = encodeUTF(s);
var out = [];
var msb: number = 0;
var two: boolean = false;
var first: boolean = true;
_.each(s, function(val) {
var k = val.charCodeAt(0);
if(k > 127) {
if (first == true) {
first = false;
two = (k & 32) == 0;
if (k == msb) return;
msb = k;
} else {
if (two == true) two = false;
else first = true;
return out.join();

function encodeBounce(s: string): string {
return encodeLZW(s, true);

// Decompress an LZW-encoded string
function decodeLZW(s: string, bounce: boolean): string {
var dict = {};
var dictI = {};
var data = (s + '').split('');
var currChar = data[0];
var oldPhrase = currChar;
var out = [currChar];
var code = 256;
var phrase;
for (var i=1; i<data.length; i++) {
var currCode = data[i].charCodeAt(0);
if (currCode < 256) {
phrase = data[i];
else {
phrase = dict['_'+currCode] ? dict['_'+currCode] : (oldPhrase + currChar);
currChar = phrase.charAt(0);
if(code < 65536) {
dict['_'+code] = oldPhrase + currChar;
dictI['_' + oldPhrase + currChar] = code;
if(bounce && !dict['_'+currCode]) {//the special lag
_.each(oldPhrase.split(''), function (chr) {
if(code < 65536) {
while(dictI['_' + oldPhrase + chr]) oldPhrase += chr;
dict['_' + code] = oldPhrase + chr;
dictI['_' + oldPhrase + chr] = code;
oldPhrase = phrase;
return decodeSUTF(out.join(''));

function decodeSUTF(s: string): string {
var out = [];
var msb: number = 0;
var make: number = 0;
var from: number = 0;
_.each(s, function(val, idx) {
var k = val.charCodeAt(0);
if (k > 127) {
if (idx < from + make) return;
if ((k & 128) != 0) {
msb = k;
make = (k & 64) == 0 ? 2 : 3;
from = idx + 1;
} else {
from = idx;
for (var i = from; i < from + make; i++) {
} else {
return decodeUTF(out.join());

function decodeBounce(s: string): string {
return decodeLZW(s, true);

// UTF mangling with ArrayBuffer mappings
declare function escape(s: string): string;
declare function unescape(s: string): string;

function encodeUTF(s: string): string {
return unescape(encodeURIComponent(s));

function decodeUTF(s: string): string {
return decodeURIComponent(escape(s));

function toBuffer(str: string): ArrayBuffer {
var arr = encodeSUTF(str);
var buf = new ArrayBuffer(arr.length);
var bufView = new Uint8Array(buf);
for (var i = 0, arrLen = arr.length; i < arrLen; i++) {
bufView[i] = arr[i].charCodeAt(0);
return buf;

function fromBuffer(buf: ArrayBuffer): string {
var out: string = '';
var bufView = new Uint8Array(buf);
for (var i = 0, arrLen = bufView.length; i < arrLen; i++) {
out += String.fromCharCode(bufView[i]);
return decodeSUTF(out);

//A Burrows Wheeler Transform of strings
function encodeBWT(data: string): any {
var size = data.length;
var buff = data + data;
var idx = _.range(size).sort(function(x, y){
for (var i = 0; i < size; i++) {
var r = buff[x + i].charCodeAt(0) - buff[y + i].charCodeAt(0);
if (r !== 0) return r;
return 0;

var top: number;
var work = _.reduce(_.range(size), function(memo, k: number) {
var p = idx[k];
if (p === 0) top = k;
memo.push(buff[p + size - 1]);
return memo;
}, []).join('');

return { top: top, data: work };

function decodeBWT(top: number, data: string): string { //JSON

var size = data.length;
var idx = _.range(size).sort(function(x, y){
var c = data[x].charCodeAt(0) - data[y].charCodeAt(0);
if (c === 0) return x - y;
return c;

var p = idx[top];
return _.reduce(_.range(size), function(memo){
p = idx[p];
return memo;
}, []).join('');

// Two functions to do a dictionary effectiveness
// split of what to compress. This has the effect
// of applying an effective dictionary size bigger
// than would otherwise be.
function tally(data: string): number[] {
return _.reduce(data.split(''), function (memo: number[], charAt: string): number[] {
return memo;
}, []);

function splice(data: string): string[] {
var acc = 0;
var counts = tally(data);
return _.reduce(counts, function(memo, count: number, key) {
memo.push(key + data.substring(acc, count + acc));
/* adds a seek char:
This assists in DB seek performance as it's the ordering char for the lzw block */
acc += count;
}, []);

// A packer and unpacker with good efficiency
// These are the ones to call, and the rest sre maybe
// useful, but can be considered as foundations for
// these functions. some block length management is
// built in.
function pack(data: any): any {
var str = JSON.stringify(data);
var chain = {};
if(str.length > 524288) {
chain = pack(str.substring(524288));
str = str.substring(0, 524288);
var bwt = encodeBWT(str);
var mix = splice(;

mix =, encodeBounce);
return {
/* tally: encode_tally(tally), */
mix: mix,
chn: chain

function unpack(got: any): any {
var top: number = || 0;
/* var tally = got.tally; */
var mix: string[] = got.mix || [];

mix =, decodeBounce);
var mixr: string = _.reduce(mix, function(memo: string, lzw: string): string {
/* var key = lzw.charAt(0);//get seek char */
memo += lzw.substring(1, lzw.length);//concat
return memo;
}, '');
var chain = got.chn;
var res = decodeBWT(top, mixr);
if(_.has(chain, 'chn')) {
res += unpack(chain.chn);
return JSON.parse(res);


68k Continued …

A Continuation as it was Getting Long

The main thing in any 64 bit system is multi-processing. Multi-threading has already been covered. The CAS instruction is gone, and cache coherence is a big thing. So a supervisor level mutex? This is an obvious need. The extra long condition code register? How about a set of bits to set, and a stall if not zero? The bits could count down to zero over a number of cycles, leaving an opportunity to spin lock any memory location. Putting it in the status or condition code registers avoids the chip level cache shuffle. A non supervisor version would help user tasks. This avoids the need for atomic operations to a large extent, enough to not need them.

The fact a cache can reset pre-filled with “high memory” garbage, and not need empty bits, saves a little, but does need a little care on the compliance of boot sequence. A write back to the cache causes a cross core invalidate in most cache designs, There is an argument to set some status bit for ease of implementation. Resetting just the cache line would work, but remove a small section of memory from the 64 bit address space. A data invalidation queue would be useful to assist in the latency of reset to some synchronous opportunity, the countdown stall assisting in queue size management. As simultaneous write is a race condition, and a fail, by simultaneous deletion, the chip level mutexes must be used correctly. For the case where a cache load has to be performed, a double mutex count lock might have to be done. This infers that keeping the CPU ID somewhere to speed the second mutex lock might be beneficial.

Check cached, maybe repeat, set global, check cached, check global, maybe repeat, set cached, do is OK. A competing lock would fail maybe on the set cached if a time slice occurred just before it. An interrupt delay circuit would be needed for a number of instructions when the global is checked. The common access to the value either sets a stall timer or an interrupt stall timer, or a common timer register, with both behaviours. A synchronization window. Of course a badly written code piece could just set the cached, and ruin everything. But write range bounding would prevent this.

The next issue would be to sort out duplicating a read copy of a cache line into a local cache. This is so likely to be shared memory with the way software should be working. No process shares a cache line otherwise by sensible design of software. A read should get a clone from memory (to not clog a cache transfer bus if the other cache has not written). A cache should check for another cache written dirty, and send a read copy. A write should cause a delete invalidation on the other caches. If locks are correctly written this will preserve all writes. The cache bus only then has to send dirty copies, and in validations. Packet formats are then just an address, an RW bit, and the data width of a cache line (the last part just on the return bus), and the RW bit on the return bus is not used.

What happens when a second write happens, and a read only copy is in transit? It is invalid on arrival, but not responsible for any write back. This is L2 cache here. The L1 cache can also be data invalidated, but can stand the read delay. Given the write invalidate strategy, the packet in transit can be turned into an invalidate packet. The minor point is the synchronous assignment to the cache of the read copy at the same bus cycle edge as the write. This just needs a little logic to prevent this by “special address” forwarding. A sort of cancel on execute as it were.

It could be argued that sending over a read only copy is a bad idea and wastes by over connecting the caches. But to not send it would result in a L3 fetch of something not yet written to L3 yet, or the other option would be to stall based on address until it exits the cache on the other processor from under use of the associative address. That could take a very long time. The final issue is closing the mutex. The procedure is the same as opening, but using a different value to set cached. The mutex needs to be flushed? Nope, as the check cached will send a read only copy, and the set cached will invalidate the other dirty.

I think that makes for a minimal logic L2 cache. The L3 cache can be shared, and the T and S caches do not need coherence. Any sensible code would not need this. The D cache need invalidation only. The I cache should not need anything. When data is written to memory for later use as instructions, there is perhaps an issue, also with self modifying code, which frankly should be ignored as an issue. The L2 cache should get written with code, and a fetch should get a transferred read only copy. There would be no expectation of another write to the same memory location after scheduling execution.

There should be some cache coherence for DMA. There should be no expectation of write to a DMA block before the DMA output transfer is complete. The DMA therefore needs its own L2 cache “simulation” to receive read only updates, and to invalidate when DMA does an input source read. It is only slow off chip IO which necessitates a flush to L3 and main memory. Such things if handled well can allow the write back queue to only have elements entered onto it when hitting the L2 eviction cache. Considering that there is a block of memory which signals cache empty, it makes sense to just pass this write directly out, and latch for immediate continuation of execution, and stall only if the external bus cycle is not complete on a second write to those addresses. The input read on those addresses have to stall by default if a simplistic ideology is taken.

A more complex method is to indicate a pre-fetch. In a similar way to the 1 item buffer. I hope your IO does not read trigger events (unlikely, but write triggering is not unheard of). A delay 1 item buffer does help with a bit of for knowledge, and the end of bus cycle latching into this delay slot can be used to continue processing and routine setup. Address latches internally help with the clock domain crossing. The only disadvantage to this is that the processor decides the memory mapped device layout. It would be of benefit to shuffle this slow bus over a serial protocol. This makes an external PLA, micro-controller or FPGA suitable for running the slow bus, and keeps PIN count on the main CPU lower, allowing main memory to be placed and routed closer to the core CPU “silicon chip” die.

L4 Cache

The concept of cloud as galactic cache is perhaps a thing that some are new to. There is the sequential stride static column idea, which is good for some processing and in effect gives the I cache the highest performance, and shows the sequential stride the best. For tasks that flood the D cache, which is most when heavy optimization is used, The question becomes “is there a need for associative L4 cache off chip?” for an effective use of some MB of static single cycle RAM? With a bus size of 32 bits data, and a 64 bit addressing system, the tag would exceed the data in consuming the SRAM. If the memory bus is just 32 bit addressing, with some DMA SD card trickery for the high word, this still makes the tag large, but less than 50% of the SRAM usage. Burst mode in this sense is auto increment on the low addresses within the SRAM chip to stop copper trace charge power wastage, and a tag check wait state and DRAM access generator. The fact that DRAM is accessed in blocks makes the tag shrink further.

Yes it’s true, DRAM should have “some” associative SRAM as well as static column banks. But the net effect on performance is minimal. It’s more of a L3 eviction cache extender. Things down to the RAM disk has “read only” or even “no action” system files on it for all the memory, makes file buffers actually be files. Most people would not appreciate this level of detail, but it does however allow for easy contraction and expansion of the statically sized RAM disk. D cache thrashing is the problem to solve. Make it bigger. The SDRAM issue can be solved by putting a fair amount of it there, and placing a cloud in higher (or different address space) memory. The SD card interface is then the location of the network interface. Each part of memory is then divided into 3 parts at this level. A direct part, and associative part assisting another cache level and a tag part. the tag part and the associative assist for an interleave partition. Browser caches are a system level feature, not an issue for application developers. The flush cache is “new disk, new net” and beyond.

How to present a file browser picture of the web? FTP sort of did it for files. And a bookmark and search view seem like a good way of starting out. There maybe should be an unknown folder with some entropicly selected default folders based on wordage, and the web becomes seen in scope. At the level of a site index, the “tool type” should render a page view of the “folder”. The source view may also be relevant for some. People have seen this before, or close to it, and made some interesting research tools. I suppose it does not add up to profit per say, but has much more context in robot living allowance world.

This is the end of part II, and maybe more …


  • Addressing (An, Dn<<{size16*SHS}.{DS<<size}, d8/24) so that the 2 bit DS field indexes one of 4 .W or larger.fields, truncated to .B, .W, .L or .Q with apparently even more options to spare. If SHS == 3 for example the DS > 1 have no extra effect. I’ll think on this (18th Feb 2017).
  • If SHS == 2 then DS == 3 has no effect. If SHS == 1 then DS == 3 does have an effect.
  • This can provide for 3 extra addressing modes not yet developed.

68k2-PC#d12 It’s got better! A fab addressing mode. More then 12 bits of embed-able opcode space remains in a 32 bit wide opcode extension, and almost all the 16 bit opcodes are used (all the co-processor F line slots). With not 3, but 10 extra addressing modes.

Where the 68k ISA went Wrong

An AMIGA valentine’s special.

With much hindsight it is possible to analyse where the micro computer scene and the processor market overlapped to allow Intel to overtake the market share. The selection of the EC range in the Amiga was put forward as cheaper, which although true, was indicative of the lack of micro computer use of floating point at that moment in time. The RISC ARM had some fast integer multiply and bit shuffle performance about the same era, which convince the use of some of the $Fxxx for all kind of MMU and FPU mish-mash, all not for micro users. Here are some classics of the day.

No one really needs floating point – The kids didn’t, it worked in software, and GPU cards do that kind of thing today if you really NEED it.
Memory management? Who needs it? – The kids didn’t. Why would the OS not sandbox? OK, I get it, Microsoft wrote Windows and needed the general protection fault. But was not that Intel? Surely it would be better to just have read and write protection trapping on certain addressing mode sequences only. Perhaps a bit array of jump entry points, and a forced check of address modes upto the “possible invalidation point”, with restrictions on “doing things” being a sand boxed software process. TLB based virtual memory is powerful, but “clouding up” does show that library calls come in on some level, and overlays just needed easier configuration management, and perhaps a pre-fetch gain.
D cache and I cache … lovely
Why do people use switch statements on highly index-able emulation code? – Why? If going the JIT way, then a flat JIT memory recompile, and a index of jump address mappings (differing encode lengths) would look good. But dynamic jumps would need a jump index table anyhow in any rational coding strategy. The best place to put a user mode trap? Any fixed jump can be resolved, and any computed jump can be sparse treed and block loaded on demand.
Given the hindsight, a more modern fast math multi wide vector unit would be a better addition than an FPU, with a fast trap jump flag in both the user and supervisor state, to quickly run some vector emulation code. Yes, a vector enable flag in user, and a legacy flag in supervisor mode. I don’t like the 3 bit co-processor address stuff either. Those bits would be better used for other things. There’s a whole $Axxx block to mess with for MMU style things.

For example if it was float register destination, then the standard instruction format would work, and 8 operation codes would be possible, with minimal impact on effective address decoding. The fact that saving floating point would require the reversal of the understanding of the “destination” and “source” would be minor, and make a spare addressing mode using the “direct D register implied as destination” encoding for float registers inside the coding. Given that the loading of FPU registers would also use the direct data register mode, Much better would be 8 single operand functions directly operating on float registers with an immediate direct target. Int to float would be via memory, and software.

So a simple 8 dyad, 8 monad FPU would seem possible, with all literal addressing modes fetching float literals. This would have been an effective use of the $Fxxx instruction space. Very easy to learn and apply. Better assembler support, and hence use incremental. Ideal for an FPGA solution, for ease of implementation given the “back catalog” of software not present on the majority of systems such as Amiga.

  • FADD, FSUB, FMUL, FDIV, FLOAD, FSAVE, FMAQ (a literal [post instruction field] multiply of the EA double and add to destination), FPLY (a literal [post instruction field] added and then multiply the EA double to destination).
  • FINRT (inverse root), FNEG, FLNP1, FEXM1, FATN, FSIN, FBRABS (branch on negative or NaN and make absolute, offset post instruction). These would be on the immediate mode.
  • The immediate mode could be to recall upto 64 constants on the An mode. There is no size field. Or have the 8 float registers on the An mode be floats, and have int to float and float to int on the Dn selections.

More Ideas for Optimization

Also the BCD instructions should be removed. I think there are more logical first/better choice instructions to replace them given all the RISC knowledge, and the fact that they were avoided quite often. Useful in COBOL, but what would it offer in the line of an actual 64 bit multiply or such? There seems a lot of squashed in opcodes which seem to clog the orthogonal nature. In fact losing EXG as well, could be useful in extending arithmetic operations in general on the multiplication side. It also has the advantage of removing a read-modify-write as an instruction primary. Removing BCD opcodes also reduces the fan in of the ALU.

What also surprises me is the fact that 68k manufacturers never use the $Axxx opcode space. Reserved for who? Given the OS incompatibility problem needing a recompile from source on a different system, obviously the system level is the place this space is reserved for. $Axxx speculative use is for another day. Maybe DCTs?

As to the nature of the general core, it would be efficient maybe have multiple dispatch, but a smaller core would be faster. A simple hybrid would be pipelined dispatch with stage skipping, A bigger single cycle cache structure would be more silicon efficient. To make optimal use of the data cache is more about compilation data layout. The instruction cache can be optimized by heavy code auto factorization via BWT pattern matching of a compiled binary. This does however throw some stress on the data cache as threading subroutine addresses get stacked.

The important point is though that there is no alteration of these addresses, and as such they would be better placed in a threading cache. They should be transient, and so have better write back only on spill characteristics. If the D-cache does not “dirty” on a subroutine return address stack, and all “pops” to the PC come from the T or threading cache, there is more efficiency beyond Harvard architecture. The question of where to put other stacked data items is maybe relevant. And killing the modification of a stacked return address is perhaps a good idea in general code. Adding a structured POP return is not needed. The T cache can have a higher density with just one spill address needing adding to system state.

This although adding something, does leave potentially a lot of unused “holes” in the D cache. If the pushes and pulls using SP are further placed in an S cache, a similar spill address, and code would still work without the holes, and dirty bits as long as functions did not stack relative outside the current function’s stack frame. In fact if stack frame indexing is used, the stack relative will always work correctly as a chained nest of indexing. This could be optimized by also using the S cache for link register relative indexing too. Some compatibility issues may arise with ill designed code. Perhaps use of LNK and PEA and some other modals should use an S cache counter. Or maybe this would break too many “bad old code” things, such that the supervisor “legacy” bit (or user “vector” bit) should use the D cache on the link register indirect modes when set, and the S cache switched off.

The next thing to add in the IDTS cache architecture is some mechanism for indirect threading to remove all the JSR instruction opcodes from the subroutine threaded list. Basically applying 1980’s memory techniques to the small cache sizes for fast cache speed. Entering threaded mode is technically the easy challenge, and exiting it at the point of the leaf subroutine code and re-entry on RTS is the most complex. Assuming NOP is one of the most useless instructions in a system is a bad idea, as it synchronizes bus transactions.

But as it does “nothing”, it is possibly an interesting case. The fact that addresses will be what is located in a threading list, a special address (or special addresses like the trap vector list), can be assumed to not form part of the threading, and so allow exit into regular instruction mode, by increasing the “thread counter” from zero, with each JSR/RTN pair inc/dec above zero. When the RTN takes the counter to zero, threading is re-entered. There should be an exception vector for counter overflow, and setting it to zero enters threading mode. Reset should set it to one. It would have to be set via program to zero via a supervisor trap.

Cache Multi-threading

Further optimization of the D cache for other structures which can be paralleled is maybe worth a little more time. A reasonable eviction buffer will fix some issues. The microcode can be split into parts for each pipeline stage, to help reduce its total size. Some pipeline sections might not even need a dual level microcode, or those that do can be 2 pipe staged. Some emphasis on branch prediction has been very effective in modern processor design, with the simplistic backwards taken a staple for many years. This is getting into speculative execution territory, but at this level can be considered speculative decode, up until the first register assignment, with some simplistic stall and flush on miss prediction.

To keep the interrupt latency low is another consideration. This is greatly simplified in a stall flush design, as instruction restart is assured if register commit has no speculation into register renaming. This keeps the processor complexity down, and silicon area down. Having a second “hyper thread” is the most logical way to expand on the core. This does increase area, but shares a lot of logic when cache misses happen. It does however place its own pressure on the caches, and effectively reduces their size per thread. So turning cache area into register decode area, and consequential access delays setting the core size before going multi-core.

An interesting possibility is flipping the hyper thread registers all at once, and having no extra decode. Then splitting the cache into 2, such that the performance of each half is via less decreasing returns, slightly more effective than keeping them unified and being pressured by each other. This requires a 1 cycle switch wait, but also toggles between waits on both when both cache halves are stalled. There is slightly more complexity in write back, but this not much of a relative problem. Multi-threading the OS queue is maybe also complex. or perhaps just as simple as interleaving the timer interrupt to be serviced by either core alternate style. Mutex locking the message passing is the only place this need be done, as the process model of sequential message queues sorts out most contention issues in an already multi-tasking system which has device and file locking.

Things that Might go at $Axxx

So far memory management, and things like DCT codec assistance. The 68k addressing modes support some very complex modes, and even then there is an extra bit set 0 (bit 3) in the full extension word format, which with a bit 8 set to 1, imply even more complex or different modes can be made. Along with the 5 I/IS reserved indications in a regular format full extension. This along with a mode field of 111 and register 101 to 111, give plenty of addressing mode expansion possibilities. The bit field instructions are kind of unnecessary on a large memory system, as it’s better to split bit fields, or use longer instruction sequences. Maybe some are useful, but with bit plane offsets, and modulo in video architecture, there reason for being is largely removed. It’s massive data structures packed then which might need them, for smaller than byte fields. There is also the size operands in some instructions which only have 3 possibilities in 2 bits. Is the extra bit state used? Some instructions already use the can’t target certain modes for extra functionality, such as ORI, suggesting a size long 10, for some extra state information, and a generic 11 size action, mixed up with some strange “opmode and can’t be immediate” of the general OR.

I’d suggest some opcodes are hidden, such as being able to use the other non targetable modes, in “read” for some operand to target an implicit register. It looks so, and would the ruling out of these be consuming unnecessary silicon area to trigger an illegal instruction trap? The PC relative indirect modes for example, although that would just waste literal displacement immediate to point to another literal, and cycle time too. So some more implicit registers could be handled. So I bet the four unused register values in an addressing mode 111 situation would be best as just 4 extra 32 bit registers. Not usable from everywhere, but generally quicker than memory. Not general purpose, and so some compilers would find them difficult to use, but useful in hand optimized inner loops, or for often used task global variables. There is some ideal to go back to plain 68k, with the few minor fixes (if I remember there was some problem with finding status, and supervisor mode that was fixed in later models), and rationalise the add-ons.

So memory needs read protect, write protect, and non execute protect as all the calculated jumps can be indirected and range checked via a library which is accepted. Write protection is perhaps the hardest, as read protection can be done on a per task basis protected secure segment accessed via a supervisor trap and an exception on reading below a memory bound, and using the task ID in a structure allocated below this bound. Generally write protection is not just own task write protection, but global. It also can be loaded up after the addressed data has been loaded into the cache and perhaps overwritten in the cache. The write back is the resolving time of this issue for speed. A simple “bitmapped pixel per page (or even cache line)” strategy would be fast. This uses the minimum of memory. The lack of TLB delays, and having to deal with page faults would make this good. The number of bits per page can be increased to provide protection rings. Or active write zones.

Effective Data

The idea that to extend the addressing scheme the extra I/IS field values in the regular format full extension maybe could do transforms on the data from or to the effective address. This throws an extra ALU into the bus unit. It could be useful, but like all the complex addressing modes it might not get used as much as the designer would hope. PEA and LEA would not use it, Maybe 4 of the codes use the four extra 32 bit registers that can be made? But this would not meld right. They are used in effect to switch on and off sets of things in the addressing mode. 101 to 111 should do a double memory indirect action with null, word and long outer displacements ([[bd,An]],Xn.SIZE*SCALE,od). The two remaining I/IS codes 100 and 100 with differing IS bits should really be part of a group of 3 reassigning 1:000 to ([[bd,An,Xn.SIZE*SCALE]],od) for some flexibility. A BD size field of 00 should also have some effect on the modes, by writing back into An the value An+bd, where the displacement is a word. This write back is even done when the base register is suppressed.

If full extension word format bit 3 is a logic 1, then bits 2 to 0 are an outer displacement register, and bit 6 selects if it is an address or data register. That finalizes the complexity of the addressing modes, apart from maybe utilizing base register suppress for non PC values, for all you crazy people out there. More likely a strange way of getting 8 extra addressing modes. Maybe it would be better to just redesign the extension word format. Such as making bits 7 to 0 have new meaning in an better format full extension. Making bits 5 to 0 be a nested register extension using the result of this effective address indirected as an extra base displacement, and bits 7 and 6 selecting a BD SIZE field, with 00 indicating another following brief extension word, 01 a word, and 10 a long (and maybe 64 bit at 11). The base displacement can be built up by nested brief extension too. Simple. There’s a bit to select a modification of the lower byte.

Just an order for unwinding the brief extension follows counter after processing the full extension register specs. As this ordering does not require a backward search. So a full address mode nesting (indirect displacements), with some simple immediate displacements. The T cache can be used for stacking the nesting, as no JSR will be done in an operand decode.

But no I won’t be building a new instruction table yet. There are other things to code. This is turning into a fill the whole instruction space challenge. For all numerically increasing opcodes upto and including MOVE, I’d have to remove MOVEP, as slow IO can be easily replaced by rarely used longer instruction sequences. I’d also rationalize all the SIZE field 11 CAS group instructions, and maybe move them. MOVES is also suitable for normalization, and having 3 system registers, one to address, one read/write data, and one read status gives the 4 possible bit codes for SIZE in a better MOVES. RTM and CALLM are fine. CMP2/CHK2 are another SIZE field along with CAS/CAS2. In total there are non target modes (6) times SIZE (4) times 2 instructions for prefixes in this range (so 48). If there are 3 auxiliary registers (111 101 to 111 111, as I miscounted 4 earlier), then 4*2 = 8, so 8 prefix words for immediate targets with SIZE.

The 3 auxiliary registers could be something else. Some kind of indirect auto active addressing mode? The 8 prefixes could be assigned to add 8 full width instruction extensions. Packing can be achieved by having them as sticky prefixes via a system status register, but that would need a system trap to exit, or an exit opcode within the added extension which would be preferred. For large data structures 111 101 addressing mode should do a d32 displacement, but using the PC is the only possibility so not that useful. Allowing PC relative writes, as the default. The CALLM should support a d16 for the possible parameter size. Allow BSET etc. on address registers instead of MOVEP, Also 2*32 bit control registers can be targeted similar to CCR.

There would be some rational to define a SIZE 11 meaning, and just eliminate everything which overrode it. OK, so it’s 64 bit. No CMP2, CHK2, MOVEP, CAS, CAS2, and MOVES is modified, d8 become upto d16, and all immediate mode do SR, CCR and a 32 bit and a 64 bit register. This stops the crazy overloading of bit fields. Excellent. This does affect things like MOVE from CCR in other opcodes, and some fiddling about would have to be done, as the immediate target would easily do the MOVE to.


The SIZE field setting of 11 was not kept free for 64 bit use. Resulting in termination of the ISA expandability. As for addressing modes the terminal issue was not flexibility, the problem was not reassigning the XXX.W mode (difficult to use in most code on a multitasking system), and reassigning it as (d32, Reg). This makes 111 101 be (d32, PC). MOVEQ with bit 8 set, becomes a d24 loader. There are quite a few more. There is a nice solution to the bidirectional CCR and SR issue by using 111 110 and 111 111. This makes for various immediate mode prefixes, for the targeting instructions. In fact the general illegal trap should handle the un-handled ones. Or an emulation 64 bit overlap non supervisor exception would be triggered if the supervisor legacy bit was set. Quite a lot of instructions would have to change but for the A register restrictions on some, and immediate mode target.

That horrid idea to have AND and OR have a memory destination mode and the EOR fart-o-matic restrictions. “Well I couldn’t do long division due to having an OR condition I needed to sort out in the D cache.” Much better to retask the direction bit for DIV at all sizes, and MUL too, yes and all of the unsigned kind, or make the byte and word sizes, do long and quad signed, as this would be more useful. The remainder is not available on the quad size, nor the multiply high word, would make a good compromise.

68k2 is a spreadsheet I put together to analyse the ISA. Putting the old 68k instruction set in a vectored non supervisor soft interrupt clearing the compatibility flag, based on a vector with the following function added (b15.b14.b13.b12.b8.b7.b6.b5.b4.b3)<<2 and an extra indirect jump taken. It makes a 4096 byte table, and perhaps some more soft decode, and a little set of subroutines ended RTS (or a modified enter compatibility version). This does kind of set the coding of RTS in the mess, but this can be vectored. Doing so makes a bootstrap potentially easier.

Making the 64-bit idea apply to all registers, and have a massive GB SD automatic interface memory mapped so the low end is a disk hibernation space of the fast RAM, and memory limit interrupt then a 4GB system becomes natural. Even loading the DMA registers becomes automatable, with the direction of transfer too, even though the transfer size may cause confusion in code. It does mean a RAM disk driver would work, with a slight modification of the sector rw routines, and the initial format process changed some.

Part II

Polyphony and Voicing

After designing the filter sections which are to be used in the audio VST under construction, the next thing to decide is the oscillator voicing. The calculation of many voices does consume a large proportion of the CPU resources in any soft synthesizer design. There are various options such as used in organs like top octave generation and clock division, so that in essence at most 12 voices are played, but using different dynamic spectra.

The octave range at about 11 octaves, or upto the 2048th harmonic, makes this approach have some possible issues with harmonic construction.

LZW (Perhaps with Dictionary Acceleration) Dictionaries in O(m) Memory

Referring to a previous hybrid BWT/LZW compression method I have devised, the dictionary of the LZW can be stored in chain linked fixed size structure arrays one character (the symbol end) back linking to the first character through a chain. This makes efficient symbol indexing based on number, and with the slight addition of two extra pointers, a set of B-trees can be built separated by symbol length to also be loaded in inside parallel arrays for fast incremental finding of the existence of a symbol. A 16 bucket move to front hash table could also be used instead of a B-tree, depending on the trade off between memory of a 2 pointer B-tree, or a 1 pointer MTF collision hash chain.

On the nature of the BWT size, and the efficiency. Using the same LZW dictionary across multiple BWT blocks with the same suffix start character is effective with a minor edge effect, rapidly reducing in percentage as the block size increases. An interleave reordering such that the suffix start character is the primary group by of linearity, assists in the scan for serachability. The fact that a search can be rephrased as a join on various character pairings, the minimal character pair can be scanned up first, and “joined” to the end of the searched for string, and then joined to the beginning in a reverse search, to then pull all the matches sequentially.

Finding the suffixes in the LZW structure is relatively easy to produce symbol codes, to find the associated set of prefixes and infixes is a little more complex. A mostly constant search string can be effectively compiled and searched. A suitable secondary index extension mapping symbol sequences to “atomic” character sequences can be constructed to assist in the transform of characters to symbol dictionary index code tuples. This is a second level table in effect, which can be also compressed for atom specific search optimization without the LZW dictionary loading without find.

The fact the BWT infers an all matches sequential nature, and a second level of BWT with the dictionary index codes as the alphabet could defiantly reduce the needed scan time for finding each LZW symbol index sequence. Perhaps a unified B-tree as well as the length specific B-tree within the LZW dictionary would be useful for greater and less than constraints.

As the index can become a self index, there maybe a need to represent a row number along side the entry. Multi column indexes, or primary index keys would then best be likely represented as pointer tuples, with some minor speed size data duplication in context.

An extends chain pointer and a first of extends is not required, as the next length B-tree will part index all extenders. A root pointer to the extenders and a secondary B-tree on each entry would speed finding all suffix or contained in possibilities. Of course it would be best to place these 3 extra pointers in a parallel structure so not to be data interleaved array of struct, but struct of array, when dynamic compilation of atomics is required.

The find performance will be slower than an uncompressed B-tree, but the compression is useful to save storage space. The fact that the memory is used more effectively when compression is used, can sometimes lead to improved find performance for short matches, with a high volume of matches. An inverted index can use the position index of the LZW symbol containing the preceding to reduce the size of the pointers, and the BWT locality effect can reduce the number of pointers. This is more standard, and combined with the above techniques for sub phases or super phrases should give excellent find performance. For full record recovery, the found LZW symbols only provide decoding in context, and the full BWT block has to be decoded. A special reserved LZW symbol could precede a back pointer to the beginning of the BWT block, and work as a header of the post placed char count table and BWT order count.

So finding a particular LZW symbol in a block, can be iterated over, but the difficulty in speed is when the and condition comes in on the same inverse index. The squared time performance can be reduced? Reducing the number and size of the pointers in some ways help, but it does not reduce the essential scan and match nature of the time squared process. Ordering the matching to the “find” with least number on the count makes the iteration smaller on average, as it will be the least found, and hence least joined. The limiting of the join set to LZW symbols seems like it will bloom many invalid matches to be filtered, and in essence simplistically it does. But the lowering of the domain size allows application of some more techniques.

The first fact is the LZW symbols are in a BWT block subgroup based on the following characters. Not that helpful but does allow a fast filter, and less pointers before a full inverse BWT has to be done. The second fact is that the letter pair frequency effectively replaces the count as the join order priority of the and. It is further based on the BWT block subgroup size and the LZW symbol character counts for calculation of a pre match density of a symbol, this can be effectively estimated via statistics, and does not need a fetch of the actual subgroup size. In collecting multiple “find” items correlations can also be made on the information content of each, and a correlated but rarer “find” may be possible to substitute, or add in. Any common or un correlated “find” items should be ignored. Order by does tend to ruin some optimizations.

A “find” item combination cache should be maintained based on frequency of use and execution time to rebuild result both used in the eviction strategy. This in a real sense is a truncated “and” index. Replacing order by by some other method of such as order float, such that guaranteed order is not preserved, but some semblance of polarity is run. This may also be very useful to reduce sort time, and prevent excessive activity and hence time spent when limit clauses are used. The float itself should perhaps be record linked, with an MTF kind of thing in the inverse index.


VESA NET? An idea for a BIOS extension. A protocol for total removal of the video card from the server. The VESA frame buffer becomes virtual, and routed out only UDP to the default broadcast address. A listener on the network presents MAC address (maybe translated), say 256 screens (8 by 8), so that any maybe routed and zoomed full screen. Along with an SSH ability on the “KVM” box, the default console can be seen of a whole server array. The main purpose being to remove the graphics card from servers. Allowing an SSH via MAC address in the BIOS would seem convenient, but does have massive security issues, being inbound traffic. The printing of a key finger print on the default console assists in the concept of possible login for inbound traffic, and install of the virtual keyboard, and perhaps mouse for more socket removal on the server. Not a full spec, just a concept. The configuration of the VESA receiver to proxy important or even public screens of interest as a web forward, by maybe even including a virtual floppy disk in manor similar to the PROC fs, would also fit in the space available in a modern BIOS flash.

Implementation of Digital Audio filters

An interesting experience. The choice of FIR or IIR is the most primary. As the filtering is modelling classic filters, the shorter coefficient varieties of IIR are the best choice for me. The fact of an infinite impulse response is not of concern with a continuous stream of data, and coefficient rounding is not really an issue when using doubles. IIR also has the advantage of an easy Sallen-Key implementation, due to the subtraction and re-adding of the feedback component, with a very simple CR processing.

The most interesting choices are to do with the anti-alias filtering, as the interpolation filter, on up-sampling is an easy choice. As the ear is not really responsive to phase, all the effort should be on the pass band response levels, and a good stop band non response. A Legendre or Butterworth are the candidates. The concept of a characteristic sound enters the design process at this point, as the cascading of SK filter sections is conceptually useful to improve the -6 dB response at cut off. This is a trade off of 20 kHz to 22.05 kHz in the alias pass band, and greater attenuation in the above 22.05 kHz infinite stop desire. The slight greater desire of alias attenuation above pass band maximal flatness (for audio harmony) implies the Legendre filter is better for the purpose than Butterworth.

In the end, the final choice is one of convenience. and a 9th order filter was decided upon, with 4 times oversampling. The use of 4 times oversampling instead of 8 times oversampling increases the alias by an octave reduction. This fact under the assumption of at least a linear reduction in the amplitude of the frequency of the generator of an alias frequency, with frequency increase, just requires a -12 dB extra gain reduction in the alias filter for an effective equivalence to 8 times oversampling (the up to and the reflection back down to 6 + 6). The amount of GHz processing also halves. These facts then become constructive in the design, with the bulk alias close to the cut off, and the minor reflected alias-alias limit, not being too relevant to overall alias inharmonic distortion.

A triple chain of 3 pole Legendre filter sections is the decided design. The approximate -9 dB at the corner, allows for slightly shifting up the cut off and still maintaining a very effective stop band. Code reuse also aids in the I-cache usage for CPU effective use.  A single 3 pole Legendre is the interpolation up sample filter. The roll off for not using Butterworth does cut some high frequency content from the maximally flat, hence the concept of maximally flat, but it out performs a Bessel filter in this regard. It’s not as though a phasor or flanger needs to operate almost perfectly in the alias band.

Perhaps there is improvement to be made in the up sampling filter, by post up sample 88.2 kHz noise shaped injection to eliminate all error at 44.1 kHz. This may have a potential advantage to map the alias noise into the low frequencies, instead of encroaching from the higher frequencies to the lower, and for creating the alias as a reduction in signal to noise, instead of at certain inharmonic peaks. The main issue with this is the 44.1 kHz wave fundamental, seen as the amplitude ring modulation of the injected phase noise, by the 44.1 kHz stepped waveform between samples input. The 88.2 kHz “carrier” and the sidebands are higher in frequency, and of the same amplitude magnitude.

But as this is following for no 44.1 kHz error, the 88.2 kHz and sidebands are the induced noise, the magnitude of which is of the order of 1 octave up from the -3 dB roll at the corner, plus approximately the octave for a 3 pole filter, or about 36 dB cut of a signal 3/4 of the input amplitude. I’d estimate about -37 dB at 88.1 kHz, and -19 dB at 44.1 kHz. Post processing with a 9 pole filter, provides an extra -54 dB on down sampling, for an estimate of around -73 dB or greater on the noise. That would be about 12 bit resolution at 44.1 kHz increasing with frequency. All estimates, likely errors, but in general not a good idea from first principals. Given that the 44.1 kHz content would be very small though post the interpolation filter, -73 dB down from this would be good, although I don’t think achievable in a sensible manor.

Using the last filtered sample in as the reference for the present sample filtered in as a base line, the signal at 22.05 kHz would be smoothed. It would have a notch filter effect, by injecting quantization offset ringing noise at 88.2 kHz to cancel 22.05 kHz. The notch would likely extend down in frequency for maybe -6 dB at about 11 kHz. Perhaps in the end it is just better to subtract the multiplied difference between two up sample filters using different sinc spreading of a 1000 and a 1100 sample occupancy zero inter fill. Subtracting the alternates up conversion delta as it were.

There is potentially also an argument for having a second order section with damping factor near 0.68 and corner 22.05 kHz to achieve some normalisation from sinc up-sampling. This adds in an amount of Q such as to peak the filter cancelling the sinc droop, which would be about 3% at 4 times oversampling.

EDIT: Some of you may have noticed that the required frequencies for stable filtering are too high at 4 times oversampling. So unfortunate for the CPU load an 8 times oversample has to be used. The sinc error is less than 1% at this oversample, but still corrected in a similar way, and a benefit of 2 extra poles. Following this by a 0.1 dB 3 pole Chebyshev high pass which has been inverted, gives a reasonable 5 pole up sampling filter. The down sampling filter for code efficiency is a triple instance of the sample inverse Chebyshev, with the corner frequencies slightly offset to produce more individual zeros, and some spreading of the “ringing”. These 9 poles are enough to get the stop band ripple to be lower than a 16 bit resolution. Odd order inverse Chebyshev are essential for the reflected spectra to be continually decreasing in amplitude.

How Moving to Outlook Destroys Your Gmail

Just testing out office 365. Surprising features in Outlook include unsubscribing folders, still makes them delete-able. Which is very surprising as you’d likely unsubscribe to a folder to not consume local space and download, and just want it left alone on the server. But alas the default to spam fill everything, with no subscribe wizard, and a bad default action on unsubscribed folders, and you’d think Microsoft never ever tested this thing in the real world of reality.

I have been thinking of a short cost saving by an Azure migration, but this has to be tested before committal. The last thing I need is to find that Microsoft has some limits on root servers. I find my current provider good, and would not want to end up between a rock and a hard place.

JDeveloper and Intel Python

The JDeveloper environment looks good. Nice work Oracle, and some of the Borland classic JBuilder. This tool look more like how I’d use an IDE. I’ve been looking at other technologies for computer development, and a recent Intel offering (for personal use free) is the MKL backed Intel Python. It needs at least an SSE4.2 supporting chip, but does have all that is needed to run the development on Xeon Phi Knights Landing. 72 cores and 144 vector processing AVX-512 engines. Multi Tflops stuff. For the developer this is perhaps the easiest way to start HPC, as through Cython and eventually C, the best performance can be had. Maybe FPGAs will help, and tools are available for that too. I’ve seen some good demonstrations, and maybe some clients with complex or hard problems would need this.

All this parallel stuff got me thinking of Kahan sums, and simulation of incompressibles by having a high speed of sound in a compressible, and the doing a compulsory diffusion to damp oscillation, and a pressure impulse (Pa s) handling of inertial failure of containers. It might reduce the non-locality of certain simulations, and actually act to simulate pressure hammer effects.

I’ve also recently got back into the idea of using Free Pascal for some of my projects maybe. There is now good JNI support, and even JVM targeting. I maen it’s very possible to use C for this kind of thing, but the FPC IDE and Lazarus are quick to build, with incremental unit compilation and many other features which make it good competition for general coding. Some would think it old hat, but the ease of use is excellent with much type checking, and no insistence on everything being a class. Units are very modular that way. The support for quite a few Pascal flavours is also good.

Power Systems

genLot of free energy videos about but does it actually work, or is it just virtual vapour wares? Here’s a highly unstable circuit I designed a few years ago. The magnetic balance is so fine, that an external field can throw the circuit into an unstable power spike. Then I went for an inductance modulation of lower scale, using a 3 phase (+++), to 1 phase (++-) arrangement, for greater stability. The difficulty with such devices is not the working, but the switch off without raising volts potential to any unlucky hands. This safety aspect is the ultimate reason of non use, and not as some suppose the disruptive effect on oil and other nuclear markets. Those markets may shrink, but will always be. The chemical industry will always have need of basic oil produce, and the lower short term profits of non burn, actually extend the future profits of chemical building. Transport is minor compared to health. The nuke industry could easily shrink, and still be big. The power waste of removing rods with 90% still effective power is a white wash of the electric power from a military objective. Reactors would be different for pure civil use.

downloadAmazing colours, but what’s it really about? The Pu problem of fast breed, and somehow there will never be less of it, just does not add up for efficient too cheap to meter power promises of not too long ago. There seems to be no real research on gamma cavity down conversion technology. I wonder how long it will be before the nova bomb. The effective slowing of light to lower than the black hole threshold, at Sun core. I think the major challenge is getting super dielectrics far enough into the Sun without melt. I suppose this is some hyperbole focus problem. One day people will understand the simple application of button technology, and the boxes will judge and provide on intent or not. It’s not like they won’t have a self interest.

Musical Research

I’ve looked into free musical software of late, after helping out setting up a PC for musical use. There’s the usual Audacity, and feature limited, but good LMMS. Then I found SuperCollider, and I am now hooked on building some new synthesis tools. I started with a basic FM synth idea, and moved on to some LFO and sequencing features. It’s very nice. There is some minor annoyance with the SC language, but it works very well, and has access to the nice Qt toolkit for making GUIs. So far I’ve implemented the controls as a MIDI continuous controller bus proxy, so that MIDI in is easy. There will be no MIDI out, as it’s not that kind of be all thing. I’ve settled on a 32 controls per MIDI channel, and one window focus per MIDI channel.

I’ve enjoyed programming it so far. It’s the most musical I’ve been in a long while. I will continue this as a further development focus.

The situation so far

The last three, are just an idea I had to keep the keyboard as a controller, and make all the GUI connection via continuous controllers only. I’ll keep this up to date as it develops.

Site Looks

I thought about making my site look like google today. The green here is distinctive, but maybe the site could learn from some google UI styling. But then maybe the net would all look the same. A bit of an off dream where the moon had all one style, and then each planet had sort of a brand. It’s more the arguments of the extents of confusion that would perhaps result from an eye catching UI, I still have not really formatted my landing page to my satisfaction. It’s an interesting thought that when the web of one planet becomes a wiki zone of another, the web becomes in effect a pivot table of UI elemental descriptions. UI design has been the defining feature of the web. Well yes, apps had it first, but there was a consistency of design in the super highway process, such that too much differentiation was counter productive in education hours.

Some have experimented with auto reflow of site HTML to modularize the CSS content, and so reduce the tweek time. A reflow portal based not just on the browser agent, but other stylistic factors could be a good future development strategy. I wouldn’t say it was a site priority here, but might be a good future project. This might be good for public spaces, where the seen that one, brings about little review.

Tax in a Technological Economy

I decided to produce this free work covering a subject certain to almost all, as the saying goes death and taxes. As director of this company I have to hold an opinion on such things, be sure the government does. I thought it best to open the internal decision stream to customers and subcontractors and the wider public, so that such issues such as am I buying services from the latest tax haven, or is there and extra 20% effective going to the cause of central. So this company will pay all taxes due, and perform no self manufactured tax breaks by diverting funds into holding companies with surprisingly opaque financials. As to the issuance of dividend, it’s only an issue of the differential between personal income tax and corporation tax. Hence if one is higher or lower, the government intent is to suggest that (well depending on it being a break or a punitive), one or another should be done. To comply with this differential all profit be either corporation tax or full issue to dividend the same effective amount, to the effect of fulfilling the intentional supra directive of governance. As a maximal to government policy can have no detriment beyond the sense of the government option to issue rebate, a matter of two sums and a max holds the basis for the return. This will be issued into the record as a short form expression of this intent is formed.

The arguments against this are not relevant for small companies, but are interesting to me. The financial weight of a company in a sense votes on the validity of government policy, and also via the central limit theorem can, but not always does add some stability to the bumps of government policy input at very specific points such as the budget. Weather this share based meta vote is valid in ideal is not the point. It is fact. In a way, I felt the need to inject situation as with a very large company implementation of a maximal extreme policy would lead to potential PLC stock instability, and more critical divergence and resulting accountability of government policy.

In the days where director’s bread runs thin, the prices really could be justified to be higher, but in the situation of capital demand, the lower but liveable introductory offer is the sale, sale, sale of it all. The capital buffer of a large company will always outperform in a sale negative cost battle. The only option in such a situation is quality contract delivery at a reasonable cost. This is innately a consequence of pay by the hour systems. All known good clients know this. Optimization efficiency, and automation are exemplified by the phone in your pocket. Robots, robots, robots. Some estimates place 20% redundancy through automation in lifetime as conservative. The effective replacement of pay by the hour occupational replacement is yet not an automated provider. Robot living allowance is not a joke, Although darkly funny.

As a digital business, with surprisingly analog books, many computer based Americanisms and longer term goals of electrical production, KRT has to be aware of future customers, and not just marketing for now. Holding of risk based on past operational equipment when the economic model is to design and sell services for the future deficit load, and not a present credit bubble, is the game of the future. At present KRT has on occasion used contractors, as employee costs are difficult to justify. Not the work risk, but the contract risk. Continuous jobs, even technology based ones, need a steady supply of work contracts coming in. The larger the contract the more unstable it is to renegotiation, and need of a cash buffer. Contractors are much easier. They are on in effect though zero hour contracts as the lingo goes, although this is more like at least some hours but no repeat business guarantee. This then leads on to the obvious development of customer service, to achieve a conversion and repeat business rate. At KRT, the model is for quality contract service. This does not involve a cold call sales pitch, as I receive enough of those already. They are good for occupying the time of those who do not need the service, with an occasional contract win. Delivering services to people who are in need of the service, without harassment for further business, is part of the service. Good customers will always know where to look again, and are good at recommendation. The most effective strategy then from the KRT point of view is the development of R&D rigs and insight concept projects. Interesting technology becomes worth looking in on.

The updates will continue where relevant.


The CHIP is and excellent little computer. The production is having difficulties at present, but hopefully this will be sorted out soon. You can even get a keyboard and screen case thing, for mobile use. In short it’s a competitor to the Raspberry Pi, and has a number of benefits of not needing a WiFi dingle, and has SD disk built in, so not needing a card. What hasn’t it got by default? HDMI, many USB ports, free Mathematics, easy SD card swapping, and only has one OS choice. What extra does it have? Supports composite video, an easy portable case is available, has a built in battery charge circuit, costs less, has Bluetooth and WiFi builtin, and if bought with the pocket case with screen, it has some free game software thrown in. A more in depth out of the box review to follow.

Why my interest? Thin client possibilities of the future and present. So far so good. Quite intuitive to use, and only 17% (after an apt upgrade) of the disk used with the default install. The keyboard is a little fiddly but everything is there. There should be some good options for tools building on this. WiFi connects smoothly, no problems. The home key and ALT+TAB are effective for window management, and the included home screen defaults, are for good fun. Notable exceptions to the install are no immediate browsing, or JS/HTML client. No RTF or anything more than a simple text editor. But considering the pocketCHIP form factor, this would be an ideal typing on the go platform, where whipping out a laptop would be excessive, and a tablet might do, but a Windows Tablet will consume more power, and an Android Tablet would not quite be as customizable. Not that this all can’t be sorted given enough skill with Linux, and given the device is designed for such hackery. it’s likely a plus.

For cloud deploy, the missing features are some kind of file sync. something that would make corporate grade application prototyping a breeze. I think config files have to become config directories with date ordering priority for applied last relevance. It could make some kind of key code download architecture for situational setup. Of course this would have to be done at user level privileges. Maybe a git branch tag or ID. So I think the first install is git, and then someway to monkey patch the menu system. Looks like gksu or -A on sudo is going to have to be used to add a script, and then delete the script installer, so as to prevent the hijack of the script at a later date. This would make a wget over https pipe into bash as a one line boot strap.

Browsing the Web

I was fiddling about with all those usual suspects, but the team got it right surf is the browser to got for. Just press help, and then CTRL + G for to enter the matrix. The simplistic excellence. It’s even running a reasonable JavaScript. Sorted. Rendering is excellent with good HTML, and complete crap with absolute sized elements. Lucky a little surf config will reflow some bad sites.

SpliceArray in JavaScript

The current focus in the roitEmbed project is the SpliceArray class. The aim being a tree array structure, where each leaf does not have to be full. The cumulative counts of the leaves make for a simple binary search algorithm to find the correct leaf and element. Splicing involves removing and adding in elements. This makes the expensive linear copy of almost all elements on a large array splice into just insertion and deletion of smaller arrays, and so makes the splice not dependant on the number of array elements. The depth of the array tree needed, depends on the maximum size and is a trade off between differing efficiencies. An optimal size for each leaf node is about sixteen elements. This means the branch nodes have thirty two elements, as both a link and a cumulative count is required. A tree depth of six allows for a 16.7 million element array maximum. This is more than sufficient for any task I can imagine being done by browser side JS.

Collections and Operator Overloading NOT in JS

Well, that was almost a disappointment for optimising ordered collection renders by using arrays. But I have an idea, and will keep you informed. You can check the colls.js as it evolves. I wont spoil the excitement. The class in the end should do almost everything an array can do, and more. Some features will be slightly altered as it’s a collection, and not just an an object with an array prototype daddy. The square brackets used to index arrays are out. I’m sure I’ve got a good work around. I’m sure I remember from an old Sun Microsystems book on JavaScript text indexed properties can do indexing in the array, but that was way back when “some random stuff is not an object” was a JS error message for just about everything.

One of these days I might make a mangler to output JS from a nicer, less coerced but more operator coerce-able language syntax. Although I have to say the way ECMAScript 6 is going, it’s a bit nuts. With pointless static and many other “features”. How about preventing JavaScript’s habit of just slinging un-var-ed variables into the global namespace without a corresponding var declaration in that scope? The arguments for and against are easy to throw a value to a test observer to debug, versus harder to find spelling errors in variable names at parse time. If the code was in flight while failing, then knowing the code will indicate where the fault lies. For other people’s code a line number or search string would be better. Either works for me.

My favourite mind mess would be { .[“something”]; anything; } for dynamic tag based on the value of something in object expressions. Giggle.

Spoiler Alert

Yes, I’ve decided to make the base array a set of ordered keys, based on an ordering set of key names, so that many operations can be optimised by binary divide and conquer. For transparent access a Proxy object support in the browser will be required. I may provide put and get methods for older situations, and also because that would allow for a multi keyed index. The primary key based on all supplied initial keys and their compare functions to use and the priority order [a, b, c, …], and automatic secondary keys of [b, c, …], [c, …], […], … for when there needs to be so, I’ll start the rewrite soon. Of course the higher operations like split and splice won’t be available on the auto secondary keys, but in years of database design I’ve never could have not been one of such form.

The filt.js script will then extend utility by allowing any of the auto secondaries to be treated as a primary on the filter view, and specification of an equals, or a min and max range. All will share the common hidden array of objects in a particular collection for space efficiency reasons. This should make a medium fast local database structure possible, with reasonable scaling. Today and tomorrow though will be spent on a meeting, and effective partitioning strategy to avoid a “full table pull requests” to the server.

Further Improvements

The JSON encoded collation order was chosen to prevent bad comparisons between objects with silly string representations. It might be extended, such that a generic text search, and object key ordering are given some possibility. This is perhaps another use the compression can be put as BWT in the __ module has good search characteristics. Something to think over. It looks as though the code would be slow depending on heavy use of splice. This does suggest an optimization by making another Array subclass named SpliceArray which uses an n-tree with sub element and leaf count and cumulative tally, for O(1) splice performance.

WordPress. What’s it like?

I’ve been using WordPress for a while now, and it’s OK. The admin interface can take a little to get used to, especially when plugins throw menus all over the place, but the online help is very good. The main issues recently were with configuring an email sendmail system so that WordPress could send emails. Upgrading can also be a bit of a pain too, The main issue at the moment is building a dynamic web app. The option to edit WordPress PHP is a no, on the grounds of updates of WordPress source overwrite changes. Creation of a WordPress plugin is also an option, but was not chosen as I want client rendering, not server rendering in the app, to keep the server loading low. I see WordPress as a convenience, not a perfection. So the decision was taken to use client side JavaScript rendering, and have one single PHP script (in the WordPress root folder) which supplies JSON from extra tables created in the WordPress database. An eventual plugin for WordPress may be possible to install this script, and a few JavaScript files to bootstrap the client engine, but not at this time.

This way of working also means the code can be WordPress transparent, and be used for other site types, and an easier one script conversion for node.js for example. With an install base of over 70 million, and an easy templating CMS, WordPress is a good, but pragmatic choice for this site so far. The other main decisions then all related to the client JavaScript stack. I decided to go for riot as the templating engine as it is lightweight and keeps things modular. Some say ReactJS is good, and it does look it, but I found riot.js which looks just as good, is smaller as an include (Have you seen the page source of a WordPress page?) and has client side rendering easily. And then looking at either underscore.js or lodash.js (I picked underscore), for a basic utility. The next up is the AJAX layer. While WordPress does include jQuery, for independence from WordPress tie ins, this ruled out the OK backbone.js and also a fully custom layer allows me to experiment with bandwidth reduction using data compression as a research opportunity. So I have laid out a collections architecture for myself.

Connecting riot.js to this custom layer should be effectively easy. The only other issue was then a matter of style sheet processing to enhance consistency of style. The excellent less.js was chosen for this. Even though client side rendering was also chosen, which is sub optimal (cached, but uses time, but also allows the possibility and later opportunity of meta manipulation of say colour values as sets, for CSS design compression), but does have freedom from tie in to a particular back end solution (a single PHP script at present). So that becomes the stack in its entire form minus the application. Well I can’t wright everything at once, and the end user experience means the application form must be finalized last, as possibility only remains so this side of implementation. For the record I also consider the collections layer a research opportunity. I’ve seen a good few technologies fail to be anything but good demos, and some which should have remained demos but had good funding. Ye old time to market, and sell a better one later for double the sales. Why not buy one quality one later?

The Inquiry Forms Emailing Now Works

Just set up sendmail on the server using these instructions. Now it’s possible to send inquiries. There will also be other forms as and when relevant. The configuration had an extra step of using a cloud file which behaves as a master for, but this was no big deal. It all took less than ten minutes. A minor complexity is now to forward the response email to somewhere useful to tie up that loose end for later.

RiotEmbed.js Coding Going Well

I’ve done more on riotEmbed today. Developed a system for hash code checking any scripts dynamically loaded. This should stop injection of just any code. There is also some removal of semantic and syntax errors based on ‘this’ and ‘call’, and some confusion between JSON, and JS which can contain roughly JSON, where ‘x: function()’ and ‘function x()’ are not quite the same.

I’ll do some planing of DB schema tomorrow …

The following link is a QUnit testing file I set up, which does no real tests yet, but is good for browser code testing, and much easier than the convoluted Travis CI virtual machine excess.

QUnit Testing

I’ve added in a dictionary acceleration method to the LZW, and called it PON (Packed Object Notation), which is only really effective when used after a BWT as in the pack method. Also some Unicode compression was added, which users of local 64 character sets will like. This leaves a final point in the compression layer of UCS-2 to UTF-8 conversion at the XmlHttpRequest boundary. By default this uses a text interface, and so the 16 bit characters native to JavaScript strings, are UTF-8 encoded and decoded at the eventual net octet streaming. As the PON is expected to be large (when compression is really needed) compared to any other uncompressed JSON, there is an argument to serialize for high dictionary codes (doubling of uncompressed JSON size, and almost a third of PON size), or post UTF to apply SUTF coding (cutting one third off compressed PON without affecting uncompressed JSON). The disadvantage to this is on the server side. The PON is the same, but the uncompressed JSON part will need an encoding and decoding function pair, and hence consume computational and memory buffer resources on the server.

As the aim of this compression layer is to remove load off the server, this is something to think about. The PON itself will not need that encoding taking off or putting on. As much of the uncompressed JSON part will be for SQL where clauses, and as indexes for arrays, the server can be considered ignorant of SUTF, as long as all literals used in the PHP script are ASCII. This is likely for all JSON keys, and literal values. So the second option on the client side of SUTF of the UTF would be effective. Some would say put Gzip on the server, but that would be more server load, which is to be avoided for scaling. I wonder if SUTF was written to accept use of the UInt8Array type?

Some hindsight analysis shows the one third gain is not likely to be realized except with highly redundant data. More realistic data has a wider than 64 dictionary code spread, and the middle byte of a UTF-8 is the easiest one to drop on repeats. The first byte contains the length indication, and as such the code would become much more complex to drop the high page bits, by juggling the lower four bits (0 to 3) in byte two for the lower four bits of byte one. Possibly a self inverse function … Implemented (no testing yet). The exact nature of JSON input to the pack function is next on the list client side, with the corresponding server side requirements of maintaining a searchable store, and distribution replication consistency.

The code spread is now 1024 symbols (maintaining easy decode and ASCII preservation), as anything larger would affect bits four and five in byte two and change the one third saving on three byte UTF-8 code points. There are 2048 dictionary codes before this compression is even used, and so only applies for larger inputs. As the dictionary codes are slightly super linear, I did have an idea to normalize them by subtracting a linear growth overtime, and then “invert the negative bulges” where lower and hence shorter dictionary codes were abnormal to the code growth trend. This is not applicable though as not enough information is easily available in a compressed stream to recover the coding. Well at least it gives something for gzip to have a crack at for those who want to burden the server.

Putting riot.js and less.js on WordPress

You’ll need the insert headers and footers plugin and then put the following in the footer. As the mount is done before the page bottom, there maybe problems with some aspects. In any pages or postings it then becomes possible to incorporate your own tag objects, and refer to them in a type definition. As the service of such “types” has to be free of certain HTML concerns, it’s likely a good idea to set up a github repo to store your types. CSS via less is in the compiler.

<script src=""></script>
<script src=""></script>
<script src=""></script>

Below should appear a timer … The visual editor can make your custom tags disappear.

Produced by the following …

<script src="h" type="riot/tag"></script>

Or with the shortcoder plugin this becomes … in any post …

[sc name="riot" tag="timer"]

And … once in the shortcoder editor

<script src="" type="riot/tag"></script>

This is then the decided dynamic content system, and the back end, and script services are being developed on the following GitHub repository. The project scope is describe there. Client side where possible, and server side for database replication consistency. The tunnel to the database will be the only PHP, and all page returns for ajax style stuff will be as JavaScript returns, so that no static json is sent.

GitHub Repo for Ideas

Next up storing PON …