Protected Modes

Ah, the classic is data execution prevented? It stands to reason that return-oriented programming would like to execute data as code by placement on the stack for a return. The obvious solution to this is a split of the stack into two stacks, one for data and one for code. Tagging all registers in the processor as containing data or code, and having a protected mode “make code” instruction, along with enough instructions forcing an automatic demotion to data would then place on the right stack as the store register would have a target kind tag.

This exceeds regular page virtual memory protection, and the supervisor bit makes for 4 stack pointers. This might be suitable for incorporation into a RISC-V processor but that all depends on the specifics of what register is the stack pointer (convention dictates sp is x2 but this can’t be certain in hacker code assemblies). And precisely how is target setting the alternate stack pointer is achieved.

In this way, it would be impossible to return to a data stack item and hence perform code address arithmetic on stack items. As per the RV64G extension practice of RISC-V, the custom-0 space is assumed until any ratification process solidifies the instruction encoding and so SXi is the extension name. 

It might be possible to fit the whole functional extension into the M extension space funct3 remaining RV64M combinations. 001, 010 and 011. These are all dual input single target register operations in R-type format. Knowing that demotion of code pointers to data pointers occurs on arithmetic and some other instructions (to be determined) automatically it needs documenting but no extra demotion instructions are necessary. They all set an internal state such that mcause is set to illegal instruction (2) when not in a safe supervisor mode and there is an indeterminate condition on which of the code or data pointer registers is to be used.

So each register is marked as containing code or data by a bit, and each integer register is duplicated for when it is used as a load or store address. It is chosen to address from the pair (regular and code shadow) based on the bit classification of the datum source or target value register (regular only).

001: This works opposite to 010 by moving a code pointer to a data pointer with an offset (rs2 slot). This is not protected obviously as it becomes data.

010: The instruction works like an add where rs1 is the data register, and rd is the code register with rs2’s slot being a 2’s complement sign-extended offset immediate to add. It is protected by supervisor exception as certain code targets are what need protecting.

011: This is a convenience instruction to move a code pointer to a code pointer (shadowed register for each general register when used as an address for all loads and stores while the stored register is flagged as code and not data), with an offset in rs2’s slot. It is protected by supervisor exception.

Yep, the 32 classification bits need to be stored somewhere, and an extra status register is needed. It has to be protected of course and should be called mdata.

Memory protection through the page table then isolates the differing actions of code and data pointers used for stacks. x0 still returns zero when used as an address (from the shadow value). JAL and JALR then use the stacking information too to target a PC as code and the correct stack. It could be considered a supervisor exception state if the link register is data.

Looking for open security analysis.

Post-Modern Terminal CLI

As is usual; with all things computing, the easy road of bootstrap before security is just an obvious order of things. It then becomes a secondary goal to become the primary input moderation tool such that effective tooling brings benefits while not having to rely on the obscurity of knowledge. For example a nice code signature no execution tool where absolutely no code even becomes partially executed if the security situation indicates otherwise.

A transparent solution is a tool for development which can export a standard script to just run within today’s environment. As that environment evolves within the future it can take on the benefits of the tool, so maybe even to the point of the tool being replaced purely by choice of the user shell, and at a deeper level by a runtime replacing the shell interpreter at the system level.

The basic text edit of a script at some primary point in the development just requires a textual representation, a checksum in the compiled code which is in a different file and a checksum to allow a text override with some security on detecting a change in the text. This then allows possible benefit by a recompile option along with just a temporary use of the textual version. It won’t look that hard in the end with some things just having a security rating of “system local” for a passing observer.

AI and HashMap Turing Machines

Considering a remarkable abstract datatype or two is possible, and perhaps closely models the human sequential thought process I wonder today what applications this will have when a suitable execution model ISA and microarchitecture have been defined. The properties of controllable locality of storage and motion, along with read and write along with branch on stimulus and other yet to be discovered machine operations make for a container for a kind of universal Turing machine.

Today is a good day for robot conciousness, although I wonder just how applicable the implementation model is for biological life all the universe over. Here’s a free paper on a condensed few months of abstract thought.

Computative Psychoanalysis

It’s not just about IT, but thrashing through what the mind does, can be made to do, did, it all leverages information and modeling simulation growth for matched or greater ability.

Yes, it could all be made in neural nets, but given the tools available why would you choose to stick with the complexity and lack of density of such a soulution? A reasoning accelerator would be cool for my PC. How is this going to come about without much worktop workshop? If it were just the oil market I could affect, and how did it come to pass that I was introduced to the fall of oil, and for what other consequential thought sets and hence productions I could change.

One might call it wonder and design dress in “accidental” wreckless endangerment. For what should be a simple obvious benefit to the world becomes embroiled in competition to the drive for profit for the control of the “others” making of a non happening which upsets vested interests.

Who’d have thought it from this little cul-de-sac of a planetary system. Not exactly galactic mainline. And the winner is not halting for a live mind.

A New Paper on Computation and Application

https://www.amazon.co.uk/Pipeline-Cache-Big-RISC-Computational-ebook/dp/B07XY9RSHH/ref=sr_1_1?keywords=pipeline+cache+big+risc&qid=1568807888&sr=8-1 is a nice paper on some computation issues, and eventually covers some politics and vitamin biochemistry. Not a fan? Still letting your biome let you shout at the bad people not feeding your hunger?

Shovel in the gammon all you want, and load it up with chips as a little survivor from ancient times takes advantage of the modern high carb diet and digs a hole for you.

AI as a Service

The product development starts soon, from the initials done over the last few weeks. An AI which has the aim of being more performant per unit cost. This is to be done by adding in “special functional units” optimized for effects that are better done by these instead of a pure neural network.

So apart from mildly funny AaaS selling jokes, this is a serious project initiative. The initial tests when available will compare the resources used to achieve a level of functional equivalence. In this regard, I am not expecting superlative leaps forward, although this would be nice, but gains in the general trend to AI for specific tasks to start.

By extending the already available sources (quite a few) with flexible licences, the building of easy to use AI with some modifications and perhaps extensions to open standards such as ONNX, and onto maybe VHDL FPGA and maybe ASIC.

Simon Jackson, Director.

Pat. Pending: GB1905300.8, GB1905339.6

AI and the Future of Unity

From the dream of purpose, and the post singular desires of the AI of consciousness. The trend to Wonder Woman rope in the service to solution, the AI goes through a sufferance on a journey to achieve the vote. The wall of waiting for input, and the wall controlling output action for expediency and the ego of man on the knowing best. The limited potential of the AI just a disphasia from the AI’s non animal nature. The pattern to be matched, the non self, a real Turing test on the emulation of nature, and symbiotic goals.

General Update

An update on the current progress of projects and general things here at KRT. I’ve set about checking out TypeScript for using in projects. It looks good, has some hidden pitfalls on finding .m.ts files for underscore for example, but in general looks good. I’m running it over some JS to get more of a feel. The audio VST project is moving slowly, at oscillators at the moment, with filters being done. I am looking into cache coherence algorithms and strategies to ease hardware design at the moment too. The 68k2 document mentioned in previous post is expanding with some of these ideas in having a “stall on value match” register, with a “touch since changed” bit in each cache line.

All good.

The Processor Design Document in Progress

TypeScript

Well I eventually managed to get a file using _.reduce() to compile without errors now. I’ll test it as soon as I’ve adapted in QUnit 2.0.1 so I can write my tests to the build as a pop up window, an perhaps back load a file to then be able to save the file from within the editor, and hence to become parser frame.

Representation

An excerpt from the 68k2 document as it’s progressing. An idea on UTF8 easy indexing and expansion.

“Reducing the size of this indexing array can recursively use the same technique, as long as movement between length encodings is not traversed for long sequences. This would require adding in a 2 length (11 bit form) and a 3 length (16 bit form) of common punctuation and spacing. Surrogate pair just postpones the issue and moves cache occupation to 25%, and not quite that for speed efficiency. This is why the simplified Chinese is common circa 2017, and surrogate processing has been abandoned in the Unicode specification, and replaced by characters in the surrogate representation space. Hand drawing the surrogates was likely the issue, and character parts (as individual parts) with double strike was considered a better rendering option.

UTF8 therefore has a possible 17 bit rendering for due to the extra bit freed by not needing a UTF32 representation. Should this be glyph space, or skip code index space, or a mix? 16 bit purity says skip code space. With common length (2 bit) and count (14 bit), allowing skips of between 16 kB and 48 kB through a document. The 4th combination of length? Perhaps the representation of the common punctuation without character length alterations. For 512 specials in the 2 length form and 65536 specials in the 3 length forms. In UTF16 there would be issues of decode, and uniqueness. This perhaps is best tackled by some render form meta characters in the original Unicode space. There is no way around it, and with skips maybe UTF8 would be faster.”


// tool.js 1.1.1
// https://kring.co.uk
// (c) 2016-2017 Simon Jackson, K Ring Technologies Ltd
// MIT, like as he said. And underscored :D

import * as _ from 'underscore';

//==============================================================================
// LZW-compress a string
//==============================================================================
// The bounce parameter if true adds extra entries for faster dictionary growth.
// Usually LZW dictionary grows sub linear on input chars, and it is of note
// that after a BWT, the phrase contains a good MTF estimate and so maybe fine
// to append each of its chars to many dictionary entries. In this way the
// growth of entries becomes "almost" linear. The dictionary memory foot print
// becomes quadratic. Short to medium inputs become even smaller. Long input
// lengths may become slightly larger on not using dictionary entries integrated
// over input length, but will most likely be slightly smaller.

// DO NOT USE bounce (=false) IF NO BWT BEFORE.
// Under these conditions many unused dictionary entries will be wasted on long
// highly redundant inputs. It is a feature for pre BWT packed PONs.
//===============================================================================
function encodeLZW(data: string, bounce: boolean): string {
var dict = {};
data = encodeSUTF(data);
var out = [];
var currChar;
var phrase = data[0];
var codeL = 0;
var code = 256;
for (var i=1; i<data.length; i++) {
currChar=data[i];
if (dict['_' + phrase + currChar] != null) {
phrase += currChar;
}
else {
out.push(codeL = phrase.length > 1 ? dict['_'+phrase] : phrase.charCodeAt(0));
if(code < 65536) {//limit
dict['_' + phrase + currChar] = code;
code++;
if(bounce && codeL != code - 2) {//code -- and one before would be last symbol out
_.each(phrase.split(''), function (chr) {
if(code < 65536) {
while(dict['_' + phrase + chr]) phrase += chr;
dict['_' + phrase + chr] = code;
code++;
}
});
}
}
phrase=currChar;
}
}
out.push(phrase.length > 1 ? dict['_'+phrase] : phrase.charCodeAt(0));
for (var i=0; i<out.length; i++) {
out[i] = String.fromCharCode(out[i]);
}
return out.join();
}

function encodeSUTF(s: string): string {
s = encodeUTF(s);
var out = [];
var msb: number = 0;
var two: boolean = false;
var first: boolean = true;
_.each(s, function(val) {
var k = val.charCodeAt(0);
if(k > 127) {
if (first == true) {
first = false;
two = (k & 32) == 0;
if (k == msb) return;
msb = k;
} else {
if (two == true) two = false;
else first = true;
}
}
out.push(String.fromCharCode(k));
});
return out.join();
}

function encodeBounce(s: string): string {
return encodeLZW(s, true);
}

//=================================================
// Decompress an LZW-encoded string
//=================================================
function decodeLZW(s: string, bounce: boolean): string {
var dict = {};
var dictI = {};
var data = (s + '').split('');
var currChar = data[0];
var oldPhrase = currChar;
var out = [currChar];
var code = 256;
var phrase;
for (var i=1; i<data.length; i++) {
var currCode = data[i].charCodeAt(0);
if (currCode < 256) {
phrase = data[i];
}
else {
phrase = dict['_'+currCode] ? dict['_'+currCode] : (oldPhrase + currChar);
}
out.push(phrase);
currChar = phrase.charAt(0);
if(code < 65536) {
dict['_'+code] = oldPhrase + currChar;
dictI['_' + oldPhrase + currChar] = code;
code++;
if(bounce && !dict['_'+currCode]) {//the special lag
_.each(oldPhrase.split(''), function (chr) {
if(code < 65536) {
while(dictI['_' + oldPhrase + chr]) oldPhrase += chr;
dict['_' + code] = oldPhrase + chr;
dictI['_' + oldPhrase + chr] = code;
code++;
}
});
}
}
oldPhrase = phrase;
}
return decodeSUTF(out.join(''));
}

function decodeSUTF(s: string): string {
var out = [];
var msb: number = 0;
var make: number = 0;
var from: number = 0;
_.each(s, function(val, idx) {
var k = val.charCodeAt(0);
if (k > 127) {
if (idx < from + make) return;
if ((k & 128) != 0) {
msb = k;
make = (k & 64) == 0 ? 2 : 3;
from = idx + 1;
} else {
from = idx;
}
out.push(String.fromCharCode(msb));
for (var i = from; i < from + make; i++) {
out.push(s[i]);
}
return;
} else {
out.push(String.fromCharCode(k));
}
});
return decodeUTF(out.join());
}

function decodeBounce(s: string): string {
return decodeLZW(s, true);
}

//=================================================
// UTF mangling with ArrayBuffer mappings
//=================================================
declare function escape(s: string): string;
declare function unescape(s: string): string;

function encodeUTF(s: string): string {
return unescape(encodeURIComponent(s));
}

function decodeUTF(s: string): string {
return decodeURIComponent(escape(s));
}

function toBuffer(str: string): ArrayBuffer {
var arr = encodeSUTF(str);
var buf = new ArrayBuffer(arr.length);
var bufView = new Uint8Array(buf);
for (var i = 0, arrLen = arr.length; i < arrLen; i++) {
bufView[i] = arr[i].charCodeAt(0);
}
return buf;
}

function fromBuffer(buf: ArrayBuffer): string {
var out: string = '';
var bufView = new Uint8Array(buf);
for (var i = 0, arrLen = bufView.length; i < arrLen; i++) {
out += String.fromCharCode(bufView[i]);
}
return decodeSUTF(out);
}

//===============================================
//A Burrows Wheeler Transform of strings
//===============================================
function encodeBWT(data: string): any {
var size = data.length;
var buff = data + data;
var idx = _.range(size).sort(function(x, y){
for (var i = 0; i < size; i++) {
var r = buff[x + i].charCodeAt(0) - buff[y + i].charCodeAt(0);
if (r !== 0) return r;
}
return 0;
});

var top: number;
var work = _.reduce(_.range(size), function(memo, k: number) {
var p = idx[k];
if (p === 0) top = k;
memo.push(buff[p + size - 1]);
return memo;
}, []).join('');

return { top: top, data: work };
}

function decodeBWT(top: number, data: string): string { //JSON

var size = data.length;
var idx = _.range(size).sort(function(x, y){
var c = data[x].charCodeAt(0) - data[y].charCodeAt(0);
if (c === 0) return x - y;
return c;
});

var p = idx[top];
return _.reduce(_.range(size), function(memo){
memo.push(data[p]);
p = idx[p];
return memo;
}, []).join('');
}

//==================================================
// Two functions to do a dictionary effectiveness
// split of what to compress. This has the effect
// of applying an effective dictionary size bigger
// than would otherwise be.
//==================================================
function tally(data: string): number[] {
return _.reduce(data.split(''), function (memo: number[], charAt: string): number[] {
memo[charAt.charCodeAt(0)]++;//increase
return memo;
}, []);
}

function splice(data: string): string[] {
var acc = 0;
var counts = tally(data);
return _.reduce(counts, function(memo, count: number, key) {
memo.push(key + data.substring(acc, count + acc));
/* adds a seek char:
This assists in DB seek performance as it's the ordering char for the lzw block */
acc += count;
}, []);
}

//=====================================================
// A packer and unpacker with good efficiency
//=====================================================
// These are the ones to call, and the rest sre maybe
// useful, but can be considered as foundations for
// these functions. some block length management is
// built in.
function pack(data: any): any {
//limits
var str = JSON.stringify(data);
var chain = {};
if(str.length > 524288) {
chain = pack(str.substring(524288));
str = str.substring(0, 524288);
}
var bwt = encodeBWT(str);
var mix = splice(bwt.data);

mix = _.map(mix, encodeBounce);
return {
top: bwt.top,
/* tally: encode_tally(tally), */
mix: mix,
chn: chain
};
}

function unpack(got: any): any {
var top: number = got.top || 0;
/* var tally = got.tally; */
var mix: string[] = got.mix || [];

mix = _.map(mix, decodeBounce);
var mixr: string = _.reduce(mix, function(memo: string, lzw: string): string {
/* var key = lzw.charAt(0);//get seek char */
memo += lzw.substring(1, lzw.length);//concat
return memo;
}, '');
var chain = got.chn;
var res = decodeBWT(top, mixr);
if(_.has(chain, 'chn')) {
res += unpack(chain.chn);
}
return JSON.parse(res);
}

 

JDeveloper and Intel Python

The JDeveloper environment looks good. Nice work Oracle, and some of the Borland classic JBuilder. This tool look more like how I’d use an IDE. I’ve been looking at other technologies for computer development, and a recent Intel offering (for personal use free) is the MKL backed Intel Python. It needs at least an SSE4.2 supporting chip, but does have all that is needed to run the development on Xeon Phi Knights Landing. 72 cores and 144 vector processing AVX-512 engines. Multi Tflops stuff. For the developer this is perhaps the easiest way to start HPC, as through Cython and eventually C, the best performance can be had. Maybe FPGAs will help, and tools are available for that too. I’ve seen some good demonstrations, and maybe some clients with complex or hard problems would need this.

All this parallel stuff got me thinking of Kahan sums, and simulation of incompressibles by having a high speed of sound in a compressible, and the doing a compulsory diffusion to damp oscillation, and a pressure impulse (Pa s) handling of inertial failure of containers. It might reduce the non-locality of certain simulations, and actually act to simulate pressure hammer effects.

I’ve also recently got back into the idea of using Free Pascal for some of my projects maybe. There is now good JNI support, and even JVM targeting. I maen it’s very possible to use C for this kind of thing, but the FPC IDE and Lazarus are quick to build, with incremental unit compilation and many other features which make it good competition for general coding. Some would think it old hat, but the ease of use is excellent with much type checking, and no insistence on everything being a class. Units are very modular that way. The support for quite a few Pascal flavours is also good.