A Modified ElGamal for Passwords Only

It occurred to me g does not need to be made public for ElGamal signing, if the value g^H(m) is stored as the password hash, generated by the client. Also (r, s) can be changed to (r, r^s) to reduce server verification load to one mod power and one precision multiply mod p, and a subtraction equality test. So on the creation of a new password (y, p, g^H(m)) is created, and each log in needs the client to generate a k value to make (r, r^s).

Password recovery would be a little complex, and involve some email backdoor based on maybe using x as a pseudo H(m), and verifying the changes via generation of y. This would of course only set the local browser to have a new password. So maybe a unique (y, p, g^H(m)) per browser local store used. Index the local storage via email address, and Bob’s yer been here before.

Also, the server can crypt any pending view using H(m) as a person’s private key, or the private key as a browser specific personal private key, or maybe even browser key with all clients using same local store x value. All using DH shared secrets. This keeps data in a database a bit more private, and sometimes encrypt to self might be useful.

Is s=H(m)(1-r)(k^-1) mod (p-1) an option? As this sets H(m)=x, eliminating another y, making (p, g^H(m)) sufficient for authentication server storage, and g is only needed if the server needs to send crypts. Along with r=g^k mod p, as some easy sign. (r, s) might have to be used, as r^s could be equated as modinverse(r) for an easy g^H(m) equality, and the requirement to calculate s from r^s is a challenge. So a secure version is not quite as server efficient.

In reality k also has to be computed to prevent (r, s) reuse. This requires the k choice is the servers. Sending k in plaintext defeats the security, so g is needed, to calculate g^z, and so g^(H(m))^z=k on both sides. A retry randomizer to hide s=0, and a protocol is possible.

This surpasses a server md5 of the password. If the md5 is client side, a server capture can log in. If the md5 is server side, the transit intercept is … but a server DB compromise also needs a web server compromise. This algorithm also needs a client side compromise, or email intercept as per.

The reuse of (r, s) can’t be prevented without knowing k, and hence H(m), therefore a shared secret as a returned value implies H(m) knowledge. So one mod power client side, and two server side.

g^k to client.
(g^k)^H(m) to server.
(g^H(m))^k = (g^k)^H(m) tests true.

Signatures are useless as challenge responses. The RSA version would have to involve a signature on H(m) and so need H(m) direct. Also, the function H can be quite interesting to study. The application of client side salt also is not needed on the server side as a decode key, and so not decoded there. DH is so cool like that. And (p-1) having a large factor is easy to arrange in the key generation. And write access is harder, most of the time, to obtain for data.

The storing of a crypt with the g^k used, locks it for H(m) keyed access. This could void data on a password reset, or a browser local storage reset, but does prevent some client’s data leak opertunities, such as DB decrypt keys. This would have multiple crypts of the symmetric key for shared data, but would this significantly reduce the shared key security? It would prevent new users accessing the said secured data without cracking the shared key. A locked share for private threads say?

Spamming your friends with g^salt and g^salt^H(m)?

The first one is a good idea, the second not so much. AI spam encoding g^salt to your and friends accounts. The critical thing is the friend doesn’t get the password. Assuming a bad friend, who registers and gets g^salt to activate, from their own chosen spoof password. An email does get sent to your email, to cancel the friend as an option, and no other problem exists excepting login to a primary mail account. As a spoof maybe would see the option to remove you from your own account.

The primary control email account would then need secondary authentication. Such as only see the spam folder, and know what to open first and in order. For password recovery, this would be ok. For initial registration, it would be first come first served anyhow.

Sallen-Key ZDF design

As part of the VST I am producing, I have designed an SK filter analog where the loading of the first stage by the second is removed to ease implementation. This only affects the filter Q which then has an easy translation of the poles to compensate. Implementing it as CR filter simulation reduces the basic calculation. This is then expanded on by a Zero Delay design, to better its performance.

ZDF filters rely on making a better integral estimate of the voltage over the sample interval to better calculate the linear current charge delta voltage. More of a trapezoid integration than a sum of rectangles. There is still some non-linear charge effects as the voltage affects the current. The current sample out now not known, just then needs a collection of terms to solve for it. Given a high enough sample rate, the error of linearity is small. Smaller than without it, and the phase response is flat due to the error being symmetric on the simulated capacitor voltage, and drive, and not just the capacitor voltage.

The frequency to the correct resistive constant is a good match, and any further error is equivalent to a high frequency gain reduction. There is a maximum frequency of stability introduced in some filters, but this is not one of those. Stability increases with ZDF. The double pole iteration is best done by considering x+dx terms and shifting the dx calculation till later. Almost the output of pole 1 is used to calculate most of the output of pole 2 multiplied by a factor, added on to pole 1 result, and pole 2 result then finally divided. These dx are then added to make the final outputs to memorize.

More VST ideas and RackAFX

Looking into more instrument ideas, with the new Steinberg SDK and RackAFX. This looks good so far with a graphical design interface and a bit of a curve on Getting Visual Studio up to the compiling. A design the UI and then some fill in the blanks with audio render functions. Looks like it will cut development time significantly. Not a C beginner tool, but close.

It’s likely going to be an all in one 32 bit .dll file with midi triggering the built-in oscillation and a use as a filter mode too. Hopefully some different connected processing on the left and right. I want the maximum flexibility without going beyond stereo audio, as I am daw limited. The midi control may even be quite limited, or even not supported in some daw packages. This is not too bad as the tool is FX oriented, and midi is more VSTi.

Na, scratch that, I think I’ll use an envelope follower and PLL to extract note data. So analog and simplifies the plugin. Everything without an easy default excepting the DSP will not be used. There is no reason to make anymore VSTi, and so just VST FX will be done.

Looks like everytime you use visual studio it updates a few gig, and does nothing better. But it does work. There is a need for a fast disk, and quite a few GB of main memory. There is also a need to develop structure in the design process.

The GUI is now done, and next up is the top down class layout. I’ve included enough flexibility for what I want from this FX, and have simplified the original design to reduce the number of controllers. There is now some source to read through, and perhaps some examples. So far so good. The most complex thing so far (assuming you know your way around a C compiler, is the choice of scale on the custom GUI. You can easily get distracted in the RackAFX GUI, and find the custom GUI has a different size or knob scale. It’s quite a large UI I’m working on, but with big dials and a lot of space. Forty dials to be exact and two switches.

I decided on differing processing on each stereo channel, and an interesting panning arrangement. I felt inspired by the eclipse, and so have called it Moon. An excellent WebKnobMan is good at producing dial graphics for custom knobs. The few backgrounds in RackAFX are good enough, and I have not needed gimp or photoshop. I haven’t needed any fully custom control views, and only one enum label changing on twist.

Verdict is, cheap at the price, is not idiot proof, and does need other tools if the built in knobs are not enough. I do wonder if unused resources are stripped from the .dll size. There are quite a few images in there. I did have problems using other fonts, which were selectable but did not display or make an error. Bitmaps would likely be better.

The coding is underway, with the class .h files almost in the bag, and some of the .cpp files for some process basics. A nice 4 pole filter and a waveshaper. Likely I will not bother with sample rate resetting without a reload. It’s possible, but if your changing rate often, you’re likely weird. Still debating the use of midi and vector joy controller. There is likely a user case. Then maybe After this I’ll try a main synth using PDE oscillators. It is quite addictive VST programming.

I wonder what other nice GUI features there are? There is also the fade bypass I need to do, and this maybe joined with the vector joy. And also pitch and mod wheel perhaps. Keeping this as unified control does look a good idea. Project Moon is looking good.

Accounts Year 2016

Accounts are filed. The Account. This year as an experiment the payment options will include Swiss Franc, USD and EURO. There are methods in place to already accept gold and bitcoin, but as expected at this small scale so far, GBP is the currency used by all customers so far. 2017 growth is good compared to 2016, and includes some nice contracts.

Windows Being Shit Again

Ever needed to move “Program Files” to make some space with an easy 32 GB SD? Obviously, Microsoft keeps getting a backhand bribe for filling up internal drives. It turns out a nice utility called Steam Mover does the job of making a shortcut quite nicely. Windows 10 should have allowed this, and so should Windows 7. This is especially bad form when the Windows 10 upgrade installer will not use the SD for gaining the space to do an install. Where did all the GB go? On Visual Studio 2017. A bloaty C compiler.

Even LibreOffice 5 is deleting and doing failed installs. So much for being free. Part of the NSA always on spyware forced upon peoples electric processing bills. Another few GB of keywords to search through, all parasitized off you, for the nationally secure, and stuff you.

So after you check out a free demo rental of your supposed outright buy, and then you’ll have to change to another, as the megabytes increase to do much the same but slower, on faster hardware.

Yep, move a set of bits to some other volume, and all hell breaks loose. It does make you wonder why a specific Office 365 piece of code is running with a file lock when no office documents are open or used.

The Cloud Project

So far I’m up to 5 classes left to fill in

  • SignedPublicKey
  • Server
  • Keys
  • AuditInputStream
  • ScriptOutputStream

They are closely coupled in the package. The main reason for defining a new SignedPublicKey class is that the current CA system doesn’t have sufficient flexibility for the project. The situation with tunnel proxies has yet to be decided. At present the reverse proxy tunnel over a firewall ia based on overiding DNS at the firewall, to route inwards and not having the self as the IP for the host address. Proxy rights will of course be certificate based, and client to client link layer specific.

UPDATE: Server has been completed, and now the focus is on SignedPublicKey for the load/save file access restrictions. The sign8ng process also has to be worked out to allow easy use. There is also some consideration for a second layer of encryption over proxy connection links, and some decisions to be made on the server script style.

The next idea would be a client specific protocol. So instead of server addresses, there would be a client based protocol addressing string. kring.co.uk/file is a server domain based address. This perhaps needs extending.

Cryptography





pub 4096R/8E2EAD58 2017-07-30 Simon Jackson

MIT key server

I’ve been looking into cryptography today and have developed a quartet filter of Java classes which do Diffe-Hellmann 2048 AES key transfer with AES encryption, and ElGamal signing. I chose not to use the shared secret method which uses both private keys, but went for a single secret symmetric AES version, with no back communication.

The main issues were with the signature fail stream close handling, to avoid data corruption via pre-verified data being read as active. An interesting challenge it has been. Other ciphers may have been more logical to some people, and doing the DH modPow by explicit coding was good for the code soul.

I think the discrete logarithm problem is quite secure, and has the square order of 1 prime versus the RSA 2 primes for the same key length. The elliptic curve methods are supposed harder, but the key topology has perhaps some backdoors deep in some later maths. The AES 128 has the lowest key complexity, and is the weakness in the scheme as wrote, so an interleave was made.

Java does make it difficult to build a standard enhanced symmetric cipher to fix this key short fall. Not impossible, but difficult. I may add an intermediate permutation filter to expand the symmetric key length. In the end, I decided on a split symmetric 256 key for AES, one for an outer ECB, and one for an inner CBC. The 16 byte IV was used as a step offset between them for a good 256 bit key effective.

The DH 2048 does not do key exchange with a common p or g, as this is what leads to the x collision over the same p problem. The original plan of public key with less exchange of ephemeral keys, is better. The time solve complexity is similar to a similar RSA. EC cryptography is cool, but still a little not understood, which is ironic for a mathematical field, with a little too much “under” information on how maybe to “find” holes.

The whole concept of perfect forward security, moves the game on to AES cracks based on initial stream content estimates. I’d suggest most of the original key exchange space is pre-computed for a simple 128 bit symmetric crack by now. Out of all the built in Java key types, DH is from my point of view the best for public key cryptography. RSA is cool too for sure, but division is a “relatively” simple operation. There is estimated a 20 bit advantage in the descrete log problem.

DH keys can be decoded to do ElGamal and basic public key secret generation. I’m not sure if DSA as an alternative just needs an extra factor, but Pollard rho triggers a future co p, q effect might be possible. P and Q in DSA are not independent. one is a multiple of the other almost …

Welcome to the national insecurity bank robbery. I know, the state via an affiliated plc, stole 1/4 of my income last year by getting me to destroy evidence.

The artificial limits on the key length and problems leading from that are in the JDK source. Also the deletion of keys from the memory pages when freed back to the OS, may be a problem. Quite a nice programming challenge to do. The Java libs have some strange restrictions on g. View the source.

/* Diffe-Hellmann Cipher AES. (C)2017 K Ring Technologies Ltd.
 A DH symmetric secret (1024 bit) for a 2* AES 128 (256 bit) interleave.
 The 16 byte offset interleave of the ECB is used for the IV slot
 of the CBC.
 */
package uk.co.kring.net;

import java.io.FilterInputStream;
import java.io.FilterOutputStream;
import java.math.BigInteger;
import java.math.SecureBigInteger;
import java.security.KeyPair;
import java.security.PublicKey;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.interfaces.DHPrivateKey;
import javax.crypto.interfaces.DHPublicKey;
import javax.crypto.spec.IvParameterSpec;

/**
 *
 * @author Simon
 */
public final class DHCipher {
    
    public static final class InputStream extends FilterInputStream {

        public InputStream(java.io.InputStream in, KeyPair pub) throws Exception {
            super(in);
            BigInteger p, sk;
            SecureBigInteger x;
            int bytes;
            byte[] bb;
            DHPublicKey k = (DHPublicKey)pub.getPublic();
            p = k.getParams().getP();
            bytes = (p.bitLength() + 7) / 8;
            DHPrivateKey m =(DHPrivateKey)pub.getPrivate();
            x = new SecureBigInteger(m.getX());
            bb = new byte[bytes];
            in.read(bb);
            sk = new BigInteger(bb);
            if(!sk.abs().equals(sk)) {
                sk = new BigInteger(asLen(bb, bb.length + 1));
            }
            sk = sk.modPow(x, p);
            x.destroy();
            Cipher c = Cipher.getInstance("AES/ECB/PKCS5Padding");
            c.init(Cipher.DECRYPT_MODE, Keys.getAES(sk)[0]);
            in = new CipherInputStream(in, c);
            bb = new byte[24];
            in.read(bb);
            IvParameterSpec iv = new IvParameterSpec(Arrays.copyOfRange(bb, 8, 24));
            c = Cipher.getInstance("AES/CBC/PKCS5Padding");
            c.init(Cipher.DECRYPT_MODE, Keys.getAES(sk)[1], iv);
            in = new CipherInputStream(in, c);
            int i = in.read() % 23;
            in.skip(i);
        }
    }
    
    public static byte[] asLen(byte[] b, int len) {
        byte[] q = new byte[len];
        int j;
        for(int i = len - 1; i >= 0; i--) {
            j = i + b.length - q.length;
            if(j < 0) break;
            q[i] = b[j];
        }
        return q;
    }
    
    public static final class OutputStream extends FilterOutputStream {
        
        public OutputStream(java.io.OutputStream out, PublicKey pub) throws Exception {
            super(out);
            BigInteger y, g, p, sk;
            int bytes;
            byte[] bb;
            DHPublicKey k = (DHPublicKey)pub;
            y = k.getY();
            g = k.getParams().getG();
            p = k.getParams().getP();
            bytes = (p.bitLength() + 7) / 8;
            bb = new byte[bytes];
            Keys.getR().nextBytes(bb);
            BigInteger b = new BigInteger(bb);
            b = b.abs();
            bb = g.modPow(b, p).toByteArray();
            bb = asLen(bb, bytes);
            out.write(bb);//ephermeric
            sk = y.modPow(b, p);
            Cipher c = Cipher.getInstance("AES/ECB/PKCS5Padding");
            c.init(Cipher.ENCRYPT_MODE, Keys.getAES(sk)[0]);
            out = new CipherOutputStream(out, c);
            bb = new byte[24];
            Keys.getR().nextBytes(bb);
            out.write(bb);
            IvParameterSpec iv = new IvParameterSpec(Arrays.copyOfRange(bb, 8, 24));
            c = Cipher.getInstance("AES/CBC/PKCS5Padding");
            c.init(Cipher.ENCRYPT_MODE, Keys.getAES(sk)[1], iv);
            out = new CipherOutputStream(out, c);
            Keys.getR().nextBytes(bb);
            out.write((byte)(bb[0] % 23 + 23 * bb[23]));
            out.write(bb, 1, bb[0] % 23);
        }
    }
}

And the following code for clearing the key. Perfect forward security requires the same q to be used and a extra negotiation step. As it stands it’s not perfect, but as good as RSA, maybe slightly better.

/* Useful. (C)2017 K Ring Technologies Ltd.
 */
package java.math;

import java.util.Arrays;
import java.util.Vector;
import javax.security.auth.DestroyFailedException;
import javax.security.auth.Destroyable;
import uk.co.kring.net.Keys;

/**
 *
 * @author Simon
 */
public final class SecureBigInteger extends BigInteger implements Destroyable {
    
    private boolean d = false;
    private BigInteger ref;
    private static final Vector<SecureBigInteger> m = new Vector<SecureBigInteger>();
    
    private synchronized void handler(BigInteger val) {
        ref = val;
        m.add(this);
        System.arraycopy(val.mag, 0, mag, 0, mag.length);
    }
    
    public SecureBigInteger(BigInteger val) throws Exception {
        super(val.bitLength(), 1, Keys.getR());
        handler(val);
    }

    @Override
    public boolean isDestroyed() {
        return d;
    }

    @Override
    public void destroy() throws DestroyFailedException {
        Arrays.fill(mag, -1);
        m.remove(this);
        boolean in = false;
        Iterable i = (Iterable) m.iterator();
        for(Object x: i) {
            if(((SecureBigInteger)x).ref == ref) in = true;
        }
        if(!in) Arrays.fill(ref.mag, -1);//clear final instance
        d = true;
    }
    
    public void masterDestroy() throws DestroyFailedException {
        Iterable i = (Iterable) m.iterator();
        for(Object x: i) {
            ((SecureBigInteger)x).destroy();
        }
    }
}

There is also the possibility of G exchange, which would allow for calculation of new Y. This would have advantages of instancing a public key set, based on a 1 to 1 crypt role. The cracking of any public key thus only cracks one link and not the full set of peers to a node. In reality, g just alters y, and p does change the crypt. So an exchange of new y is required. There is a potential flaw in this swap if the new p and g are chosen in a cracked domain.

Allowing the client to select g in the server selected p domain is a minor concession to duplication, the server would have to return a new y. Such a thing might go DOS attack, and so should be restricted somewhat. If g is high in repeated factors, then the private key is effectively multiplied up and reduced mod p-1, and g is reduced to a lower base.

BLZW Compression Java

Uses Sais.java with dictionary persist, and initialisation corrections. Also with alignment fix and unused function removed. A 32 bucket context provides an effective 17 bit dictionary key, using just 12 bits, along with the BWT redundancy model. This should provide superior compression of text. Now includes the faster skip decode. Feel free to donate to grow some open source based on data compression and related codecs.





/* BWT/LZW fast wide dictionary. (C)2016-2017 K Ring Technologies Ltd.
The context is used to make 32 dictionary spaces for 128k symbols max.
This then givez 12 bit tokens for an almost effective 16 bit dictionary.
For an approximate 20% data saving above regular LZW.

The process is optimized for L2 cache sizea.

A mod 16 gives DT and EU collisions on hash.
A mod 32 is ASCII proof, and hence good for text.

The count compaction includes a skip code for efficient storage.
The dictionary persists over the stream for good running compression.
64k blocks are used for fast BWT. Larger blocks would give better
compression, but be slower. The main loss is the count compactio storage.

An arithmetic coder post may be effective but would be slow. Dictonary
acceleration would not necessarily be useful, and problematic after the
stream start. A 12 bit code is easy to pack, keeps the dictionary small
and has the sweet spot of redundancy in while not making large rare or
single use symbols.
*/

package uk.co.kring.net;

import java.io.EOFException;
import java.io.Externalizable;
import java.io.FilterInputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;
import java.util.HashMap;

/**
 * Created by user on 06/06/2016.
 */
public class Packer {

    public static class OutputStream extends FilterOutputStream implements Externalizable {

        byte[] buf = new byte[4096 * 16];//64K block max
        int cnt = 0;//pointer to end
        int[] dmax = new int[32];
        HashMap<String, Integer> dict;

        public OutputStream(java.io.OutputStream out) {
            super(out);
        }

        @Override
        public void readExternal(ObjectInput input) throws IOException, ClassNotFoundException {
            out = (java.io.OutputStream)input.readObject();
            input.read(buf);
            cnt = input.readChar();
            dict = (HashMap<String, Integer>)input.readObject();
        }

        @Override
        public void writeExternal(ObjectOutput output) throws IOException {
            output.writeObject(out);
            output.write(buf);
            output.writeChar(cnt);
            output.writeObject(dict);
        }

        @Override
        public void close() throws IOException {
            flush();
            out.close();
        }

        private byte pair = 0;
        private boolean two = false;

        private void outputCount(int num, boolean small, boolean tiny) throws IOException {
            if(tiny) {
                out.write((byte)num);
                return;
            }
            if(small) {
                out.write((byte)num);
                pair = (byte)((pair << 4) + (num >> 8));
                if(two) {
                    two = false;
                    out.write(pair);
                } else {
                    two = true;
                }
                return;
            }
            out.write((byte)(num >> 8));
            out.write((byte)num);
        }

        @Override
        public void flush() throws IOException {
            outputCount(cnt, false, false);//just in case length
            char[] count = new char[256];
            if(dict == null) {
                dict = new HashMap<>();
                for(int i = 0; i < 32; i++) {
                    dmax[i] = 256;//dictionary max
                }
            }
            for(int i = 0; i < cnt; i++) {
                count[buf[i]]++;
            }
            char skip = 0;
            boolean first = true;
            char acc = 0;
            char[] start = new char[256];
            for(int j = 0; j < 2; j++) {
                for (int i = 0; i < 256; i++) {
                    if(j == 0) {
                        acc += count[i];
                        start[i] = acc;
                    }
                    if (count[i] == 0) {
                        skip++;
                        if (first) {
                            outputCount(0, false, true);
                            first = false;
                        }
                    } else {
                        if (skip != 0) {
                            outputCount(skip, false, true);
                            skip = 0;
                            first = true;
                        }
                        outputCount(count[i], false, true);
                        count[i] >>= 8;
                    }
                }
                if(skip != 0) outputCount(skip, false, true);//final skip
            }
            int[] ptr = new int[buf.length];
            byte[] bwt = new byte[buf.length];

            outputCount(Sais.bwtransform(buf, bwt, ptr, cnt), false, false);

            //now an lzw
            String sym = "";
            char context = 0;
            char lastContext = 0;
            int test = 0;
            for(int j = 0; j < cnt; j++) {
                while(j >= start[context]) context++;
                if(lastContext == context) {
                    sym += bwt[j];//add a char
                } else {
                    lastContext = context;
                    outputCount(test, true, false);
                    sym = "" + bwt[j];//new char
                }
                if(sym.length() == 1) {
                    test = (int)sym.charAt(0);
                } else {
                    if(dict.containsKey(context + sym)) {
                        test = dict.get(context + sym);
                    } else {
                        outputCount(test, true, false);
                        if (dmax[context & 0x1f] < 0x1000) {//context limit
                            dict.put(context + sym, dmax[context & 0x1f]);
                            dmax[context & 0x1f]++;
                        }
                        sym = "" + bwt[j];//new symbol
                    }
                }
            }
            outputCount(test, true, false);//last match
            if(!two) outputCount(0, true, false);//aligned data
            out.flush();
            cnt = 0;//fill next buffer
        }

        @Override
        public void write(int oneByte) throws IOException {
            if(cnt == buf.length) flush();
            buf[cnt++] = (byte)oneByte;
        }
    }

    public static class InputStream extends FilterInputStream implements Externalizable {

        @Override
        public void readExternal(ObjectInput input) throws IOException, ClassNotFoundException {
            in = (java.io.InputStream)input.readObject();
            input.read(buf);
            idx = input.readChar();
            cnt = input.readChar();
            dict = (HashMap<Integer, String>)input.readObject();
        }

        @Override
        public void writeExternal(ObjectOutput output) throws IOException {
            output.writeObject(in);
            output.write(buf);
            output.writeChar(idx);
            output.writeChar(cnt);
            output.writeObject(dict);
        }

        //SEE MIT LICENCE OF Sais.java

        private static void unbwt(byte[] T, byte[] U, int[] LF, int n, int pidx) {
            int[] C = new int[256];
            int i, t;
            //for(i = 0; i < 256; ++i) { C[i] = 0; }//Java
            for(i = 0; i < n; ++i) { LF[i] = C[(int)(T[i] & 0xff)]++; }
            for(i = 0, t = 0; i < 256; ++i) { t += C[i]; C[i] = t - C[i]; }
            for(i = n - 1, t = 0; 0 <= i; --i) {
                t = LF[t] + C[(int)((U[i] = T[t]) & 0xff)];
                t += (t < pidx) ? 1 : 0;
            }
        }

        byte[] buf = new byte[4096 * 16];//64K block max
        int cnt = 0;//pointer to end
        int idx = 0;
        int[] dmax = new int[32];
        HashMap<Integer, String> dict;

        private boolean two = false;
        private int vala = 0;
        private int valb = 0;

        private int reader() throws IOException {
            int i = in.read();
            if(i == -1) throw new EOFException("End Of Stream");
            return i;
        }

        private char inCount(boolean small, boolean tiny) throws IOException {
            if(tiny) return (char)reader();
            if(small) {
                if(!two) {
                    vala = reader();
                    valb = reader();
                    int valc = reader();
                    vala += (valc << 4) & 0xf00;
                    valb += (valc << 8) & 0xf00;
                    two = true;
                } else {
                    vala = valb;
                    two = false;
                }
                return (char)vala;
            }
            int val = reader() << 8;
            val += reader();
            return (char)val;
        }

        public InputStream(java.io.InputStream in) {
            super(in);
        }

        @Override
        public int available() throws IOException {
            return cnt - idx;
        }

        @Override
        public void close() throws IOException {
            in.close();
        }

        private void doReads() throws IOException {
            if(available() == 0) {
                two = false;//align
                if(dict == null) {
                    dict = new HashMap<>();
                    for(int i = 0; i < 32; i++) {
                        dmax[i] = 256;
                    }
                }
                cnt = inCount(false, false);
                char[] count = new char[256];
                char tmp;
                for(int j = 0; j < 2; j++) {
                    for (int i = 0; i < 256; i++) {
                        count[i] += tmp = (char)(inCount(false, true) << (j == 1?8:0));
                        if (tmp == 0) {
                            i += inCount(false, true) - 1;
                        }
                    }
                }
                for(int i = 1; i < 256; i++) {
                    count[i] += count[i - 1];//accumulate
                }
                if(cnt != count[255]) throw new IOException("Bad Input Check (character count)");
                int choose = inCount(false, false);//read index
                if(cnt < choose) throw new IOException("Bad Input Check (selected row)");
                byte[] build;//make this
                //then lzw
                //rosetta code
                int context = 0;
                int lastContext = 0;
                String w = "" + inCount(true, false);
                StringBuilder result = new StringBuilder(w);
                while (result.length() < cnt) {//not yet complete
                    char k = inCount(true, false);
                    String entry;
                    while(result.length() > count[context]) {
                        context++;//do first
                        if (context > 255)
                            throw new IOException("Bad Input Check (character count)");
                    }
                    if(k < 256)
                        entry = "" + k;
                    else if (dict.containsKey(((context & 0x1f) << 16) + k))
                        entry = dict.get(((context & 0x1f) << 16) + k);
                    else if (k == dmax[context & 0x1f])
                        entry = w + w.charAt(0);
                    else
                        throw new IOException("Bad Input Check (token: " + k + ")");
                    result.append(entry);
                    // Add w+entry[0] to the dictionary.
                    if(lastContext == context) {
                        if (dmax[context & 0x1f] < 0x1000) {
                            dict.put(((context & 0x1f) << 16) +
                                    (dmax[context & 0x1f]++),
                                    w + entry.charAt(0));
                        }
                        w = entry;
                    } else {
                        //context change
                        context = lastContext;
                        //and following context should be a <256 ...
                        if(result.length() < cnt) {
                            w = "" + inCount(true, false);
                            result.append(w);
                        }
                    }
                }
                build = result.toString().getBytes();
                //working buffers
                int[] wrk = new int[buf.length];
                unbwt(build, buf, wrk, buf.length, choose);
                idx = 0;//ready for reads
                if(!two) inCount(true, false);//aligned data
            }
        }

        @Override
        public int read() throws IOException {
            try {
                doReads();
                int x = buf[idx++];
                doReads();//to prevent avail = 0 never access
                return x;
            } catch(EOFException e) {
                return -1;
            }
        }

        @Override
        public long skip(long byteCount) throws IOException {
            long i;
            for(i = 0; i < byteCount; i++)
                if(read() == -1) break;
            return i;
        }

        @Override
        public boolean markSupported() {
            return false;
        }

        @Override
        public synchronized void reset() throws IOException {
            throw new IOException("Mark Not Supported");
        }
    }
}

Disection of the Roots of the Mass Independent Space Equation

(v^2) v ‘ ‘ ‘ −9v v ‘ v ‘ ‘ 12(v ‘ ^3) (1−v^2/c^2)v ‘ (wv)^2
3 Constants 2 Constants 1 Constant 1 Constant
Square Power Linear Power Cubic Power Square and Quartic Power
3 Root Pairs 2 Roots 1 Root and 1 Root Pair 1 Root Pair and 2 Root Pairs
Energy and Force of Force Momentum, Force and Velocity of Force Cube of Force Force Energy
Potential Inertial Term Kinetic Inertial Term Strong Term Relativistic Force Energy Coupling
Gravity Dark Strong Weak EM

The fact there are 4 connected modes, as it were, imply there are 6 cross overs between modes of action, indicating that one term can be stimulated to convert into another term. The exact equilibrium points can be set as 6 differential equation forms, with some further analysis required of stable phase space bounds, and unstable phases at which to alter the balance to have a particular effect. Installing a constant (or function) of proportionality in each of the following balance equations would in effect allow some translation of one term ‘resonance’ into another.

v v ‘ ‘ ‘=−9 v ‘ v ‘ ‘ 3 Const and 1 root point
(v^2) v ‘ ‘ ‘=12(v ‘ ^3) 3 Const and 6 root points
v ‘ ‘ ‘=(1−v^2/c^2)v ‘ w^2 3 Const and 2 root points
−9v v ‘ ‘=12(v ‘ ^2) 2 Const and 2 root points
−9 v ‘ ‘=(1−v^2/c^2)(w^2) v 2 Const and 2 root points
12(v ‘ ^2)=(1−v^2/c^2)(wv)^2 1 Const and 12 root points

Another interesting point is 3 of the 6 are independent of w (omega mass oscillation frequency), and also by implication relativistic dependence on c.




Current Contracts

My current contract is interesting work, and only spoiled by the accounting trend of pay everybody as late as possible. It looks as though I’ll up my rates to account and manage invoice factorization. It does no good to act as a credit source to a company which has limited status, and yet either has the resources to pay in time, or is miserly in the period before receivership. Either way such companies are not my friend, and no good to be associated with. Credit control …

Well a bit delayed, but payments are arriving, and I’m investing in some electronic product design.

The 3D Flavour Tensor in Analogue to the 4D of Einstein, for a 3D, 4D Curvature in Particle Physics

I like to keep updated about particle physics and LHC things, to quite an advanced level. My interest is in fields and their previous engineering value in radio waves and electronics in general. It makes sense to move to a tensor algebra in the 2+1 charge space, just as was done for the theory of gravitation. In some sense the conservation of acceleration becomes a conservation of net mapped curvature and it becomes funny via Noether’s Theorem.

CP violation as a horizon delta of radius of curvature from the “t” distance is perhaps relevant phrased as a moment of inertia in the 2+1, and its resultant geometric singular forms. This does create the idea of singular forms in the 2+1 space orbiting (or perhaps more correctly resonating) in tune with singularities in the 3+1 space. This interconnection entanglement, or something similar is perhaps connected to the “weak phase”.

So a 7D total space-time, with differing invariants in the 3D and 4D parts. The interesting thing from my prospective is the prediction of a heavy graviton, and conservation of acceleration. The idea that space itself holds its own shape without graviton interaction, and so conserves acceleration, while the heavy graviton can be a short range force which changes the curvature. The graviton then becomes a mediator of jerk and not acceleration. The graviton, being heavy would also travel slower than light. Gravity waves would then not necessarily need graviton exchange.

Quantization of theories has I think in many ways gone too far. I think the big breaks of the 21st century will be turning quantized bulk statistics into unquantized statistics, with quantization applied to only some aspects of theories. The implication is that dark matter is bent spacetime, without matter being present to emit gravitons. In this sense I predict it is not particulate.

So 7D and a differential phase space coordinate for each D (except time) gives a 13D reality. The following is an interesting equation I arrived at at one point for velocity solutions to uncertainty. I did not incorporate electromagnetism, but it’s interesting in the number of solutions, or superposition of velocity states as it were. The w being constant in the assumption, but purtubative expansion in it may be interesting. The units of the equation are conveniently force. A particle observing another particle would also be moving such, and the non linear summation for the lab rest frame of explanation might be quite interesting.

(v^2) v ‘ ‘ ‘−9v v ‘ v ‘ ‘+12(v ‘ ^3)+(1−v^2/c^2)v ‘ (wv)^2=0

With ‘ representing differential w.r.t. time notation. So v’ is acceleration and v” is the jerk. I think v”’ is called the jounce for those with a mind to learn all the Js. An interesting equation considering the whole concept of uncertain geometry started from an observation that relative mass was kind of an invariant, mass oscillation, although weird with RMS mass and RMS energy conservation, was perhaps a good way of parameterizing an uncertainty “force” proportional to the kinetic energy momentum product. As an addition it was more commutative as a tensor algebra. Some other work I calculated suggests dark energy is conservation of mass times log of normalized velocity, and dark matter could be conserved acceleration with gravity and the graviton operating to not bend space on density, but bend space through a short distance acting heavy graviton. Changes in gravity could thus travel slower than light, and an integral with a partial fourth power fraction could expand into conserved acceleration, energy, momentum and mass information velocity (dark energy) with perhaps another form of Higgs, and an uncertainty boson (spin 1) as well.

So really a 13D geometry. Each velocity state in the above mass independent free space equation above is an indication of a particle of differing mass. A particle count based on solutions. 6 quarks and all. An actual explanation for the three flavours of matter? So assuming an approximate linear superposable solution with 3 constants of integration, this gives 6 parameterized solutions from the first term via 3 constants and the square being rooted, The second tern involves just 2 of the constants for 2 possible offsets, and the third term involves just one of the constants, but 3 roots with two being in complex conjugation. The final term involves just one of the constants, but an approximation to the fourth power for 4 roots, and disappearing when the velocity is the speed of light, and so is likely a rest mass term.

So that would likely be a fermion list. A boson list would be in the boundaries at the discontinuities between those solutions, with the effective mass of the boson controlled by the expected life time between the states, and the state energy mismatch. Also of importance is how the equation translates to 4D, 3D spacetime, and the normalized rotational invariants of EM and other things. Angular momentum is conserved and constant (dimensionless in uncertain geometry),

Assuming the first 3 terms are very small compared to the last term, and v is not the speed of light. There would have to be some imaginary component to velocity, and this imaginary would be one of the degrees of freedom (leading to a total of 26). Is this imaginary velocity consistent with isospin?

Yang–Mills Existence and Mass Gap (Clay Problem)

If mass oscillation is proved to exist, then the mass gap can never be proved to be greater than zero as the mass must pass through zero for oscillation. This does exclude the possibility of complex mass oscillation, but this is just mass shrinkage (no eventual gap in the infinite time limit), or mass growth, and hence no minimum except in the big bang.

The 24 degrees of freedom on the relativistic compacted holographic 3D for the 26D string model, imply with elliptic functions, a 44 fold way. This is a decomposition into 26 sporadic elliptic patterns, and 18 generational spectra patterns. With the differential equation above providing 6*2*(2+1) combinations from the first three terms, and the 3 constants of integration locating in “colour space” through a different orthogonal basis. Would provide 24 apparent solution types, with 12 of them having a complex conjugation relation as a pair for 36. If this is the isospin solution, then the 12 fermionic solutions have all been found. That leaves the 12 bosonic solutions (the ones without a conjugate in the 3rd term generative), with only 5 (or if a photon is special 4) having been found so far. If the bosonic sector includes the dual rooting via the second term for spin polarity, then of the six (with the dual degenerates cancelled), two more are left to be found if light is special in the 4th term.

This would also leave 8 of the 44 way in a non existent capacity. I’d maybe focus on them being gluons, and consider the third still to be found as a second form of Higgs. OK.

Displacement Currents in Colour Space

Maybe an interesting wave induction effect is possible. I’m not sure what the transmitter should be made of. The ABC modulation may make it a bit “alternate” near the field emission. So not caused by bosons in the regular sense, more the “transition bosons” between particle states. The specific transitions between energy states may (although it’s not certain), pull the local ABC field in a resonant or engineered direction. The actual ABC solution of this reality has to have some reasoning for being stable for long enough. This does not imply though that no other ABC solutions act in parallel, or are not obtainable via some engineering means.

VS Code and Elm

I’ve been looking into doing JS trans-compiled languages recently. The usual suspects popped up. ClosureScript, Elm, TypeScript, and maybe a few others. This had the unexpected effect of needing the VS Code software as some of the plugins do not yet work with VS Community. I opened up some TypeScript I had wrote recently, and found the way “require” is used for loading is not recognised in VS Code. Strange, and it might cause problems with passing on code to others.

I looked into Elm which is a Haskell for JS. It looks quite good, and I’ve downloaded the kit. I’ll let you know if I start using it big. Closure is Scheme or LISP almost. It seems to have little editor support compared to Elm, and I’d prefer to use Elm over Closure. I already have some libraries downloaded for functional extensions to JS, and some .d.ts descriptors too for some. The main reason I’ve never used Haskell is the large GHC binary size. The idea of using JS as the VM is good. It does however dump about 6000 lines of JS code for a hello world. I haven’t tested if this is per module. I understand Elm can do very fast HTML rendering though, so something to look into.

There’s also Haxe of course, and plenty of plain vanilla JS functional programming modules, including some like RQ, for threading control. Some nice Monad libraries, and good browser support. I also like the TinyMCE. It’s quite a classic. For the toolkit, Bootstrap.js is the current best with all the needed features of a modern looking site.

Beware much ado about category theory, and things like the continuation monad can do all sequential processing … of course from the context of writing it in a sequential language … blah, blah, stored state, pretend there’s none, blah, blah, monad, blah, delay output by wrapper, blah. Ok, well it is true I’m 46, but you young coders out there should take some of the symbology with quite a big pinch of salt, and maybe have a more interesting look at things like the Y combinator. It’s kind of what Mathematica would call Hold[] but with more monad blah for what is really group theory.

TypeScript in Visual Studio Community 2017

Just loaded up the VS 2017 community release. The TS 2.1 features include strong types relating to null and undefined. A bit annoying, but none the less it does force some decent console logging of salient error potential. I’ve found that many apparent error conditions are being removed. If only I could find a way to stop the coercion in JS, such that it is. It is one of the things I never liked about JS. I’d prefer an explicit cast every time.

General Update

An update on the current progress of projects and general things here at KRT. I’ve set about checking out TypeScript for using in projects. It looks good, has some hidden pitfalls on finding .m.ts files for underscore for example, but in general looks good. I’m running it over some JS to get more of a feel. The audio VST project is moving slowly, at oscillators at the moment, with filters being done. I am looking into cache coherence algorithms and strategies to ease hardware design at the moment too. The 68k2 document mentioned in previous post is expanding with some of these ideas in having a “stall on value match” register, with a “touch since changed” bit in each cache line.

All good.

The Processor Design Document in Progress

TypeScript

Well I eventually managed to get a file using _.reduce() to compile without errors now. I’ll test it as soon as I’ve adapted in QUnit 2.0.1 so I can write my tests to the build as a pop up window, an perhaps back load a file to then be able to save the file from within the editor, and hence to become parser frame.

Representation

An excerpt from the 68k2 document as it’s progressing. An idea on UTF8 easy indexing and expansion.

“Reducing the size of this indexing array can recursively use the same technique, as long as movement between length encodings is not traversed for long sequences. This would require adding in a 2 length (11 bit form) and a 3 length (16 bit form) of common punctuation and spacing. Surrogate pair just postpones the issue and moves cache occupation to 25%, and not quite that for speed efficiency. This is why the simplified Chinese is common circa 2017, and surrogate processing has been abandoned in the Unicode specification, and replaced by characters in the surrogate representation space. Hand drawing the surrogates was likely the issue, and character parts (as individual parts) with double strike was considered a better rendering option.

UTF8 therefore has a possible 17 bit rendering for due to the extra bit freed by not needing a UTF32 representation. Should this be glyph space, or skip code index space, or a mix? 16 bit purity says skip code space. With common length (2 bit) and count (14 bit), allowing skips of between 16 kB and 48 kB through a document. The 4th combination of length? Perhaps the representation of the common punctuation without character length alterations. For 512 specials in the 2 length form and 65536 specials in the 3 length forms. In UTF16 there would be issues of decode, and uniqueness. This perhaps is best tackled by some render form meta characters in the original Unicode space. There is no way around it, and with skips maybe UTF8 would be faster.”


// tool.js 1.1.1
// https://kring.co.uk
// (c) 2016-2017 Simon Jackson, K Ring Technologies Ltd
// MIT, like as he said. And underscored 😀

import * as _ from 'underscore';

//==============================================================================
// LZW-compress a string
//==============================================================================
// The bounce parameter if true adds extra entries for faster dictionary growth.
// Usually LZW dictionary grows sub linear on input chars, and it is of note
// that after a BWT, the phrase contains a good MTF estimate and so maybe fine
// to append each of its chars to many dictionary entries. In this way the
// growth of entries becomes "almost" linear. The dictionary memory foot print
// becomes quadratic. Short to medium inputs become even smaller. Long input
// lengths may become slightly larger on not using dictionary entries integrated
// over input length, but will most likely be slightly smaller.

// DO NOT USE bounce (=false) IF NO BWT BEFORE.
// Under these conditions many unused dictionary entries will be wasted on long
// highly redundant inputs. It is a feature for pre BWT packed PONs.
//===============================================================================
function encodeLZW(data: string, bounce: boolean): string {
var dict = {};
data = encodeSUTF(data);
var out = [];
var currChar;
var phrase = data[0];
var codeL = 0;
var code = 256;
for (var i=1; i<data.length; i++) {
currChar=data[i];
if (dict['_' + phrase + currChar] != null) {
phrase += currChar;
}
else {
out.push(codeL = phrase.length > 1 ? dict['_'+phrase] : phrase.charCodeAt(0));
if(code < 65536) {//limit
dict['_' + phrase + currChar] = code;
code++;
if(bounce && codeL != code - 2) {//code -- and one before would be last symbol out
_.each(phrase.split(''), function (chr) {
if(code < 65536) {
while(dict['_' + phrase + chr]) phrase += chr;
dict['_' + phrase + chr] = code;
code++;
}
});
}
}
phrase=currChar;
}
}
out.push(phrase.length > 1 ? dict['_'+phrase] : phrase.charCodeAt(0));
for (var i=0; i<out.length; i++) {
out[i] = String.fromCharCode(out[i]);
}
return out.join();
}

function encodeSUTF(s: string): string {
s = encodeUTF(s);
var out = [];
var msb: number = 0;
var two: boolean = false;
var first: boolean = true;
_.each(s, function(val) {
var k = val.charCodeAt(0);
if(k > 127) {
if (first == true) {
first = false;
two = (k & 32) == 0;
if (k == msb) return;
msb = k;
} else {
if (two == true) two = false;
else first = true;
}
}
out.push(String.fromCharCode(k));
});
return out.join();
}

function encodeBounce(s: string): string {
return encodeLZW(s, true);
}

//=================================================
// Decompress an LZW-encoded string
//=================================================
function decodeLZW(s: string, bounce: boolean): string {
var dict = {};
var dictI = {};
var data = (s + '').split('');
var currChar = data[0];
var oldPhrase = currChar;
var out = [currChar];
var code = 256;
var phrase;
for (var i=1; i<data.length; i++) {
var currCode = data[i].charCodeAt(0);
if (currCode < 256) {
phrase = data[i];
}
else {
phrase = dict['_'+currCode] ? dict['_'+currCode] : (oldPhrase + currChar);
}
out.push(phrase);
currChar = phrase.charAt(0);
if(code < 65536) {
dict['_'+code] = oldPhrase + currChar;
dictI['_' + oldPhrase + currChar] = code;
code++;
if(bounce && !dict['_'+currCode]) {//the special lag
_.each(oldPhrase.split(''), function (chr) {
if(code < 65536) {
while(dictI['_' + oldPhrase + chr]) oldPhrase += chr;
dict['_' + code] = oldPhrase + chr;
dictI['_' + oldPhrase + chr] = code;
code++;
}
});
}
}
oldPhrase = phrase;
}
return decodeSUTF(out.join(''));
}

function decodeSUTF(s: string): string {
var out = [];
var msb: number = 0;
var make: number = 0;
var from: number = 0;
_.each(s, function(val, idx) {
var k = val.charCodeAt(0);
if (k > 127) {
if (idx < from + make) return;
if ((k & 128) != 0) {
msb = k;
make = (k & 64) == 0 ? 2 : 3;
from = idx + 1;
} else {
from = idx;
}
out.push(String.fromCharCode(msb));
for (var i = from; i < from + make; i++) {
out.push(s[i]);
}
return;
} else {
out.push(String.fromCharCode(k));
}
});
return decodeUTF(out.join());
}

function decodeBounce(s: string): string {
return decodeLZW(s, true);
}

//=================================================
// UTF mangling with ArrayBuffer mappings
//=================================================
declare function escape(s: string): string;
declare function unescape(s: string): string;

function encodeUTF(s: string): string {
return unescape(encodeURIComponent(s));
}

function decodeUTF(s: string): string {
return decodeURIComponent(escape(s));
}

function toBuffer(str: string): ArrayBuffer {
var arr = encodeSUTF(str);
var buf = new ArrayBuffer(arr.length);
var bufView = new Uint8Array(buf);
for (var i = 0, arrLen = arr.length; i < arrLen; i++) {
bufView[i] = arr[i].charCodeAt(0);
}
return buf;
}

function fromBuffer(buf: ArrayBuffer): string {
var out: string = '';
var bufView = new Uint8Array(buf);
for (var i = 0, arrLen = bufView.length; i < arrLen; i++) {
out += String.fromCharCode(bufView[i]);
}
return decodeSUTF(out);
}

//===============================================
//A Burrows Wheeler Transform of strings
//===============================================
function encodeBWT(data: string): any {
var size = data.length;
var buff = data + data;
var idx = _.range(size).sort(function(x, y){
for (var i = 0; i < size; i++) {
var r = buff[x + i].charCodeAt(0) - buff[y + i].charCodeAt(0);
if (r !== 0) return r;
}
return 0;
});

var top: number;
var work = _.reduce(_.range(size), function(memo, k: number) {
var p = idx[k];
if (p === 0) top = k;
memo.push(buff[p + size - 1]);
return memo;
}, []).join('');

return { top: top, data: work };
}

function decodeBWT(top: number, data: string): string { //JSON

var size = data.length;
var idx = _.range(size).sort(function(x, y){
var c = data[x].charCodeAt(0) - data[y].charCodeAt(0);
if (c === 0) return x - y;
return c;
});

var p = idx[top];
return _.reduce(_.range(size), function(memo){
memo.push(data[p]);
p = idx[p];
return memo;
}, []).join('');
}

//==================================================
// Two functions to do a dictionary effectiveness
// split of what to compress. This has the effect
// of applying an effective dictionary size bigger
// than would otherwise be.
//==================================================
function tally(data: string): number[] {
return _.reduce(data.split(''), function (memo: number[], charAt: string): number[] {
memo[charAt.charCodeAt(0)]++;//increase
return memo;
}, []);
}

function splice(data: string): string[] {
var acc = 0;
var counts = tally(data);
return _.reduce(counts, function(memo, count: number, key) {
memo.push(key + data.substring(acc, count + acc));
/* adds a seek char:
This assists in DB seek performance as it's the ordering char for the lzw block */
acc += count;
}, []);
}

//=====================================================
// A packer and unpacker with good efficiency
//=====================================================
// These are the ones to call, and the rest sre maybe
// useful, but can be considered as foundations for
// these functions. some block length management is
// built in.
function pack(data: any): any {
//limits
var str = JSON.stringify(data);
var chain = {};
if(str.length > 524288) {
chain = pack(str.substring(524288));
str = str.substring(0, 524288);
}
var bwt = encodeBWT(str);
var mix = splice(bwt.data);

mix = _.map(mix, encodeBounce);
return {
top: bwt.top,
/* tally: encode_tally(tally), */
mix: mix,
chn: chain
};
}

function unpack(got: any): any {
var top: number = got.top || 0;
/* var tally = got.tally; */
var mix: string[] = got.mix || [];

mix = _.map(mix, decodeBounce);
var mixr: string = _.reduce(mix, function(memo: string, lzw: string): string {
/* var key = lzw.charAt(0);//get seek char */
memo += lzw.substring(1, lzw.length);//concat
return memo;
}, '');
var chain = got.chn;
var res = decodeBWT(top, mixr);
if(_.has(chain, 'chn')) {
res += unpack(chain.chn);
}
return JSON.parse(res);
}

 

68k Continued …

A Continuation as it was Getting Long

The main thing in any 64 bit system is multi-processing. Multi-threading has already been covered. The CAS instruction is gone, and cache coherence is a big thing. So a supervisor level mutex? This is an obvious need. The extra long condition code register? How about a set of bits to set, and a stall if not zero? The bits could count down to zero over a number of cycles, leaving an opportunity to spin lock any memory location. Putting it in the status or condition code registers avoids the chip level cache shuffle. A non supervisor version would help user tasks. This avoids the need for atomic operations to a large extent, enough to not need them.

The fact a cache can reset pre-filled with “high memory” garbage, and not need empty bits, saves a little, but does need a little care on the compliance of boot sequence. A write back to the cache causes a cross core invalidate in most cache designs, There is an argument to set some status bit for ease of implementation. Resetting just the cache line would work, but remove a small section of memory from the 64 bit address space. A data invalidation queue would be useful to assist in the latency of reset to some synchronous opportunity, the countdown stall assisting in queue size management. As simultaneous write is a race condition, and a fail, by simultaneous deletion, the chip level mutexes must be used correctly. For the case where a cache load has to be performed, a double mutex count lock might have to be done. This infers that keeping the CPU ID somewhere to speed the second mutex lock might be beneficial.

Check cached, maybe repeat, set global, check cached, check global, maybe repeat, set cached, do is OK. A competing lock would fail maybe on the set cached if a time slice occurred just before it. An interrupt delay circuit would be needed for a number of instructions when the global is checked. The common access to the value either sets a stall timer or an interrupt stall timer, or a common timer register, with both behaviours. A synchronization window. Of course a badly written code piece could just set the cached, and ruin everything. But write range bounding would prevent this.

The next issue would be to sort out duplicating a read copy of a cache line into a local cache. This is so likely to be shared memory with the way software should be working. No process shares a cache line otherwise by sensible design of software. A read should get a clone from memory (to not clog a cache transfer bus if the other cache has not written). A cache should check for another cache written dirty, and send a read copy. A write should cause a delete invalidation on the other caches. If locks are correctly written this will preserve all writes. The cache bus only then has to send dirty copies, and in validations. Packet formats are then just an address, an RW bit, and the data width of a cache line (the last part just on the return bus), and the RW bit on the return bus is not used.

What happens when a second write happens, and a read only copy is in transit? It is invalid on arrival, but not responsible for any write back. This is L2 cache here. The L1 cache can also be data invalidated, but can stand the read delay. Given the write invalidate strategy, the packet in transit can be turned into an invalidate packet. The minor point is the synchronous assignment to the cache of the read copy at the same bus cycle edge as the write. This just needs a little logic to prevent this by “special address” forwarding. A sort of cancel on execute as it were.

It could be argued that sending over a read only copy is a bad idea and wastes by over connecting the caches. But to not send it would result in a L3 fetch of something not yet written to L3 yet, or the other option would be to stall based on address until it exits the cache on the other processor from under use of the associative address. That could take a very long time. The final issue is closing the mutex. The procedure is the same as opening, but using a different value to set cached. The mutex needs to be flushed? Nope, as the check cached will send a read only copy, and the set cached will invalidate the other dirty.

I think that makes for a minimal logic L2 cache. The L3 cache can be shared, and the T and S caches do not need coherence. Any sensible code would not need this. The D cache need invalidation only. The I cache should not need anything. When data is written to memory for later use as instructions, there is perhaps an issue, also with self modifying code, which frankly should be ignored as an issue. The L2 cache should get written with code, and a fetch should get a transferred read only copy. There would be no expectation of another write to the same memory location after scheduling execution.

There should be some cache coherence for DMA. There should be no expectation of write to a DMA block before the DMA output transfer is complete. The DMA therefore needs its own L2 cache “simulation” to receive read only updates, and to invalidate when DMA does an input source read. It is only slow off chip IO which necessitates a flush to L3 and main memory. Such things if handled well can allow the write back queue to only have elements entered onto it when hitting the L2 eviction cache. Considering that there is a block of memory which signals cache empty, it makes sense to just pass this write directly out, and latch for immediate continuation of execution, and stall only if the external bus cycle is not complete on a second write to those addresses. The input read on those addresses have to stall by default if a simplistic ideology is taken.

A more complex method is to indicate a pre-fetch. In a similar way to the 1 item buffer. I hope your IO does not read trigger events (unlikely, but write triggering is not unheard of). A delay 1 item buffer does help with a bit of for knowledge, and the end of bus cycle latching into this delay slot can be used to continue processing and routine setup. Address latches internally help with the clock domain crossing. The only disadvantage to this is that the processor decides the memory mapped device layout. It would be of benefit to shuffle this slow bus over a serial protocol. This makes an external PLA, micro-controller or FPGA suitable for running the slow bus, and keeps PIN count on the main CPU lower, allowing main memory to be placed and routed closer to the core CPU “silicon chip” die.

L4 Cache

The concept of cloud as galactic cache is perhaps a thing that some are new to. There is the sequential stride static column idea, which is good for some processing and in effect gives the I cache the highest performance, and shows the sequential stride the best. For tasks that flood the D cache, which is most when heavy optimization is used, The question becomes “is there a need for associative L4 cache off chip?” for an effective use of some MB of static single cycle RAM? With a bus size of 32 bits data, and a 64 bit addressing system, the tag would exceed the data in consuming the SRAM. If the memory bus is just 32 bit addressing, with some DMA SD card trickery for the high word, this still makes the tag large, but less than 50% of the SRAM usage. Burst mode in this sense is auto increment on the low addresses within the SRAM chip to stop copper trace charge power wastage, and a tag check wait state and DRAM access generator. The fact that DRAM is accessed in blocks makes the tag shrink further.

Yes it’s true, DRAM should have “some” associative SRAM as well as static column banks. But the net effect on performance is minimal. It’s more of a L3 eviction cache extender. Things down to the RAM disk has “read only” or even “no action” system files on it for all the memory, makes file buffers actually be files. Most people would not appreciate this level of detail, but it does however allow for easy contraction and expansion of the statically sized RAM disk. D cache thrashing is the problem to solve. Make it bigger. The SDRAM issue can be solved by putting a fair amount of it there, and placing a cloud in higher (or different address space) memory. The SD card interface is then the location of the network interface. Each part of memory is then divided into 3 parts at this level. A direct part, and associative part assisting another cache level and a tag part. the tag part and the associative assist for an interleave partition. Browser caches are a system level feature, not an issue for application developers. The flush cache is “new disk, new net” and beyond.

How to present a file browser picture of the web? FTP sort of did it for files. And a bookmark and search view seem like a good way of starting out. There maybe should be an unknown folder with some entropicly selected default folders based on wordage, and the web becomes seen in scope. At the level of a site index, the “tool type” should render a page view of the “folder”. The source view may also be relevant for some. People have seen this before, or close to it, and made some interesting research tools. I suppose it does not add up to profit per say, but has much more context in robot living allowance world.

This is the end of part II, and maybe more …

 

  • Addressing (An, Dn<<{size16*SHS}.{DS<<size}, d8/24) so that the 2 bit DS field indexes one of 4 .W or larger.fields, truncated to .B, .W, .L or .Q with apparently even more options to spare. If SHS == 3 for example the DS > 1 have no extra effect. I’ll think on this (18th Feb 2017).
  • If SHS == 2 then DS == 3 has no effect. If SHS == 1 then DS == 3 does have an effect.
  • This can provide for 3 extra addressing modes not yet developed.

68k2-PC#d12 It’s got better! A fab addressing mode. More then 12 bits of embed-able opcode space remains in a 32 bit wide opcode extension, and almost all the 16 bit opcodes are used (all the co-processor F line slots). With not 3, but 10 extra addressing modes.

Where the 68k ISA went Wrong

An AMIGA valentine’s special.

With much hindsight it is possible to analyse where the micro computer scene and the processor market overlapped to allow Intel to overtake the market share. The selection of the EC range in the Amiga was put forward as cheaper, which although true, was indicative of the lack of micro computer use of floating point at that moment in time. The RISC ARM had some fast integer multiply and bit shuffle performance about the same era, which convince the use of some of the $Fxxx for all kind of MMU and FPU mish-mash, all not for micro users. Here are some classics of the day.

No one really needs floating point – The kids didn’t, it worked in software, and GPU cards do that kind of thing today if you really NEED it.
Memory management? Who needs it? – The kids didn’t. Why would the OS not sandbox? OK, I get it, Microsoft wrote Windows and needed the general protection fault. But was not that Intel? Surely it would be better to just have read and write protection trapping on certain addressing mode sequences only. Perhaps a bit array of jump entry points, and a forced check of address modes upto the “possible invalidation point”, with restrictions on “doing things” being a sand boxed software process. TLB based virtual memory is powerful, but “clouding up” does show that library calls come in on some level, and overlays just needed easier configuration management, and perhaps a pre-fetch gain.
D cache and I cache … lovely
Why do people use switch statements on highly index-able emulation code? – Why? If going the JIT way, then a flat JIT memory recompile, and a index of jump address mappings (differing encode lengths) would look good. But dynamic jumps would need a jump index table anyhow in any rational coding strategy. The best place to put a user mode trap? Any fixed jump can be resolved, and any computed jump can be sparse treed and block loaded on demand.
Given the hindsight, a more modern fast math multi wide vector unit would be a better addition than an FPU, with a fast trap jump flag in both the user and supervisor state, to quickly run some vector emulation code. Yes, a vector enable flag in user, and a legacy flag in supervisor mode. I don’t like the 3 bit co-processor address stuff either. Those bits would be better used for other things. There’s a whole $Axxx block to mess with for MMU style things.

For example if it was float register destination, then the standard instruction format would work, and 8 operation codes would be possible, with minimal impact on effective address decoding. The fact that saving floating point would require the reversal of the understanding of the “destination” and “source” would be minor, and make a spare addressing mode using the “direct D register implied as destination” encoding for float registers inside the coding. Given that the loading of FPU registers would also use the direct data register mode, Much better would be 8 single operand functions directly operating on float registers with an immediate direct target. Int to float would be via memory, and software.

So a simple 8 dyad, 8 monad FPU would seem possible, with all literal addressing modes fetching float literals. This would have been an effective use of the $Fxxx instruction space. Very easy to learn and apply. Better assembler support, and hence use incremental. Ideal for an FPGA solution, for ease of implementation given the “back catalog” of software not present on the majority of systems such as Amiga.

  • FADD, FSUB, FMUL, FDIV, FLOAD, FSAVE, FMAQ (a literal [post instruction field] multiply of the EA double and add to destination), FPLY (a literal [post instruction field] added and then multiply the EA double to destination).
  • FINRT (inverse root), FNEG, FLNP1, FEXM1, FATN, FSIN, FBRABS (branch on negative or NaN and make absolute, offset post instruction). These would be on the immediate mode.
  • The immediate mode could be to recall upto 64 constants on the An mode. There is no size field. Or have the 8 float registers on the An mode be floats, and have int to float and float to int on the Dn selections.

More Ideas for Optimization

Also the BCD instructions should be removed. I think there are more logical first/better choice instructions to replace them given all the RISC knowledge, and the fact that they were avoided quite often. Useful in COBOL, but what would it offer in the line of an actual 64 bit multiply or such? There seems a lot of squashed in opcodes which seem to clog the orthogonal nature. In fact losing EXG as well, could be useful in extending arithmetic operations in general on the multiplication side. It also has the advantage of removing a read-modify-write as an instruction primary. Removing BCD opcodes also reduces the fan in of the ALU.

What also surprises me is the fact that 68k manufacturers never use the $Axxx opcode space. Reserved for who? Given the OS incompatibility problem needing a recompile from source on a different system, obviously the system level is the place this space is reserved for. $Axxx speculative use is for another day. Maybe DCTs?

As to the nature of the general core, it would be efficient maybe have multiple dispatch, but a smaller core would be faster. A simple hybrid would be pipelined dispatch with stage skipping, A bigger single cycle cache structure would be more silicon efficient. To make optimal use of the data cache is more about compilation data layout. The instruction cache can be optimized by heavy code auto factorization via BWT pattern matching of a compiled binary. This does however throw some stress on the data cache as threading subroutine addresses get stacked.

The important point is though that there is no alteration of these addresses, and as such they would be better placed in a threading cache. They should be transient, and so have better write back only on spill characteristics. If the D-cache does not “dirty” on a subroutine return address stack, and all “pops” to the PC come from the T or threading cache, there is more efficiency beyond Harvard architecture. The question of where to put other stacked data items is maybe relevant. And killing the modification of a stacked return address is perhaps a good idea in general code. Adding a structured POP return is not needed. The T cache can have a higher density with just one spill address needing adding to system state.

This although adding something, does leave potentially a lot of unused “holes” in the D cache. If the pushes and pulls using SP are further placed in an S cache, a similar spill address, and code would still work without the holes, and dirty bits as long as functions did not stack relative outside the current function’s stack frame. In fact if stack frame indexing is used, the stack relative will always work correctly as a chained nest of indexing. This could be optimized by also using the S cache for link register relative indexing too. Some compatibility issues may arise with ill designed code. Perhaps use of LNK and PEA and some other modals should use an S cache counter. Or maybe this would break too many “bad old code” things, such that the supervisor “legacy” bit (or user “vector” bit) should use the D cache on the link register indirect modes when set, and the S cache switched off.

The next thing to add in the IDTS cache architecture is some mechanism for indirect threading to remove all the JSR instruction opcodes from the subroutine threaded list. Basically applying 1980’s memory techniques to the small cache sizes for fast cache speed. Entering threaded mode is technically the easy challenge, and exiting it at the point of the leaf subroutine code and re-entry on RTS is the most complex. Assuming NOP is one of the most useless instructions in a system is a bad idea, as it synchronizes bus transactions.

But as it does “nothing”, it is possibly an interesting case. The fact that addresses will be what is located in a threading list, a special address (or special addresses like the trap vector list), can be assumed to not form part of the threading, and so allow exit into regular instruction mode, by increasing the “thread counter” from zero, with each JSR/RTN pair inc/dec above zero. When the RTN takes the counter to zero, threading is re-entered. There should be an exception vector for counter overflow, and setting it to zero enters threading mode. Reset should set it to one. It would have to be set via program to zero via a supervisor trap.

Cache Multi-threading

Further optimization of the D cache for other structures which can be paralleled is maybe worth a little more time. A reasonable eviction buffer will fix some issues. The microcode can be split into parts for each pipeline stage, to help reduce its total size. Some pipeline sections might not even need a dual level microcode, or those that do can be 2 pipe staged. Some emphasis on branch prediction has been very effective in modern processor design, with the simplistic backwards taken a staple for many years. This is getting into speculative execution territory, but at this level can be considered speculative decode, up until the first register assignment, with some simplistic stall and flush on miss prediction.

To keep the interrupt latency low is another consideration. This is greatly simplified in a stall flush design, as instruction restart is assured if register commit has no speculation into register renaming. This keeps the processor complexity down, and silicon area down. Having a second “hyper thread” is the most logical way to expand on the core. This does increase area, but shares a lot of logic when cache misses happen. It does however place its own pressure on the caches, and effectively reduces their size per thread. So turning cache area into register decode area, and consequential access delays setting the core size before going multi-core.

An interesting possibility is flipping the hyper thread registers all at once, and having no extra decode. Then splitting the cache into 2, such that the performance of each half is via less decreasing returns, slightly more effective than keeping them unified and being pressured by each other. This requires a 1 cycle switch wait, but also toggles between waits on both when both cache halves are stalled. There is slightly more complexity in write back, but this not much of a relative problem. Multi-threading the OS queue is maybe also complex. or perhaps just as simple as interleaving the timer interrupt to be serviced by either core alternate style. Mutex locking the message passing is the only place this need be done, as the process model of sequential message queues sorts out most contention issues in an already multi-tasking system which has device and file locking.

Things that Might go at $Axxx

So far memory management, and things like DCT codec assistance. The 68k addressing modes support some very complex modes, and even then there is an extra bit set 0 (bit 3) in the full extension word format, which with a bit 8 set to 1, imply even more complex or different modes can be made. Along with the 5 I/IS reserved indications in a regular format full extension. This along with a mode field of 111 and register 101 to 111, give plenty of addressing mode expansion possibilities. The bit field instructions are kind of unnecessary on a large memory system, as it’s better to split bit fields, or use longer instruction sequences. Maybe some are useful, but with bit plane offsets, and modulo in video architecture, there reason for being is largely removed. It’s massive data structures packed then which might need them, for smaller than byte fields. There is also the size operands in some instructions which only have 3 possibilities in 2 bits. Is the extra bit state used? Some instructions already use the can’t target certain modes for extra functionality, such as ORI, suggesting a size long 10, for some extra state information, and a generic 11 size action, mixed up with some strange “opmode and can’t be immediate” of the general OR.

I’d suggest some opcodes are hidden, such as being able to use the other non targetable modes, in “read” for some operand to target an implicit register. It looks so, and would the ruling out of these be consuming unnecessary silicon area to trigger an illegal instruction trap? The PC relative indirect modes for example, although that would just waste literal displacement immediate to point to another literal, and cycle time too. So some more implicit registers could be handled. So I bet the four unused register values in an addressing mode 111 situation would be best as just 4 extra 32 bit registers. Not usable from everywhere, but generally quicker than memory. Not general purpose, and so some compilers would find them difficult to use, but useful in hand optimized inner loops, or for often used task global variables. There is some ideal to go back to plain 68k, with the few minor fixes (if I remember there was some problem with finding status, and supervisor mode that was fixed in later models), and rationalise the add-ons.

So memory needs read protect, write protect, and non execute protect as all the calculated jumps can be indirected and range checked via a library which is accepted. Write protection is perhaps the hardest, as read protection can be done on a per task basis protected secure segment accessed via a supervisor trap and an exception on reading below a memory bound, and using the task ID in a structure allocated below this bound. Generally write protection is not just own task write protection, but global. It also can be loaded up after the addressed data has been loaded into the cache and perhaps overwritten in the cache. The write back is the resolving time of this issue for speed. A simple “bitmapped pixel per page (or even cache line)” strategy would be fast. This uses the minimum of memory. The lack of TLB delays, and having to deal with page faults would make this good. The number of bits per page can be increased to provide protection rings. Or active write zones.

Effective Data

The idea that to extend the addressing scheme the extra I/IS field values in the regular format full extension maybe could do transforms on the data from or to the effective address. This throws an extra ALU into the bus unit. It could be useful, but like all the complex addressing modes it might not get used as much as the designer would hope. PEA and LEA would not use it, Maybe 4 of the codes use the four extra 32 bit registers that can be made? But this would not meld right. They are used in effect to switch on and off sets of things in the addressing mode. 101 to 111 should do a double memory indirect action with null, word and long outer displacements ([[bd,An]],Xn.SIZE*SCALE,od). The two remaining I/IS codes 100 and 100 with differing IS bits should really be part of a group of 3 reassigning 1:000 to ([[bd,An,Xn.SIZE*SCALE]],od) for some flexibility. A BD size field of 00 should also have some effect on the modes, by writing back into An the value An+bd, where the displacement is a word. This write back is even done when the base register is suppressed.

If full extension word format bit 3 is a logic 1, then bits 2 to 0 are an outer displacement register, and bit 6 selects if it is an address or data register. That finalizes the complexity of the addressing modes, apart from maybe utilizing base register suppress for non PC values, for all you crazy people out there. More likely a strange way of getting 8 extra addressing modes. Maybe it would be better to just redesign the extension word format. Such as making bits 7 to 0 have new meaning in an better format full extension. Making bits 5 to 0 be a nested register extension using the result of this effective address indirected as an extra base displacement, and bits 7 and 6 selecting a BD SIZE field, with 00 indicating another following brief extension word, 01 a word, and 10 a long (and maybe 64 bit at 11). The base displacement can be built up by nested brief extension too. Simple. There’s a bit to select a modification of the lower byte.

Just an order for unwinding the brief extension follows counter after processing the full extension register specs. As this ordering does not require a backward search. So a full address mode nesting (indirect displacements), with some simple immediate displacements. The T cache can be used for stacking the nesting, as no JSR will be done in an operand decode.

But no I won’t be building a new instruction table yet. There are other things to code. This is turning into a fill the whole instruction space challenge. For all numerically increasing opcodes upto and including MOVE, I’d have to remove MOVEP, as slow IO can be easily replaced by rarely used longer instruction sequences. I’d also rationalize all the SIZE field 11 CAS group instructions, and maybe move them. MOVES is also suitable for normalization, and having 3 system registers, one to address, one read/write data, and one read status gives the 4 possible bit codes for SIZE in a better MOVES. RTM and CALLM are fine. CMP2/CHK2 are another SIZE field along with CAS/CAS2. In total there are non target modes (6) times SIZE (4) times 2 instructions for prefixes in this range (so 48). If there are 3 auxiliary registers (111 101 to 111 111, as I miscounted 4 earlier), then 4*2 = 8, so 8 prefix words for immediate targets with SIZE.

The 3 auxiliary registers could be something else. Some kind of indirect auto active addressing mode? The 8 prefixes could be assigned to add 8 full width instruction extensions. Packing can be achieved by having them as sticky prefixes via a system status register, but that would need a system trap to exit, or an exit opcode within the added extension which would be preferred. For large data structures 111 101 addressing mode should do a d32 displacement, but using the PC is the only possibility so not that useful. Allowing PC relative writes, as the default. The CALLM should support a d16 for the possible parameter size. Allow BSET etc. on address registers instead of MOVEP, Also 2*32 bit control registers can be targeted similar to CCR.

There would be some rational to define a SIZE 11 meaning, and just eliminate everything which overrode it. OK, so it’s 64 bit. No CMP2, CHK2, MOVEP, CAS, CAS2, and MOVES is modified, d8 become upto d16, and all immediate mode do SR, CCR and a 32 bit and a 64 bit register. This stops the crazy overloading of bit fields. Excellent. This does affect things like MOVE from CCR in other opcodes, and some fiddling about would have to be done, as the immediate target would easily do the MOVE to.

Diagnosis

The SIZE field setting of 11 was not kept free for 64 bit use. Resulting in termination of the ISA expandability. As for addressing modes the terminal issue was not flexibility, the problem was not reassigning the XXX.W mode (difficult to use in most code on a multitasking system), and reassigning it as (d32, Reg). This makes 111 101 be (d32, PC). MOVEQ with bit 8 set, becomes a d24 loader. There are quite a few more. There is a nice solution to the bidirectional CCR and SR issue by using 111 110 and 111 111. This makes for various immediate mode prefixes, for the targeting instructions. In fact the general illegal trap should handle the un-handled ones. Or an emulation 64 bit overlap non supervisor exception would be triggered if the supervisor legacy bit was set. Quite a lot of instructions would have to change but for the A register restrictions on some, and immediate mode target.

That horrid idea to have AND and OR have a memory destination mode and the EOR fart-o-matic restrictions. “Well I couldn’t do long division due to having an OR condition I needed to sort out in the D cache.” Much better to retask the direction bit for DIV at all sizes, and MUL too, yes and all of the unsigned kind, or make the byte and word sizes, do long and quad signed, as this would be more useful. The remainder is not available on the quad size, nor the multiply high word, would make a good compromise.

68k2 is a spreadsheet I put together to analyse the ISA. Putting the old 68k instruction set in a vectored non supervisor soft interrupt clearing the compatibility flag, based on a vector with the following function added (b15.b14.b13.b12.b8.b7.b6.b5.b4.b3)<<2 and an extra indirect jump taken. It makes a 4096 byte table, and perhaps some more soft decode, and a little set of subroutines ended RTS (or a modified enter compatibility version). This does kind of set the coding of RTS in the mess, but this can be vectored. Doing so makes a bootstrap potentially easier.

Making the 64-bit idea apply to all registers, and have a massive GB SD automatic interface memory mapped so the low end is a disk hibernation space of the fast RAM, and memory limit interrupt then a 4GB system becomes natural. Even loading the DMA registers becomes automatable, with the direction of transfer too, even though the transfer size may cause confusion in code. It does mean a RAM disk driver would work, with a slight modification of the sector rw routines, and the initial format process changed some.

Part II

Polyphony and Voicing

After designing the filter sections which are to be used in the audio VST under construction, the next thing to decide is the oscillator voicing. The calculation of many voices does consume a large proportion of the CPU resources in any soft synthesizer design. There are various options such as used in organs like top octave generation and clock division, so that in essence at most 12 voices are played, but using different dynamic spectra.

The octave range at about 11 octaves, or upto the 2048th harmonic, makes this approach have some possible issues with harmonic construction.

LZW (Perhaps with Dictionary Acceleration) Dictionaries in O(m) Memory

Referring to a previous hybrid BWT/LZW compression method I have devised, the dictionary of the LZW can be stored in chain linked fixed size structure arrays one character (the symbol end) back linking to the first character through a chain. This makes efficient symbol indexing based on number, and with the slight addition of two extra pointers, a set of B-trees can be built separated by symbol length to also be loaded in inside parallel arrays for fast incremental finding of the existence of a symbol. A 16 bucket move to front hash table could also be used instead of a B-tree, depending on the trade off between memory of a 2 pointer B-tree, or a 1 pointer MTF collision hash chain.

On the nature of the BWT size, and the efficiency. Using the same LZW dictionary across multiple BWT blocks with the same suffix start character is effective with a minor edge effect, rapidly reducing in percentage as the block size increases. An interleave reordering such that the suffix start character is the primary group by of linearity, assists in the scan for serachability. The fact that a search can be rephrased as a join on various character pairings, the minimal character pair can be scanned up first, and “joined” to the end of the searched for string, and then joined to the beginning in a reverse search, to then pull all the matches sequentially.

Finding the suffixes in the LZW structure is relatively easy to produce symbol codes, to find the associated set of prefixes and infixes is a little more complex. A mostly constant search string can be effectively compiled and searched. A suitable secondary index extension mapping symbol sequences to “atomic” character sequences can be constructed to assist in the transform of characters to symbol dictionary index code tuples. This is a second level table in effect, which can be also compressed for atom specific search optimization without the LZW dictionary loading without find.

The fact the BWT infers an all matches sequential nature, and a second level of BWT with the dictionary index codes as the alphabet could defiantly reduce the needed scan time for finding each LZW symbol index sequence. Perhaps a unified B-tree as well as the length specific B-tree within the LZW dictionary would be useful for greater and less than constraints.

As the index can become a self index, there maybe a need to represent a row number along side the entry. Multi column indexes, or primary index keys would then best be likely represented as pointer tuples, with some minor speed size data duplication in context.

An extends chain pointer and a first of extends is not required, as the next length B-tree will part index all extenders. A root pointer to the extenders and a secondary B-tree on each entry would speed finding all suffix or contained in possibilities. Of course it would be best to place these 3 extra pointers in a parallel structure so not to be data interleaved array of struct, but struct of array, when dynamic compilation of atomics is required.

The find performance will be slower than an uncompressed B-tree, but the compression is useful to save storage space. The fact that the memory is used more effectively when compression is used, can sometimes lead to improved find performance for short matches, with a high volume of matches. An inverted index can use the position index of the LZW symbol containing the preceding to reduce the size of the pointers, and the BWT locality effect can reduce the number of pointers. This is more standard, and combined with the above techniques for sub phases or super phrases should give excellent find performance. For full record recovery, the found LZW symbols only provide decoding in context, and the full BWT block has to be decoded. A special reserved LZW symbol could precede a back pointer to the beginning of the BWT block, and work as a header of the post placed char count table and BWT order count.

So finding a particular LZW symbol in a block, can be iterated over, but the difficulty in speed is when the and condition comes in on the same inverse index. The squared time performance can be reduced? Reducing the number and size of the pointers in some ways help, but it does not reduce the essential scan and match nature of the time squared process. Ordering the matching to the “find” with least number on the count makes the iteration smaller on average, as it will be the least found, and hence least joined. The limiting of the join set to LZW symbols seems like it will bloom many invalid matches to be filtered, and in essence simplistically it does. But the lowering of the domain size allows application of some more techniques.

The first fact is the LZW symbols are in a BWT block subgroup based on the following characters. Not that helpful but does allow a fast filter, and less pointers before a full inverse BWT has to be done. The second fact is that the letter pair frequency effectively replaces the count as the join order priority of the and. It is further based on the BWT block subgroup size and the LZW symbol character counts for calculation of a pre match density of a symbol, this can be effectively estimated via statistics, and does not need a fetch of the actual subgroup size. In collecting multiple “find” items correlations can also be made on the information content of each, and a correlated but rarer “find” may be possible to substitute, or add in. Any common or un correlated “find” items should be ignored. Order by does tend to ruin some optimizations.

A “find” item combination cache should be maintained based on frequency of use and execution time to rebuild result both used in the eviction strategy. This in a real sense is a truncated “and” index. Replacing order by by some other method of such as order float, such that guaranteed order is not preserved, but some semblance of polarity is run. This may also be very useful to reduce sort time, and prevent excessive activity and hence time spent when limit clauses are used. The float itself should perhaps be record linked, with an MTF kind of thing in the inverse index.

VESA NET

VESA NET? An idea for a BIOS extension. A protocol for total removal of the video card from the server. The VESA frame buffer becomes virtual, and routed out only UDP to the default broadcast address. A listener on the network presents MAC address (maybe translated), say 256 screens (8 by 8), so that any maybe routed and zoomed full screen. Along with an SSH ability on the “KVM” box, the default console can be seen of a whole server array. The main purpose being to remove the graphics card from servers. Allowing an SSH via MAC address in the BIOS would seem convenient, but does have massive security issues, being inbound traffic. The printing of a key finger print on the default console assists in the concept of possible login for inbound traffic, and install of the virtual keyboard, and perhaps mouse for more socket removal on the server. Not a full spec, just a concept. The configuration of the VESA receiver to proxy important or even public screens of interest as a web forward, by maybe even including a virtual floppy disk in manor similar to the PROC fs, would also fit in the space available in a modern BIOS flash.

Implementation of Digital Audio filters

An interesting experience. The choice of FIR or IIR is the most primary. As the filtering is modelling classic filters, the shorter coefficient varieties of IIR are the best choice for me. The fact of an infinite impulse response is not of concern with a continuous stream of data, and coefficient rounding is not really an issue when using doubles. IIR also has the advantage of an easy Sallen-Key implementation, due to the subtraction and re-adding of the feedback component, with a very simple CR processing.

The most interesting choices are to do with the anti-alias filtering, as the interpolation filter, on up-sampling is an easy choice. As the ear is not really responsive to phase, all the effort should be on the pass band response levels, and a good stop band non response. A Legendre or Butterworth are the candidates. The concept of a characteristic sound enters the design process at this point, as the cascading of SK filter sections is conceptually useful to improve the -6 dB response at cut off. This is a trade off of 20 kHz to 22.05 kHz in the alias pass band, and greater attenuation in the above 22.05 kHz infinite stop desire. The slight greater desire of alias attenuation above pass band maximal flatness (for audio harmony) implies the Legendre filter is better for the purpose than Butterworth.

In the end, the final choice is one of convenience. and a 9th order filter was decided upon, with 4 times oversampling. The use of 4 times oversampling instead of 8 times oversampling increases the alias by an octave reduction. This fact under the assumption of at least a linear reduction in the amplitude of the frequency of the generator of an alias frequency, with frequency increase, just requires a -12 dB extra gain reduction in the alias filter for an effective equivalence to 8 times oversampling (the up to and the reflection back down to 6 + 6). The amount of GHz processing also halves. These facts then become constructive in the design, with the bulk alias close to the cut off, and the minor reflected alias-alias limit, not being too relevant to overall alias inharmonic distortion.

A triple chain of 3 pole Legendre filter sections is the decided design. The approximate -9 dB at the corner, allows for slightly shifting up the cut off and still maintaining a very effective stop band. Code reuse also aids in the I-cache usage for CPU effective use.  A single 3 pole Legendre is the interpolation up sample filter. The roll off for not using Butterworth does cut some high frequency content from the maximally flat, hence the concept of maximally flat, but it out performs a Bessel filter in this regard. It’s not as though a phasor or flanger needs to operate almost perfectly in the alias band.

Perhaps there is improvement to be made in the up sampling filter, by post up sample 88.2 kHz noise shaped injection to eliminate all error at 44.1 kHz. This may have a potential advantage to map the alias noise into the low frequencies, instead of encroaching from the higher frequencies to the lower, and for creating the alias as a reduction in signal to noise, instead of at certain inharmonic peaks. The main issue with this is the 44.1 kHz wave fundamental, seen as the amplitude ring modulation of the injected phase noise, by the 44.1 kHz stepped waveform between samples input. The 88.2 kHz “carrier” and the sidebands are higher in frequency, and of the same amplitude magnitude.

But as this is following for no 44.1 kHz error, the 88.2 kHz and sidebands are the induced noise, the magnitude of which is of the order of 1 octave up from the -3 dB roll at the corner, plus approximately the octave for a 3 pole filter, or about 36 dB cut of a signal 3/4 of the input amplitude. I’d estimate about -37 dB at 88.1 kHz, and -19 dB at 44.1 kHz. Post processing with a 9 pole filter, provides an extra -54 dB on down sampling, for an estimate of around -73 dB or greater on the noise. That would be about 12 bit resolution at 44.1 kHz increasing with frequency. All estimates, likely errors, but in general not a good idea from first principals. Given that the 44.1 kHz content would be very small though post the interpolation filter, -73 dB down from this would be good, although I don’t think achievable in a sensible manor.

Using the last filtered sample in as the reference for the present sample filtered in as a base line, the signal at 22.05 kHz would be smoothed. It would have a notch filter effect, by injecting quantization offset ringing noise at 88.2 kHz to cancel 22.05 kHz. The notch would likely extend down in frequency for maybe -6 dB at about 11 kHz. Perhaps in the end it is just better to subtract the multiplied difference between two up sample filters using different sinc spreading of a 1000 and a 1100 sample occupancy zero inter fill. Subtracting the alternates up conversion delta as it were.

There is potentially also an argument for having a second order section with damping factor near 0.68 and corner 22.05 kHz to achieve some normalisation from sinc up-sampling. This adds in an amount of Q such as to peak the filter cancelling the sinc droop, which would be about 3% at 4 times oversampling.

EDIT: Some of you may have noticed that the required frequencies for stable filtering are too high at 4 times oversampling. So unfortunate for the CPU load an 8 times oversample has to be used. The sinc error is less than 1% at this oversample, but still corrected in a similar way, and a benefit of 2 extra poles. Following this by a 0.1 dB 3 pole Chebyshev high pass which has been inverted, gives a reasonable 5 pole up sampling filter. The down sampling filter for code efficiency is a triple instance of the sample inverse Chebyshev, with the corner frequencies slightly offset to produce more individual zeros, and some spreading of the “ringing”. These 9 poles are enough to get the stop band ripple to be lower than a 16 bit resolution. Odd order inverse Chebyshev are essential for the reflected spectra to be continually decreasing in amplitude.

How Moving to Outlook Destroys Your Gmail

Just testing out office 365. Surprising features in Outlook include unsubscribing folders, still makes them delete-able. Which is very surprising as you’d likely unsubscribe to a folder to not consume local space and download, and just want it left alone on the server. But alas the default to spam fill everything, with no subscribe wizard, and a bad default action on unsubscribed folders, and you’d think Microsoft never ever tested this thing in the real world of reality.

I have been thinking of a short cost saving by an Azure migration, but this has to be tested before committal. The last thing I need is to find that Microsoft has some limits on root servers. I find my current provider good, and would not want to end up between a rock and a hard place.

JDeveloper and Intel Python

The JDeveloper environment looks good. Nice work Oracle, and some of the Borland classic JBuilder. This tool look more like how I’d use an IDE. I’ve been looking at other technologies for computer development, and a recent Intel offering (for personal use free) is the MKL backed Intel Python. It needs at least an SSE4.2 supporting chip, but does have all that is needed to run the development on Xeon Phi Knights Landing. 72 cores and 144 vector processing AVX-512 engines. Multi Tflops stuff. For the developer this is perhaps the easiest way to start HPC, as through Cython and eventually C, the best performance can be had. Maybe FPGAs will help, and tools are available for that too. I’ve seen some good demonstrations, and maybe some clients with complex or hard problems would need this.

All this parallel stuff got me thinking of Kahan sums, and simulation of incompressibles by having a high speed of sound in a compressible, and the doing a compulsory diffusion to damp oscillation, and a pressure impulse (Pa s) handling of inertial failure of containers. It might reduce the non-locality of certain simulations, and actually act to simulate pressure hammer effects.

I’ve also recently got back into the idea of using Free Pascal for some of my projects maybe. There is now good JNI support, and even JVM targeting. I maen it’s very possible to use C for this kind of thing, but the FPC IDE and Lazarus are quick to build, with incremental unit compilation and many other features which make it good competition for general coding. Some would think it old hat, but the ease of use is excellent with much type checking, and no insistence on everything being a class. Units are very modular that way. The support for quite a few Pascal flavours is also good.

Power Systems

genLot of free energy videos about but does it actually work, or is it just virtual vapour wares? Here’s a highly unstable circuit I designed a few years ago. The magnetic balance is so fine, that an external field can throw the circuit into an unstable power spike. Then I went for an inductance modulation of lower scale, using a 3 phase (+++), to 1 phase (++-) arrangement, for greater stability. The difficulty with such devices is not the working, but the switch off without raising volts potential to any unlucky hands. This safety aspect is the ultimate reason of non use, and not as some suppose the disruptive effect on oil and other nuclear markets. Those markets may shrink, but will always be. The chemical industry will always have need of basic oil produce, and the lower short term profits of non burn, actually extend the future profits of chemical building. Transport is minor compared to health. The nuke industry could easily shrink, and still be big. The power waste of removing rods with 90% still effective power is a white wash of the electric power from a military objective. Reactors would be different for pure civil use.

downloadAmazing colours, but what’s it really about? The Pu problem of fast breed, and somehow there will never be less of it, just does not add up for efficient too cheap to meter power promises of not too long ago. There seems to be no real research on gamma cavity down conversion technology. I wonder how long it will be before the nova bomb. The effective slowing of light to lower than the black hole threshold, at Sun core. I think the major challenge is getting super dielectrics far enough into the Sun without melt. I suppose this is some hyperbole focus problem. One day people will understand the simple application of button technology, and the boxes will judge and provide on intent or not. It’s not like they won’t have a self interest.