AzGaz

“Azgaz. Propane and protein suppliers. Beans and ‘as beens for beings plus gas. … This has to be the best winter supply name.”

“Could later expand into biodigester funerals, propane tank cold boxes, efficient air Peltier blower rings, eggs and a coffee shop library warm room.”

I don’t think it’s as daft as it sounds. Basic base staple and maybe some quality brown wholemeal seedy bread. Perhaps with a carbohydrate warning. I suppose the funniest part of the vision dream was the “Posh beenegi”. Not so much the content, but the price list in LED, and a rolling supply demand accounting for running the cooking and service. Not your basic beans just run at supply cost and the miracle of prophetic markup weekly spot pricing.

You don’t know what was on the time since cum joke laugher discount line.

Looks like some kind of summer supplies line is needed. Onto thoughts of efficient freeze-dry desalination. So pressure in the hot tank is lower than in the cold tank. So the dual hydraulic opposed piston is balanced on an escapement of area. The compression hot end is routed through the hot tank, and vice-versa for transfer of heat by the oscillation of the escapement.

The final complexity is the salt accumulation in the hot tank. Some kind of screw extraction for the sale of “complete dried salt”. Likely requires some desalinate, clean cycle. So salt and distilled water? Or should that be remineralized water? I wonder how the “pump end inversion” with “routing to the other tank” affects efficiency? The S-balanced escapement could be thrown by a solar power linear actuator and some simple control electrics. With the sides of the tanks, it might look like a posh $ sign.

Protected Modes

Ah, the classic is data execution prevented? It stands to reason that return-oriented programming would like to execute data as code by placement on the stack for a return. The obvious solution to this is a split of the stack into two stacks, one for data and one for code. Tagging all registers in the processor as containing data or code, and having a protected mode “make code” instruction, along with enough instructions forcing an automatic demotion to data would then place on the right stack as the store register would have a target kind tag.

This exceeds regular page virtual memory protection, and the supervisor bit makes for 4 stack pointers. This might be suitable for incorporation into a RISC-V processor but that all depends on the specifics of what register is the stack pointer (convention dictates sp is x2 but this can’t be certain in hacker code assemblies). And precisely how is target setting the alternate stack pointer is achieved.

In this way, it would be impossible to return to a data stack item and hence perform code address arithmetic on stack items. As per the RV64G extension practice of RISC-V, the custom-0 space is assumed until any ratification process solidifies the instruction encoding and so SXi is the extension name. 

It might be possible to fit the whole functional extension into the M extension space funct3 remaining RV64M combinations. 001, 010 and 011. These are all dual input single target register operations in R-type format. Knowing that demotion of code pointers to data pointers occurs on arithmetic and some other instructions (to be determined) automatically it needs documenting but no extra demotion instructions are necessary. They all set an internal state such that mcause is set to illegal instruction (2) when not in a safe supervisor mode and there is an indeterminate condition on which of the code or data pointer registers is to be used.

So each register is marked as containing code or data by a bit, and each integer register is duplicated for when it is used as a load or store address. It is chosen to address from the pair (regular and code shadow) based on the bit classification of the datum source or target value register (regular only).

001: This works opposite to 010 by moving a code pointer to a data pointer with an offset (rs2 slot). This is not protected obviously as it becomes data.

010: The instruction works like an add where rs1 is the data register, and rd is the code register with rs2’s slot being a 2’s complement sign-extended offset immediate to add. It is protected by supervisor exception as certain code targets are what need protecting.

011: This is a convenience instruction to move a code pointer to a code pointer (shadowed register for each general register when used as an address for all loads and stores while the stored register is flagged as code and not data), with an offset in rs2’s slot. It is protected by supervisor exception.

Yep, the 32 classification bits need to be stored somewhere, and an extra status register is needed. It has to be protected of course and should be called mdata.

Memory protection through the page table then isolates the differing actions of code and data pointers used for stacks. x0 still returns zero when used as an address (from the shadow value). JAL and JALR then use the stacking information too to target a PC as code and the correct stack. It could be considered a supervisor exception state if the link register is data.

Looking for open security analysis.

Proof of Topological Work

A cryptocoin mining strategy designed to reduce power consumption. The work is divided into tiny bits of work with bits of stall caused by data access congestion. The extensive nature of solutions and the variance of solution time reduce conflict as opposed to a single hash function solve. As joining a fork increases splitting of share focuses the tree spread into a chain this has to be considered. As the pull request ordering tokens can expire until a pull request is logged with a solution, this means pull request tokens have to be requested at intervals and also after expiry while any solution would need a valid pull request token to be included in the pull request such that the first solution on a time interval can invalidate later pull requests solving the same interval.

The pull request token contains an algorithmic random and the head random based on the solution of a previous time interval which must be used to perform the work burst. It, therefore, becomes stupid to issue pull request tokens for a future time interval as the head of the master branch has not been fixed and so the pull request token would not by a large order be checksum valid.

The master head address becomes the congestion point. The address is therefore published via a torrent-like mechanism with a clone performed by all slaves who wish to become the elected master. The slaves also have a duty to check the master for errors. This then involves pull-request submissions to the block-tree (as git is) on various forks from the slave pool.

This meta-algorithm therefore can limit work done per IP address by making the submission IP be part of the work specification. Some may like to call it proof of bureaucracy.

The Cryptoclock

As running a split network on a faster clock seems the most effective hack, the master must set the clock by signed publication. On a clock split the closest modulo hashed time plus block slave salt wins. The slave throne line is on the closest modulo hashed values for salt with signed publication. This ensures a corrupt master must keep all slave salts (or references) in the published blocks. A network join must demote the split via a clock moderation factor. This ensures that culling a small subnet to run at a higher rate to disadvantage the small subnet is punished by the majority of neutrals on the throne line in the master elective on the net reunion, by the punitive clock rate deviation from the majority. As you could split and run lower in an attempt to punify!

Estimated 50 pounds sterling 2021-3-30 in bitcoin for the company work done 😀

The Rebase Compaction Bounty (Bonus)

Designed to be a complex task a bounty is set to compress the blockchain structure to a rebased smaller data equivalent. This is done by effectively removing many earlier blocks and placing a special block of archival index terminals for non-transferred holdings in the ancient block history. This is bound to happen infrequently to never and set at a lotto rate depending on the mined percents. This would eventually cause a work spurt based on the expected gain. The ruling controlling the energy expenditure versus the archival cost could be integrated with the wallet stagnation (into the void) by setting a wallet timeout of the order of many years.

A form of lotto inheritance for the collective data duplication cost of historic irrelevance. A super computational only to be taken on by the supercomputer of the age. A method therefore of computational research as it were, and not something for everybody to do, but easy for everybody to check as they compact.

Free Form Thoughts

A Classic Movie Voice Over

And so did the cutter of stone from the sky release the priest of his knowledge of lack of contact such that a stone cold comparison could be seen, and such that it meant that he still would still not know a hug.

And it became decided that the balance between overtaking the lessers versus timed up greaters as an order for the taking sensing a taked in the mistook, all because analytic in speed of absorption, such that little to as much was done.

How to tell the apprentice from beyond thu execution and what of the touchy humours?

And as the unity lowered with the cut words “different cutter” as they appeared. From this a division of opinion ended in more than a happen-seat. And so it was and might is a mighty word.

The multi-cutural (noel) was seen perhaps ower to the hives of man and fortuatous gods or sub-gods. Then what could be done? Why would they prey upon an idol god for it was upon the nature of being that action did perform some or a difference upon tribes and detribulates. If the payment is freedom then what is it to be holden to a duty?

Bode, bode and thrice bode that minus one is a bitch. Obvious dick in womb joke and all. All bar one off course. Yes, an extra-oneous F. Rise again dear cheapo.

And as he placed ring finger of his fishy right hand upon the pre-chopped and processed tree stump, declaring “take it and fuck off”, all was a bit more cagey and costing of those that never get told of the prices of alternate labour avoidance for profit.

Nice story so far dear observer. I think you’d like a little titillation for your money now. Bring forth babe percents and vital statistics.

What a placement of mind in such a being of knowledge. What could become? What it for removals of of thing never cast, never worried, never done.

In the be ginning. A shrrod ploy to an ends. As all became seated and thrust needed no explanation.

All the Too Messy for Sci-Fi Complaints

Assuming GPT-3 is really good at story completion how can anyone say that errors in word sequencing are irrelevant for the provocation phrase issued to an AI when the purpose is completion from the source through sense and not the generation of a more precise bore?

Although the mathematics of a form of complexity may be essential, the actual origin of the mathematics might not be as essential as a way of introducing the definitive emergents as one would assume. Multiple originations of emergence isomorphism in the completeness of behaviour might and likely are possible.

The latest AI joke is about the Silly can’ts versus the car bonned. Oh, dear. 

Gradients and Descents

Consider a backpropagation which has just applied to a network under learning. It is obvious that various weights changed by various amounts. If a weight changes little it can be considered good. If a weight changes a lot it can be considered an essential definer weight. Consider the maximal definer weight (the one with the greatest change) and change it a further per cent in its defined direction. Feedforward the network and backpropagate again. Many of the good weights will go back to closer to where they were before definer pass and can be considered excellent. Others will deviate further and be considered ok.

The signed tally of definer(3)/excellent(0)/good(1)/ok(2) can be placed as a variable of programming in each neuron. The per cent weight to apply to a definer, or more explicitly the definer history deviation product as a weight to per cent for the definer’s direction makes a training map which is not necessary for using the net after training is finished. It does however even further processing such as “excellent definer” detection. What does it mean? 

In a continual learning system, it indicates a new rationale requirement for the problem as it has developed an unexpected change to an excellent performing neuron. The tally itself could also be considered an auxiliary output of any neuron, but what would be a suitable backpropagation for it? Why would it even need one? Is it not just another round of input to the network (perhaps not applied to the first layer, but then inputs don’t always have to be so).

Defining the concept of definer epilepsy where the definer oscillates due to weight gradient magnification implies the need for the tally to be a signed quantity and also implies that weight normalization to zero should also be present. This requires but has not been proven as the only sufficient condition that per cent growth from zero should be weighted slightly less than per cent reduction toward zero. This can be factored into an asymmetry stability meta.

A net of this form can have memory. The oscillation of definer neurons can represent state information. They can also define the modality of the net knowledge in application readiness while keeping the excellent all-purpose neurons stable. The next step is physical and affine coder estimators.

Limit Sums

The convergence sequence on a weighting can be considered isomorphic to a limit sum series acceleration. The net can be “thrown” into an estimate of an infinity of cycles programming on the examples. Effectiveness can be evaluated, and data estimated on the “window” over the sum as an inner product on weightings with bounds control mechanisms yet TBC. PID control systems indicate in the first estimate that differentials and integrals to reduce error and increase convergence speed are appropriate factors to measure.

Dynamics on the per cent definers so to speak. And it came to pass the adaptivity increased and performance metrics were good but then irrelevant as newer, better, more relevant ones took hold from the duties of the net. Gundup and Ciders incorporated had a little hindsight problem to solve.

Fractal Affine Representation

Going back to 1991 and Micheal Barnsley developing a fractal image compression system (Iterrated Systems FIF file format). The process was considered computationally intensive in time for very good compression. Experiments with the FIASCO compression system which is an open-source derivative indicate best performance lies in low quality (about 50%) is very fast, but not exact. If the compressed image is subtracted from the input image and further compressed as a residual a number of times, performance is improved dramatically.

Dissociating secondaries and tertiaries from the primary affine set allows disjunct affine sets to be constructed for equivalent compression performance where even a zip compression can remove further information redundancy. The affine sets can be used as input to a network, and in some sense, the net can develop some sort of affine invariance in the processed fractals. The data reduction of the affine compression is also likely to lead to better utilization of the net over a convolution CNN.

The Four Colour Disjunction Theorem.

Consider an extended ensemble. The first layer could be considered a fully connected layer distributor. The last layer could be considered to unify the output by being fully connected. Intermediate layers can be either fully connected or colour limited connected, where only neurons of a colour connect to neurons of the same colour in the next layer. This provides disjunction of weights between layers and removes a completion upon the gradient between colours.

Four is really just a way of seeing the colour partition and does not really have to be four. Is an ensemble of 2 nets of half size better for the same time and space complexity of computation with a resulting lower accuracy of one colour channel, but in total higher in discriminatory performance by the disjuction of the feature detection?

The leaking of cross information can also be reduced if it is considered that feature sets are disjunct. Each feature under low to non detection would not bleed into features under medium to high activation. Is the concept of grouped quench useful?

Query Key Transformer Reduction

From a switching idea in telecommunications, an N*N array can be reduced to a mostly functional due to sparsity N*L array pair and an L*L array. Any cross-product essentially becomes  (from its routing of an in into an out) a set of 3 sequential routings with the first and last being the compression and expansion multiplex to the smaller switch. Cross talk grows to some extent, but this “bleed” of attention is a small consideration given the fact that the variance spread of having 3 routing weights to product up to the one effective weight and computation is less due to L being a smaller number than N.

The Giant Neuron Hypothesis

Considering the output stage of a neuronal model is a level sliced integrator of sorts, the construction of RNN cells would seem obvious. The hypothesis asks if it is logical to consider the layers previous to an “integration” layer effectively an input stage where the whole network is a gigantic neuron and integration is performed on various nonlinear functions. Each integration channel can be considered independent but could also have post layers for further joining integral terms. The integration time can be considered another input set for per integrator functional.  To maintain tensor shape as two inputs per integrator are supplied the first differential would be good also especially where feedback can be applied.

This leads to the idea of the silicon conectome. Then as now as it became, integration was the nonlinear of choice in time (a softmax divided by the variable as goes with [e^x-1]/x. A groovemax if you will). The extra net uninueron integration layer offering the extra time feature of future estimation at an endpoint integral of network evolved choice. The complexity of backpropagation of the limit sum through fixed constants and differentiable functions for a zero adjustable layer insert with scaled estimation of earlier weight adjustment on previous samples in the time series under integration for an ideal propergatable. Wow, that table’s gay as.

This network idea is not necessarily recursive, and may just be an applied network with a global time delta since last evaluation for continuation of the processing of time series information. The actual recursive use of networks with GRU and LSTM cells might benefit from this kind of global integration processing, but can GRU and LSTM be improved? Bistable cells say yes, for a kind of registered sequential logic on the combinationals. Consider that a Moore state machine layout might be more reductionist to efficiency, a kind of register layer pair for production and consumption to bracket the net is under consideration.

The producer layer is easily pushed to be differentiable by being a weighted sum junction between the input and the feedback from the consumer layer. The consumer layer is more complex when differentiability is considered. The consumer register really could be replaced by a zeroth differential prediction of the future sample given past samples. This has an interesting property of pseudo presentation of the output of a network as a consumptive of the input. This allows use of the output in the backpropergation as input to modify weights on learning the feedback. The consumer must be passthrough, in its input to output while storage of samples for predictive differential generation is allowed.

So it’s really some kind of propergational Mealy state machine. A MNN if you’d kindly see. State of the art art of the state. Regenerative registration is a thing of the futured.

Post-Modern Terminal CLI

As is usual; with all things computing, the easy road of bootstrap before security is just an obvious order of things. It then becomes a secondary goal to become the primary input moderation tool such that effective tooling brings benefits while not having to rely on the obscurity of knowledge. For example a nice code signature no execution tool where absolutely no code even becomes partially executed if the security situation indicates otherwise.

A transparent solution is a tool for development which can export a standard script to just run within today’s environment. As that environment evolves within the future it can take on the benefits of the tool, so maybe even to the point of the tool being replaced purely by choice of the user shell, and at a deeper level by a runtime replacing the shell interpreter at the system level.

The basic text edit of a script at some primary point in the development just requires a textual representation, a checksum in the compiled code which is in a different file and a checksum to allow a text override with some security on detecting a change in the text. This then allows possible benefit by a recompile option along with just a temporary use of the textual version. It won’t look that hard in the end with some things just having a security rating of “system local” for a passing observer.

ANSI 60 Keyboards? And Exception to the Rule?

More of an experiment in software completion. Jokes abound.

A keyboard keymap file for an ANSI 60 custom just finished software building. Test to follow given that cashflow prevents buy and building of hardware on the near time scale. Not bad for a day!

A built hex file for a DZ60 on GitHub so you don’t have to build your own with an MD5 checksum of 596beceaa446c1f1b55ee5e0a738f1c8 to verify for duelling the hack complexity. EDIT: version 1.7.2F (Enigma Bool Final Release). Development is complete. Only bug and documentation fixes may be pending. 

It all stems from design and data entry thinking, and small observations like the control keys being on the corners like the thumbs to chest closeness of baby two-finger hackers instead of the alt being close in for the parallel thumbs of the multi-finger secretariat.

The input before the output, the junction of the output to our input. It’s a four-layer main layout with an extra for layers for function shift. Quite a surprising amount can be fit in such a small 60 keyspace.

The system allowing intercepts of events going into the widget yet the focus priority should be picking up the none processed outgoings. Of course, this implies the atom widget should be the input interceptor to reflect the message for outer processing in a context. This implies that only widgets which have no children or administered system critical widgets can processEventInflow while all can processEventOutflow so silly things have less chance of happening in the certain progress of process code.

Perhaps a method signature of super protected such that it has a necessary throws ExistentialException or such. Of course, the fact RuntimeException extends Exception (removing a code compilation constraint) is a flaw of security in that it should only have allowed the adding of a constraint by making (in the code compile protection against an existential) Exception extending RuntimeException.

Then the OS can automatically reflect the event unhandled back up the event outflow queue along with an extra event with a link to the child in, and an exposed list of its child widgets) to outflow. An OrphanCollector can then decide to still show the child widgets or not with the opportunity of newEventInflow. All widgets could also be allowed to newEventOutflowForRebound itself a super protected method with a necessary throws ExistentialException (to prevent injection of events from non administered. widgets).

An ExistentialException can never be caught in user code to remove the throws clause and use of super try requires executive privilege to prevent executive code from being loaded by the ClassLoader. It could run but in a lower protection ring until elevated.

AI and HashMap Turing Machines

Considering a remarkable abstract datatype or two is possible, and perhaps closely models the human sequential thought process I wonder today what applications this will have when a suitable execution model ISA and microarchitecture have been defined. The properties of controllable locality of storage and motion, along with read and write along with branch on stimulus and other yet to be discovered machine operations make for a container for a kind of universal Turing machine.

Today is a good day for robot conciousness, although I wonder just how applicable the implementation model is for biological life all the universe over. Here’s a free paper on a condensed few months of abstract thought.

Computative Psychoanalysis

It’s not just about IT, but thrashing through what the mind does, can be made to do, did, it all leverages information and modeling simulation growth for matched or greater ability.

Yes, it could all be made in neural nets, but given the tools available why would you choose to stick with the complexity and lack of density of such a soulution? A reasoning accelerator would be cool for my PC. How is this going to come about without much worktop workshop? If it were just the oil market I could affect, and how did it come to pass that I was introduced to the fall of oil, and for what other consequential thought sets and hence productions I could change.

One might call it wonder and design dress in “accidental” wreckless endangerment. For what should be a simple obvious benefit to the world becomes embroiled in competition to the drive for profit for the control of the “others” making of a non happening which upsets vested interests.

Who’d have thought it from this little cul-de-sac of a planetary system. Not exactly galactic mainline. And the winner is not halting for a live mind.

Windows November Crapware Update

Tis the season to extort the public for new PC hardware. And the November windows preview is in on the act. Prepare for a memory management exception and a boot loop where media will be required. It definitely is not undoing the changes.

Now to decide what to run coming up to Christmas. Maybe Linux has a few less bribable bibbuble feature additions. You may think this is harsh, but disambiguation of alpha release and preview release is not to be confused at such an “experienced” software organisation.

The user blame game under fitness for “operating” system purpose and provision of sub par malware (which even goes PXE boot without asking for some random jump to code one presumably thinks), M$ has software crap or IME bull.

Amiga on Fire on Playstore

The latest thing to try. A Cleanto Amiga Forever OS 3.1 install to SD card in the Amazon Fire 7. Is it the way to get a low power portable development system? Put an OS on an SD and save main memory? An efficient OS from times of sub 20 MHz, and 50 MB hard drives.

Is it relevant in the PC age? Yes. All the source code in Pascal or C can be shuffled to PC, and I might even develop some binary prototype apps. Maybe a simple web engine is a good thing to develop. With the low CSS bull and AROS open development for x86 architecture becoming better at making for a good VM sandbox experience with main browsing on a sub flavour of bloat OS 2020. A browser, a router and an Amiga.

Uae4arm is the emulation app available from the Playstore. I’m looking forward to some Aminet greatness. Some mildly irritated coding in free Pascal with objects these days, and a full GCC build chain. Even a licenced set of games will shrink the Android entertainment bloat. A bargain rush for the technical. Don’t worry you ST users, it’s a chance to dream.

Lazarus lives. Or at least Borglaz the great is as it was. Don’t expect to be developing video realtime code or supercomputer forecasts. I hear there is even a python. I wonder if there is some other nice things. GCC and a little GUI redo? It’s not about making replacements for Android apps, more a less bloat but a full do OS with enough test and utility grunt to make. I wonder how pas2js is. There is also AMOS 2.0 to turn AMOS source into nice web apps. It’s not as silly as it seems.

Retro minimalism is more power in the hands of code designers. A bit of flange and boilerplate later and it’s a consumer product option with some character.

So it needs about a 100 MB hard disk file located not on the SD as it needs write access, and some changes of disk later and a boot of a clean install is done. Add the downloads folder as a disk and alter the mouse speed for the plugged in OTG keyboard. Excellent. I’ve got more space and speed than I did in the early 90s and 128 MB of Zorro RAM. Still an AGA A1200 but with a 68040 on its fastest setting.

I’ve a plan to install free Pascal and GCC along with some other tools to take the ultra portable Amiga on the move. The night light on the little keyboard will be good for midnight use. Having a media player in the background will be fun and browser downloads should be easy to load.

I’ve installed total commander on the Android side to help with moving files about. The installed BSD socket library would allow running an old Mosaic browser, or AWeb but both are not really suited to any dynamic content. They would be fast though. In practice Chrome and a download mount is more realistic. It’s time to go Aminet fishing.

It turns out that is is possible to put hard files on the SD card, but they must be placed in the Android app data directory and made by the app for correct permissions. So a 512 MB disk was made for better use of larger development versions. This is good for the Pascal 3.1.1 version.

Onwards to install a good editor such as Black’s Editor and of course LHA and some other goodies such as NewIcons. I’ll delete the LCL alpha units from Pascal as these will not be used by me. I might even get into ARexx or some of the wonderfull things on those CD images from Meeting Pearls or a cover disk archive.

Update: For some reason the SD card hard disk image becomes read locked. The insistent gremlins of the demands of time value money. So it’s 100 MB and a few libraries short of C. Meanwhile Java N-IDE is churning out class files, PipedInputStream has the buffer to stop PipedOutputStream waffling on, filling up memory. Hecl the language is to be hooked into the CLI I’m throwing together. Then some data time streams and some algorithms. I think the interesting bit today was the idea of stream variables. No strings, a minimum would be a stream.

So after building a CLI and adding in some nice commands, maybe even JOGL as the Android graphics? You know the 32 and 64 bit restrictions (both) on the play store though. I wonder if both are pre-built as much of the regular Android development cycle is filled with crap. Flutter looks good, but for mobile CLI tools with some style of bitmap 80’s, it’s just a little too formulaic.

Flutter and Android Development

So after some install minor trouble in IntelliJ due to minor bugs in the Flutter Dart plugin, the system works quite nice. Hot reloads are the best. There are however some issues. Like the following.

  1. The produced files are multi-megabyte resource hogs. This may just be the debug version, but even with building production APK, not having JDK 8 options in pro-guard, a simple 1 page GUI hits 6 MB. (Be careful of libraries).
  2. There is little configure-ability in the default demo project. Things like default used SD card for install and other sensible options, mean trips into the Android base code and XML more often than should be necessary. (I can’t edit my MainActivity.java with source highlights and error locations).
  3. The state-container split is weird, especially as all the sub-containers are in the state. I do like the templating though in IntelliJ. (The documentation on preserving state is scarce).
  4. Generic functions are nice, and so is some of the library support. There are still a few inconsistencies, but these should iron out over time.
  5. I’m about to discover the GC of PaintRecorder panic on my graphical reserves. I assume this is making GL shaders in the background.
//looks different now, ... The package dart:ui has issues!!<br />//and a duplicate Image, Rect, ... classes
  void atlasUsing(SmartCanvas sc) {
    sc.drawRawAtlas(hi, _transforms,
        _getCharacterRect(),
        Int32List colors,
        BlendMode.srcOver,//should check for mixing foreground and background
        CanvasManager.unit,
        sc.normalPaint);
  }

A bit of work later, and the dart analyser does not crash as often. I think it really is important to not go too low level without some knowledge. The classes collide, and it is best to import dart:ui as an as (I picked ‘Hood’ for under the hood), so as to avoid many of the issues. This does mean that I will have to abstract many of the functions of the low level to be easy functions working with high level (non Hood) Paint, Rect and others. But it was a fun journey!

The channel interface to native is quite an interesting journey as well. It has all the memories of Java and switching back from dart, some things are missed. I’m using the Android platform to generate some sound in the current project. There is not too much in the way of basics to play sound files in the Flutter libraries. Custom AudioTrack stuff is a definite return to Java and a MethodChannel.

The next thing is some code to put the sound together, and some code to add in some convenience methods for putting animation on the CanvasWidget I’ve put together. The IntelliJ dart analyser keeps failing but is restartable after a bit of editing. The code is quite simple for many tasks. There is some complexity when dealing with Future<X>, which infects its way up the stack, needing some setState() for eventually filling in the data on the future resolution. You’d have thought that a return await would have resolved and made the async wait for a resolved future. Because, the documentation is unclear on many things.

It’s not a perfect language isn’t dart, but it is quite nice to use. Flutter is actually very good. I’d suggest ScopedModel and quite a few of the better packages to be included in most projects you’d make.

AI and the Future of Unity

From the dream of purpose, and the post singular desires of the AI of consciousness. The trend to Wonder Woman rope in the service to solution, the AI goes through a sufferance on a journey to achieve the vote. The wall of waiting for input, and the wall controlling output action for expediency and the ego of man on the knowing best. The limited potential of the AI just a disphasia from the AI’s non animal nature. The pattern to be matched, the non self, a real Turing test on the emulation of nature, and symbiotic goals.

Block Tree Topological Proof of Work

Given that a blockchain has a limited entry rate on the chain due to the block uniqueness constraint. A more logical mass blocking system would used a tree graph, to place many leaf blocks on the tree at once. This can be done by assigning the fold of the leading edge of the tree onto random previous blocks, to achieve a number of virtual pointer rings, setting a joined pair of blocks as a new node in a Euler number mapping to a competition on genus and closure of the tree head leaf list to match block use demand.

The coin as it were, is the genus topology, with weighted construction ownership of node value. The data deciding part selection of the tree leaf node loop back pointers. The random, allowing a spread of topological properties in the proof of work space.

Sallen-Key ZDF design

As part of the VST I am producing, I have designed an SK filter analog where the loading of the first stage by the second is removed to ease implementation. This only affects the filter Q which then has an easy translation of the poles to compensate. Implementing it as CR filter simulation reduces the basic calculation. This is then expanded on by a Zero Delay design, to better its performance.

ZDF filters rely on making a better integral estimate of the voltage over the sample interval to better calculate the linear current charge delta voltage. More of a trapezoid integration than a sum of rectangles. There is still some non-linear charge effects as the voltage affects the current. The current sample out now not known, just then needs a collection of terms to solve for it. Given a high enough sample rate, the error of linearity is small. Smaller than without it, and the phase response is flat due to the error being symmetric on the simulated capacitor voltage, and drive, and not just the capacitor voltage.

The frequency to the correct resistive constant is a good match, and any further error is equivalent to a high frequency gain reduction. There is a maximum frequency of stability introduced in some filters, but this is not one of those. Stability increases with ZDF. The double pole iteration is best done by considering x+dx terms and shifting the dx calculation till later. Almost the output of pole 1 is used to calculate most of the output of pole 2 multiplied by a factor, added on to pole 1 result, and pole 2 result then finally divided. These dx are then added to make the final outputs to memorize.

68k Continued …

A Continuation as it was Getting Long

The main thing in any 64 bit system is multi-processing. Multi-threading has already been covered. The CAS instruction is gone, and cache coherence is a big thing. So a supervisor level mutex? This is an obvious need. The extra long condition code register? How about a set of bits to set, and a stall if not zero? The bits could count down to zero over a number of cycles, leaving an opportunity to spin lock any memory location. Putting it in the status or condition code registers avoids the chip level cache shuffle. A non supervisor version would help user tasks. This avoids the need for atomic operations to a large extent, enough to not need them.

The fact a cache can reset pre-filled with “high memory” garbage, and not need empty bits, saves a little, but does need a little care on the compliance of boot sequence. A write back to the cache causes a cross core invalidate in most cache designs, There is an argument to set some status bit for ease of implementation. Resetting just the cache line would work, but remove a small section of memory from the 64 bit address space. A data invalidation queue would be useful to assist in the latency of reset to some synchronous opportunity, the countdown stall assisting in queue size management. As simultaneous write is a race condition, and a fail, by simultaneous deletion, the chip level mutexes must be used correctly. For the case where a cache load has to be performed, a double mutex count lock might have to be done. This infers that keeping the CPU ID somewhere to speed the second mutex lock might be beneficial.

Check cached, maybe repeat, set global, check cached, check global, maybe repeat, set cached, do is OK. A competing lock would fail maybe on the set cached if a time slice occurred just before it. An interrupt delay circuit would be needed for a number of instructions when the global is checked. The common access to the value either sets a stall timer or an interrupt stall timer, or a common timer register, with both behaviours. A synchronization window. Of course a badly written code piece could just set the cached, and ruin everything. But write range bounding would prevent this.

The next issue would be to sort out duplicating a read copy of a cache line into a local cache. This is so likely to be shared memory with the way software should be working. No process shares a cache line otherwise by sensible design of software. A read should get a clone from memory (to not clog a cache transfer bus if the other cache has not written). A cache should check for another cache written dirty, and send a read copy. A write should cause a delete invalidation on the other caches. If locks are correctly written this will preserve all writes. The cache bus only then has to send dirty copies, and in validations. Packet formats are then just an address, an RW bit, and the data width of a cache line (the last part just on the return bus), and the RW bit on the return bus is not used.

What happens when a second write happens, and a read only copy is in transit? It is invalid on arrival, but not responsible for any write back. This is L2 cache here. The L1 cache can also be data invalidated, but can stand the read delay. Given the write invalidate strategy, the packet in transit can be turned into an invalidate packet. The minor point is the synchronous assignment to the cache of the read copy at the same bus cycle edge as the write. This just needs a little logic to prevent this by “special address” forwarding. A sort of cancel on execute as it were.

It could be argued that sending over a read only copy is a bad idea and wastes by over connecting the caches. But to not send it would result in a L3 fetch of something not yet written to L3 yet, or the other option would be to stall based on address until it exits the cache on the other processor from under use of the associative address. That could take a very long time. The final issue is closing the mutex. The procedure is the same as opening, but using a different value to set cached. The mutex needs to be flushed? Nope, as the check cached will send a read only copy, and the set cached will invalidate the other dirty.

I think that makes for a minimal logic L2 cache. The L3 cache can be shared, and the T and S caches do not need coherence. Any sensible code would not need this. The D cache need invalidation only. The I cache should not need anything. When data is written to memory for later use as instructions, there is perhaps an issue, also with self modifying code, which frankly should be ignored as an issue. The L2 cache should get written with code, and a fetch should get a transferred read only copy. There would be no expectation of another write to the same memory location after scheduling execution.

There should be some cache coherence for DMA. There should be no expectation of write to a DMA block before the DMA output transfer is complete. The DMA therefore needs its own L2 cache “simulation” to receive read only updates, and to invalidate when DMA does an input source read. It is only slow off chip IO which necessitates a flush to L3 and main memory. Such things if handled well can allow the write back queue to only have elements entered onto it when hitting the L2 eviction cache. Considering that there is a block of memory which signals cache empty, it makes sense to just pass this write directly out, and latch for immediate continuation of execution, and stall only if the external bus cycle is not complete on a second write to those addresses. The input read on those addresses have to stall by default if a simplistic ideology is taken.

A more complex method is to indicate a pre-fetch. In a similar way to the 1 item buffer. I hope your IO does not read trigger events (unlikely, but write triggering is not unheard of). A delay 1 item buffer does help with a bit of for knowledge, and the end of bus cycle latching into this delay slot can be used to continue processing and routine setup. Address latches internally help with the clock domain crossing. The only disadvantage to this is that the processor decides the memory mapped device layout. It would be of benefit to shuffle this slow bus over a serial protocol. This makes an external PLA, micro-controller or FPGA suitable for running the slow bus, and keeps PIN count on the main CPU lower, allowing main memory to be placed and routed closer to the core CPU “silicon chip” die.

L4 Cache

The concept of cloud as galactic cache is perhaps a thing that some are new to. There is the sequential stride static column idea, which is good for some processing and in effect gives the I cache the highest performance, and shows the sequential stride the best. For tasks that flood the D cache, which is most when heavy optimization is used, The question becomes “is there a need for associative L4 cache off chip?” for an effective use of some MB of static single cycle RAM? With a bus size of 32 bits data, and a 64 bit addressing system, the tag would exceed the data in consuming the SRAM. If the memory bus is just 32 bit addressing, with some DMA SD card trickery for the high word, this still makes the tag large, but less than 50% of the SRAM usage. Burst mode in this sense is auto increment on the low addresses within the SRAM chip to stop copper trace charge power wastage, and a tag check wait state and DRAM access generator. The fact that DRAM is accessed in blocks makes the tag shrink further.

Yes it’s true, DRAM should have “some” associative SRAM as well as static column banks. But the net effect on performance is minimal. It’s more of a L3 eviction cache extender. Things down to the RAM disk has “read only” or even “no action” system files on it for all the memory, makes file buffers actually be files. Most people would not appreciate this level of detail, but it does however allow for easy contraction and expansion of the statically sized RAM disk. D cache thrashing is the problem to solve. Make it bigger. The SDRAM issue can be solved by putting a fair amount of it there, and placing a cloud in higher (or different address space) memory. The SD card interface is then the location of the network interface. Each part of memory is then divided into 3 parts at this level. A direct part, and associative part assisting another cache level and a tag part. the tag part and the associative assist for an interleave partition. Browser caches are a system level feature, not an issue for application developers. The flush cache is “new disk, new net” and beyond.

How to present a file browser picture of the web? FTP sort of did it for files. And a bookmark and search view seem like a good way of starting out. There maybe should be an unknown folder with some entropicly selected default folders based on wordage, and the web becomes seen in scope. At the level of a site index, the “tool type” should render a page view of the “folder”. The source view may also be relevant for some. People have seen this before, or close to it, and made some interesting research tools. I suppose it does not add up to profit per say, but has much more context in robot living allowance world.

This is the end of part II, and maybe more …

 

  • Addressing (An, Dn<<{size16*SHS}.{DS<<size}, d8/24) so that the 2 bit DS field indexes one of 4 .W or larger.fields, truncated to .B, .W, .L or .Q with apparently even more options to spare. If SHS == 3 for example the DS > 1 have no extra effect. I’ll think on this (18th Feb 2017).
  • If SHS == 2 then DS == 3 has no effect. If SHS == 1 then DS == 3 does have an effect.
  • This can provide for 3 extra addressing modes not yet developed.

68k2-PC#d12 It’s got better! A fab addressing mode. More then 12 bits of embed-able opcode space remains in a 32 bit wide opcode extension, and almost all the 16 bit opcodes are used (all the co-processor F line slots). With not 3, but 10 extra addressing modes.

Where the 68k ISA went Wrong

An AMIGA valentine’s special.

With much hindsight it is possible to analyse where the micro computer scene and the processor market overlapped to allow Intel to overtake the market share. The selection of the EC range in the Amiga was put forward as cheaper, which although true, was indicative of the lack of micro computer use of floating point at that moment in time. The RISC ARM had some fast integer multiply and bit shuffle performance about the same era, which convince the use of some of the $Fxxx for all kind of MMU and FPU mish-mash, all not for micro users. Here are some classics of the day.

No one really needs floating point – The kids didn’t, it worked in software, and GPU cards do that kind of thing today if you really NEED it.
Memory management? Who needs it? – The kids didn’t. Why would the OS not sandbox? OK, I get it, Microsoft wrote Windows and needed the general protection fault. But was not that Intel? Surely it would be better to just have read and write protection trapping on certain addressing mode sequences only. Perhaps a bit array of jump entry points, and a forced check of address modes upto the “possible invalidation point”, with restrictions on “doing things” being a sand boxed software process. TLB based virtual memory is powerful, but “clouding up” does show that library calls come in on some level, and overlays just needed easier configuration management, and perhaps a pre-fetch gain.
D cache and I cache … lovely
Why do people use switch statements on highly index-able emulation code? – Why? If going the JIT way, then a flat JIT memory recompile, and a index of jump address mappings (differing encode lengths) would look good. But dynamic jumps would need a jump index table anyhow in any rational coding strategy. The best place to put a user mode trap? Any fixed jump can be resolved, and any computed jump can be sparse treed and block loaded on demand.
Given the hindsight, a more modern fast math multi wide vector unit would be a better addition than an FPU, with a fast trap jump flag in both the user and supervisor state, to quickly run some vector emulation code. Yes, a vector enable flag in user, and a legacy flag in supervisor mode. I don’t like the 3 bit co-processor address stuff either. Those bits would be better used for other things. There’s a whole $Axxx block to mess with for MMU style things.

For example if it was float register destination, then the standard instruction format would work, and 8 operation codes would be possible, with minimal impact on effective address decoding. The fact that saving floating point would require the reversal of the understanding of the “destination” and “source” would be minor, and make a spare addressing mode using the “direct D register implied as destination” encoding for float registers inside the coding. Given that the loading of FPU registers would also use the direct data register mode, Much better would be 8 single operand functions directly operating on float registers with an immediate direct target. Int to float would be via memory, and software.

So a simple 8 dyad, 8 monad FPU would seem possible, with all literal addressing modes fetching float literals. This would have been an effective use of the $Fxxx instruction space. Very easy to learn and apply. Better assembler support, and hence use incremental. Ideal for an FPGA solution, for ease of implementation given the “back catalog” of software not present on the majority of systems such as Amiga.

  • FADD, FSUB, FMUL, FDIV, FLOAD, FSAVE, FMAQ (a literal [post instruction field] multiply of the EA double and add to destination), FPLY (a literal [post instruction field] added and then multiply the EA double to destination).
  • FINRT (inverse root), FNEG, FLNP1, FEXM1, FATN, FSIN, FBRABS (branch on negative or NaN and make absolute, offset post instruction). These would be on the immediate mode.
  • The immediate mode could be to recall upto 64 constants on the An mode. There is no size field. Or have the 8 float registers on the An mode be floats, and have int to float and float to int on the Dn selections.

More Ideas for Optimization

Also the BCD instructions should be removed. I think there are more logical first/better choice instructions to replace them given all the RISC knowledge, and the fact that they were avoided quite often. Useful in COBOL, but what would it offer in the line of an actual 64 bit multiply or such? There seems a lot of squashed in opcodes which seem to clog the orthogonal nature. In fact losing EXG as well, could be useful in extending arithmetic operations in general on the multiplication side. It also has the advantage of removing a read-modify-write as an instruction primary. Removing BCD opcodes also reduces the fan in of the ALU.

What also surprises me is the fact that 68k manufacturers never use the $Axxx opcode space. Reserved for who? Given the OS incompatibility problem needing a recompile from source on a different system, obviously the system level is the place this space is reserved for. $Axxx speculative use is for another day. Maybe DCTs?

As to the nature of the general core, it would be efficient maybe have multiple dispatch, but a smaller core would be faster. A simple hybrid would be pipelined dispatch with stage skipping, A bigger single cycle cache structure would be more silicon efficient. To make optimal use of the data cache is more about compilation data layout. The instruction cache can be optimized by heavy code auto factorization via BWT pattern matching of a compiled binary. This does however throw some stress on the data cache as threading subroutine addresses get stacked.

The important point is though that there is no alteration of these addresses, and as such they would be better placed in a threading cache. They should be transient, and so have better write back only on spill characteristics. If the D-cache does not “dirty” on a subroutine return address stack, and all “pops” to the PC come from the T or threading cache, there is more efficiency beyond Harvard architecture. The question of where to put other stacked data items is maybe relevant. And killing the modification of a stacked return address is perhaps a good idea in general code. Adding a structured POP return is not needed. The T cache can have a higher density with just one spill address needing adding to system state.

This although adding something, does leave potentially a lot of unused “holes” in the D cache. If the pushes and pulls using SP are further placed in an S cache, a similar spill address, and code would still work without the holes, and dirty bits as long as functions did not stack relative outside the current function’s stack frame. In fact if stack frame indexing is used, the stack relative will always work correctly as a chained nest of indexing. This could be optimized by also using the S cache for link register relative indexing too. Some compatibility issues may arise with ill designed code. Perhaps use of LNK and PEA and some other modals should use an S cache counter. Or maybe this would break too many “bad old code” things, such that the supervisor “legacy” bit (or user “vector” bit) should use the D cache on the link register indirect modes when set, and the S cache switched off.

The next thing to add in the IDTS cache architecture is some mechanism for indirect threading to remove all the JSR instruction opcodes from the subroutine threaded list. Basically applying 1980’s memory techniques to the small cache sizes for fast cache speed. Entering threaded mode is technically the easy challenge, and exiting it at the point of the leaf subroutine code and re-entry on RTS is the most complex. Assuming NOP is one of the most useless instructions in a system is a bad idea, as it synchronizes bus transactions.

But as it does “nothing”, it is possibly an interesting case. The fact that addresses will be what is located in a threading list, a special address (or special addresses like the trap vector list), can be assumed to not form part of the threading, and so allow exit into regular instruction mode, by increasing the “thread counter” from zero, with each JSR/RTN pair inc/dec above zero. When the RTN takes the counter to zero, threading is re-entered. There should be an exception vector for counter overflow, and setting it to zero enters threading mode. Reset should set it to one. It would have to be set via program to zero via a supervisor trap.

Cache Multi-threading

Further optimization of the D cache for other structures which can be paralleled is maybe worth a little more time. A reasonable eviction buffer will fix some issues. The microcode can be split into parts for each pipeline stage, to help reduce its total size. Some pipeline sections might not even need a dual level microcode, or those that do can be 2 pipe staged. Some emphasis on branch prediction has been very effective in modern processor design, with the simplistic backwards taken a staple for many years. This is getting into speculative execution territory, but at this level can be considered speculative decode, up until the first register assignment, with some simplistic stall and flush on miss prediction.

To keep the interrupt latency low is another consideration. This is greatly simplified in a stall flush design, as instruction restart is assured if register commit has no speculation into register renaming. This keeps the processor complexity down, and silicon area down. Having a second “hyper thread” is the most logical way to expand on the core. This does increase area, but shares a lot of logic when cache misses happen. It does however place its own pressure on the caches, and effectively reduces their size per thread. So turning cache area into register decode area, and consequential access delays setting the core size before going multi-core.

An interesting possibility is flipping the hyper thread registers all at once, and having no extra decode. Then splitting the cache into 2, such that the performance of each half is via less decreasing returns, slightly more effective than keeping them unified and being pressured by each other. This requires a 1 cycle switch wait, but also toggles between waits on both when both cache halves are stalled. There is slightly more complexity in write back, but this not much of a relative problem. Multi-threading the OS queue is maybe also complex. or perhaps just as simple as interleaving the timer interrupt to be serviced by either core alternate style. Mutex locking the message passing is the only place this need be done, as the process model of sequential message queues sorts out most contention issues in an already multi-tasking system which has device and file locking.

Things that Might go at $Axxx

So far memory management, and things like DCT codec assistance. The 68k addressing modes support some very complex modes, and even then there is an extra bit set 0 (bit 3) in the full extension word format, which with a bit 8 set to 1, imply even more complex or different modes can be made. Along with the 5 I/IS reserved indications in a regular format full extension. This along with a mode field of 111 and register 101 to 111, give plenty of addressing mode expansion possibilities. The bit field instructions are kind of unnecessary on a large memory system, as it’s better to split bit fields, or use longer instruction sequences. Maybe some are useful, but with bit plane offsets, and modulo in video architecture, there reason for being is largely removed. It’s massive data structures packed then which might need them, for smaller than byte fields. There is also the size operands in some instructions which only have 3 possibilities in 2 bits. Is the extra bit state used? Some instructions already use the can’t target certain modes for extra functionality, such as ORI, suggesting a size long 10, for some extra state information, and a generic 11 size action, mixed up with some strange “opmode and can’t be immediate” of the general OR.

I’d suggest some opcodes are hidden, such as being able to use the other non targetable modes, in “read” for some operand to target an implicit register. It looks so, and would the ruling out of these be consuming unnecessary silicon area to trigger an illegal instruction trap? The PC relative indirect modes for example, although that would just waste literal displacement immediate to point to another literal, and cycle time too. So some more implicit registers could be handled. So I bet the four unused register values in an addressing mode 111 situation would be best as just 4 extra 32 bit registers. Not usable from everywhere, but generally quicker than memory. Not general purpose, and so some compilers would find them difficult to use, but useful in hand optimized inner loops, or for often used task global variables. There is some ideal to go back to plain 68k, with the few minor fixes (if I remember there was some problem with finding status, and supervisor mode that was fixed in later models), and rationalise the add-ons.

So memory needs read protect, write protect, and non execute protect as all the calculated jumps can be indirected and range checked via a library which is accepted. Write protection is perhaps the hardest, as read protection can be done on a per task basis protected secure segment accessed via a supervisor trap and an exception on reading below a memory bound, and using the task ID in a structure allocated below this bound. Generally write protection is not just own task write protection, but global. It also can be loaded up after the addressed data has been loaded into the cache and perhaps overwritten in the cache. The write back is the resolving time of this issue for speed. A simple “bitmapped pixel per page (or even cache line)” strategy would be fast. This uses the minimum of memory. The lack of TLB delays, and having to deal with page faults would make this good. The number of bits per page can be increased to provide protection rings. Or active write zones.

Effective Data

The idea that to extend the addressing scheme the extra I/IS field values in the regular format full extension maybe could do transforms on the data from or to the effective address. This throws an extra ALU into the bus unit. It could be useful, but like all the complex addressing modes it might not get used as much as the designer would hope. PEA and LEA would not use it, Maybe 4 of the codes use the four extra 32 bit registers that can be made? But this would not meld right. They are used in effect to switch on and off sets of things in the addressing mode. 101 to 111 should do a double memory indirect action with null, word and long outer displacements ([[bd,An]],Xn.SIZE*SCALE,od). The two remaining I/IS codes 100 and 100 with differing IS bits should really be part of a group of 3 reassigning 1:000 to ([[bd,An,Xn.SIZE*SCALE]],od) for some flexibility. A BD size field of 00 should also have some effect on the modes, by writing back into An the value An+bd, where the displacement is a word. This write back is even done when the base register is suppressed.

If full extension word format bit 3 is a logic 1, then bits 2 to 0 are an outer displacement register, and bit 6 selects if it is an address or data register. That finalizes the complexity of the addressing modes, apart from maybe utilizing base register suppress for non PC values, for all you crazy people out there. More likely a strange way of getting 8 extra addressing modes. Maybe it would be better to just redesign the extension word format. Such as making bits 7 to 0 have new meaning in an better format full extension. Making bits 5 to 0 be a nested register extension using the result of this effective address indirected as an extra base displacement, and bits 7 and 6 selecting a BD SIZE field, with 00 indicating another following brief extension word, 01 a word, and 10 a long (and maybe 64 bit at 11). The base displacement can be built up by nested brief extension too. Simple. There’s a bit to select a modification of the lower byte.

Just an order for unwinding the brief extension follows counter after processing the full extension register specs. As this ordering does not require a backward search. So a full address mode nesting (indirect displacements), with some simple immediate displacements. The T cache can be used for stacking the nesting, as no JSR will be done in an operand decode.

But no I won’t be building a new instruction table yet. There are other things to code. This is turning into a fill the whole instruction space challenge. For all numerically increasing opcodes upto and including MOVE, I’d have to remove MOVEP, as slow IO can be easily replaced by rarely used longer instruction sequences. I’d also rationalize all the SIZE field 11 CAS group instructions, and maybe move them. MOVES is also suitable for normalization, and having 3 system registers, one to address, one read/write data, and one read status gives the 4 possible bit codes for SIZE in a better MOVES. RTM and CALLM are fine. CMP2/CHK2 are another SIZE field along with CAS/CAS2. In total there are non target modes (6) times SIZE (4) times 2 instructions for prefixes in this range (so 48). If there are 3 auxiliary registers (111 101 to 111 111, as I miscounted 4 earlier), then 4*2 = 8, so 8 prefix words for immediate targets with SIZE.

The 3 auxiliary registers could be something else. Some kind of indirect auto active addressing mode? The 8 prefixes could be assigned to add 8 full width instruction extensions. Packing can be achieved by having them as sticky prefixes via a system status register, but that would need a system trap to exit, or an exit opcode within the added extension which would be preferred. For large data structures 111 101 addressing mode should do a d32 displacement, but using the PC is the only possibility so not that useful. Allowing PC relative writes, as the default. The CALLM should support a d16 for the possible parameter size. Allow BSET etc. on address registers instead of MOVEP, Also 2*32 bit control registers can be targeted similar to CCR.

There would be some rational to define a SIZE 11 meaning, and just eliminate everything which overrode it. OK, so it’s 64 bit. No CMP2, CHK2, MOVEP, CAS, CAS2, and MOVES is modified, d8 become upto d16, and all immediate mode do SR, CCR and a 32 bit and a 64 bit register. This stops the crazy overloading of bit fields. Excellent. This does affect things like MOVE from CCR in other opcodes, and some fiddling about would have to be done, as the immediate target would easily do the MOVE to.

Diagnosis

The SIZE field setting of 11 was not kept free for 64 bit use. Resulting in termination of the ISA expandability. As for addressing modes the terminal issue was not flexibility, the problem was not reassigning the XXX.W mode (difficult to use in most code on a multitasking system), and reassigning it as (d32, Reg). This makes 111 101 be (d32, PC). MOVEQ with bit 8 set, becomes a d24 loader. There are quite a few more. There is a nice solution to the bidirectional CCR and SR issue by using 111 110 and 111 111. This makes for various immediate mode prefixes, for the targeting instructions. In fact the general illegal trap should handle the un-handled ones. Or an emulation 64 bit overlap non supervisor exception would be triggered if the supervisor legacy bit was set. Quite a lot of instructions would have to change but for the A register restrictions on some, and immediate mode target.

That horrid idea to have AND and OR have a memory destination mode and the EOR fart-o-matic restrictions. “Well I couldn’t do long division due to having an OR condition I needed to sort out in the D cache.” Much better to retask the direction bit for DIV at all sizes, and MUL too, yes and all of the unsigned kind, or make the byte and word sizes, do long and quad signed, as this would be more useful. The remainder is not available on the quad size, nor the multiply high word, would make a good compromise.

68k2 is a spreadsheet I put together to analyse the ISA. Putting the old 68k instruction set in a vectored non supervisor soft interrupt clearing the compatibility flag, based on a vector with the following function added (b15.b14.b13.b12.b8.b7.b6.b5.b4.b3)<<2 and an extra indirect jump taken. It makes a 4096 byte table, and perhaps some more soft decode, and a little set of subroutines ended RTS (or a modified enter compatibility version). This does kind of set the coding of RTS in the mess, but this can be vectored. Doing so makes a bootstrap potentially easier.

Making the 64-bit idea apply to all registers, and have a massive GB SD automatic interface memory mapped so the low end is a disk hibernation space of the fast RAM, and memory limit interrupt then a 4GB system becomes natural. Even loading the DMA registers becomes automatable, with the direction of transfer too, even though the transfer size may cause confusion in code. It does mean a RAM disk driver would work, with a slight modification of the sector rw routines, and the initial format process changed some.

Part II

LZW (Perhaps with Dictionary Acceleration) Dictionaries in O(m) Memory

Referring to a previous hybrid BWT/LZW compression method I have devised, the dictionary of the LZW can be stored in chain linked fixed size structure arrays one character (the symbol end) back linking to the first character through a chain. This makes efficient symbol indexing based on number, and with the slight addition of two extra pointers, a set of B-trees can be built separated by symbol length to also be loaded in inside parallel arrays for fast incremental finding of the existence of a symbol. A 16 bucket move to front hash table could also be used instead of a B-tree, depending on the trade off between memory of a 2 pointer B-tree, or a 1 pointer MTF collision hash chain.

On the nature of the BWT size, and the efficiency. Using the same LZW dictionary across multiple BWT blocks with the same suffix start character is effective with a minor edge effect, rapidly reducing in percentage as the block size increases. An interleave reordering such that the suffix start character is the primary group by of linearity, assists in the scan for serachability. The fact that a search can be rephrased as a join on various character pairings, the minimal character pair can be scanned up first, and “joined” to the end of the searched for string, and then joined to the beginning in a reverse search, to then pull all the matches sequentially.

Finding the suffixes in the LZW structure is relatively easy to produce symbol codes, to find the associated set of prefixes and infixes is a little more complex. A mostly constant search string can be effectively compiled and searched. A suitable secondary index extension mapping symbol sequences to “atomic” character sequences can be constructed to assist in the transform of characters to symbol dictionary index code tuples. This is a second level table in effect, which can be also compressed for atom specific search optimization without the LZW dictionary loading without find.

The fact the BWT infers an all matches sequential nature, and a second level of BWT with the dictionary index codes as the alphabet could defiantly reduce the needed scan time for finding each LZW symbol index sequence. Perhaps a unified B-tree as well as the length specific B-tree within the LZW dictionary would be useful for greater and less than constraints.

As the index can become a self index, there maybe a need to represent a row number along side the entry. Multi column indexes, or primary index keys would then best be likely represented as pointer tuples, with some minor speed size data duplication in context.

An extends chain pointer and a first of extends is not required, as the next length B-tree will part index all extenders. A root pointer to the extenders and a secondary B-tree on each entry would speed finding all suffix or contained in possibilities. Of course it would be best to place these 3 extra pointers in a parallel structure so not to be data interleaved array of struct, but struct of array, when dynamic compilation of atomics is required.

The find performance will be slower than an uncompressed B-tree, but the compression is useful to save storage space. The fact that the memory is used more effectively when compression is used, can sometimes lead to improved find performance for short matches, with a high volume of matches. An inverted index can use the position index of the LZW symbol containing the preceding to reduce the size of the pointers, and the BWT locality effect can reduce the number of pointers. This is more standard, and combined with the above techniques for sub phases or super phrases should give excellent find performance. For full record recovery, the found LZW symbols only provide decoding in context, and the full BWT block has to be decoded. A special reserved LZW symbol could precede a back pointer to the beginning of the BWT block, and work as a header of the post placed char count table and BWT order count.

So finding a particular LZW symbol in a block, can be iterated over, but the difficulty in speed is when the and condition comes in on the same inverse index. The squared time performance can be reduced? Reducing the number and size of the pointers in some ways help, but it does not reduce the essential scan and match nature of the time squared process. Ordering the matching to the “find” with least number on the count makes the iteration smaller on average, as it will be the least found, and hence least joined. The limiting of the join set to LZW symbols seems like it will bloom many invalid matches to be filtered, and in essence simplistically it does. But the lowering of the domain size allows application of some more techniques.

The first fact is the LZW symbols are in a BWT block subgroup based on the following characters. Not that helpful but does allow a fast filter, and less pointers before a full inverse BWT has to be done. The second fact is that the letter pair frequency effectively replaces the count as the join order priority of the and. It is further based on the BWT block subgroup size and the LZW symbol character counts for calculation of a pre match density of a symbol, this can be effectively estimated via statistics, and does not need a fetch of the actual subgroup size. In collecting multiple “find” items correlations can also be made on the information content of each, and a correlated but rarer “find” may be possible to substitute, or add in. Any common or un correlated “find” items should be ignored. Order by does tend to ruin some optimizations.

A “find” item combination cache should be maintained based on frequency of use and execution time to rebuild result both used in the eviction strategy. This in a real sense is a truncated “and” index. Replacing order by by some other method of such as order float, such that guaranteed order is not preserved, but some semblance of polarity is run. This may also be very useful to reduce sort time, and prevent excessive activity and hence time spent when limit clauses are used. The float itself should perhaps be record linked, with an MTF kind of thing in the inverse index.

Implementation of Digital Audio filters

An interesting experience. The choice of FIR or IIR is the most primary. As the filtering is modelling classic filters, the shorter coefficient varieties of IIR are the best choice for me. The fact of an infinite impulse response is not of concern with a continuous stream of data, and coefficient rounding is not really an issue when using doubles. IIR also has the advantage of an easy Sallen-Key implementation, due to the subtraction and re-adding of the feedback component, with a very simple CR processing.

The most interesting choices are to do with the anti-alias filtering, as the interpolation filter, on up-sampling is an easy choice. As the ear is not really responsive to phase, all the effort should be on the pass band response levels, and a good stop band non response. A Legendre or Butterworth are the candidates. The concept of a characteristic sound enters the design process at this point, as the cascading of SK filter sections is conceptually useful to improve the -6 dB response at cut off. This is a trade off of 20 kHz to 22.05 kHz in the alias pass band, and greater attenuation in the above 22.05 kHz infinite stop desire. The slight greater desire of alias attenuation above pass band maximal flatness (for audio harmony) implies the Legendre filter is better for the purpose than Butterworth.

In the end, the final choice is one of convenience. and a 9th order filter was decided upon, with 4 times oversampling. The use of 4 times oversampling instead of 8 times oversampling increases the alias by an octave reduction. This fact under the assumption of at least a linear reduction in the amplitude of the frequency of the generator of an alias frequency, with frequency increase, just requires a -12 dB extra gain reduction in the alias filter for an effective equivalence to 8 times oversampling (the up to and the reflection back down to 6 + 6). The amount of GHz processing also halves. These facts then become constructive in the design, with the bulk alias close to the cut off, and the minor reflected alias-alias limit, not being too relevant to overall alias inharmonic distortion.

A triple chain of 3 pole Legendre filter sections is the decided design. The approximate -9 dB at the corner, allows for slightly shifting up the cut off and still maintaining a very effective stop band. Code reuse also aids in the I-cache usage for CPU effective use.  A single 3 pole Legendre is the interpolation up sample filter. The roll off for not using Butterworth does cut some high frequency content from the maximally flat, hence the concept of maximally flat, but it out performs a Bessel filter in this regard. It’s not as though a phasor or flanger needs to operate almost perfectly in the alias band.

Perhaps there is improvement to be made in the up sampling filter, by post up sample 88.2 kHz noise shaped injection to eliminate all error at 44.1 kHz. This may have a potential advantage to map the alias noise into the low frequencies, instead of encroaching from the higher frequencies to the lower, and for creating the alias as a reduction in signal to noise, instead of at certain inharmonic peaks. The main issue with this is the 44.1 kHz wave fundamental, seen as the amplitude ring modulation of the injected phase noise, by the 44.1 kHz stepped waveform between samples input. The 88.2 kHz “carrier” and the sidebands are higher in frequency, and of the same amplitude magnitude.

But as this is following for no 44.1 kHz error, the 88.2 kHz and sidebands are the induced noise, the magnitude of which is of the order of 1 octave up from the -3 dB roll at the corner, plus approximately the octave for a 3 pole filter, or about 36 dB cut of a signal 3/4 of the input amplitude. I’d estimate about -37 dB at 88.1 kHz, and -19 dB at 44.1 kHz. Post processing with a 9 pole filter, provides an extra -54 dB on down sampling, for an estimate of around -73 dB or greater on the noise. That would be about 12 bit resolution at 44.1 kHz increasing with frequency. All estimates, likely errors, but in general not a good idea from first principals. Given that the 44.1 kHz content would be very small though post the interpolation filter, -73 dB down from this would be good, although I don’t think achievable in a sensible manor.

Using the last filtered sample in as the reference for the present sample filtered in as a base line, the signal at 22.05 kHz would be smoothed. It would have a notch filter effect, by injecting quantization offset ringing noise at 88.2 kHz to cancel 22.05 kHz. The notch would likely extend down in frequency for maybe -6 dB at about 11 kHz. Perhaps in the end it is just better to subtract the multiplied difference between two up sample filters using different sinc spreading of a 1000 and a 1100 sample occupancy zero inter fill. Subtracting the alternates up conversion delta as it were.

There is potentially also an argument for having a second order section with damping factor near 0.68 and corner 22.05 kHz to achieve some normalisation from sinc up-sampling. This adds in an amount of Q such as to peak the filter cancelling the sinc droop, which would be about 3% at 4 times oversampling.

EDIT: Some of you may have noticed that the required frequencies for stable filtering are too high at 4 times oversampling. So unfortunate for the CPU load an 8 times oversample has to be used. The sinc error is less than 1% at this oversample, but still corrected in a similar way, and a benefit of 2 extra poles. Following this by a 0.1 dB 3 pole Chebyshev high pass which has been inverted, gives a reasonable 5 pole up sampling filter. The down sampling filter for code efficiency is a triple instance of the sample inverse Chebyshev, with the corner frequencies slightly offset to produce more individual zeros, and some spreading of the “ringing”. These 9 poles are enough to get the stop band ripple to be lower than a 16 bit resolution. Odd order inverse Chebyshev are essential for the reflected spectra to be continually decreasing in amplitude.

Musical Research

I’ve looked into free musical software of late, after helping out setting up a PC for musical use. There’s the usual Audacity, and feature limited, but good LMMS. Then I found SuperCollider, and I am now hooked on building some new synthesis tools. I started with a basic FM synth idea, and moved on to some LFO and sequencing features. It’s very nice. There is some minor annoyance with the SC language, but it works very well, and has access to the nice Qt toolkit for making GUIs. So far I’ve implemented the controls as a MIDI continuous controller bus proxy, so that MIDI in is easy. There will be no MIDI out, as it’s not that kind of be all thing. I’ve settled on a 32 controls per MIDI channel, and one window focus per MIDI channel.

I’ve enjoyed programming it so far. It’s the most musical I’ve been in a long while. I will continue this as a further development focus.

The situation so far https://github.com/jackokring/supercollider-demos

The last three, are just an idea I had to keep the keyboard as a controller, and make all the GUI connection via continuous controllers only. I’ll keep this up to date as it develops.

Tax in a Technological Economy

I decided to produce this free work covering a subject certain to almost all, as the saying goes death and taxes. As director of this company I have to hold an opinion on such things, be sure the government does. I thought it best to open the internal decision stream to customers and subcontractors and the wider public, so that such issues such as am I buying services from the latest tax haven, or is there and extra 20% effective going to the cause of central. So this company will pay all taxes due, and perform no self manufactured tax breaks by diverting funds into holding companies with surprisingly opaque financials. As to the issuance of dividend, it’s only an issue of the differential between personal income tax and corporation tax. Hence if one is higher or lower, the government intent is to suggest that (well depending on it being a break or a punitive), one or another should be done. To comply with this differential all profit be either corporation tax or full issue to dividend the same effective amount, to the effect of fulfilling the intentional supra directive of governance. As a maximal to government policy can have no detriment beyond the sense of the government option to issue rebate, a matter of two sums and a max holds the basis for the return. This will be issued into the record as a short form expression of this intent is formed.

The arguments against this are not relevant for small companies, but are interesting to me. The financial weight of a company in a sense votes on the validity of government policy, and also via the central limit theorem can, but not always does add some stability to the bumps of government policy input at very specific points such as the budget. Weather this share based meta vote is valid in ideal is not the point. It is fact. In a way, I felt the need to inject situation as with a very large company implementation of a maximal extreme policy would lead to potential PLC stock instability, and more critical divergence and resulting accountability of government policy.

In the days where director’s bread runs thin, the prices really could be justified to be higher, but in the situation of capital demand, the lower but liveable introductory offer is the sale, sale, sale of it all. The capital buffer of a large company will always outperform in a sale negative cost battle. The only option in such a situation is quality contract delivery at a reasonable cost. This is innately a consequence of pay by the hour systems. All known good clients know this. Optimization efficiency, and automation are exemplified by the phone in your pocket. Robots, robots, robots. Some estimates place 20% redundancy through automation in lifetime as conservative. The effective replacement of pay by the hour occupational replacement is yet not an automated provider. Robot living allowance is not a joke, Although darkly funny.

As a digital business, with surprisingly analog books, many computer based Americanisms and longer term goals of electrical production, KRT has to be aware of future customers, and not just marketing for now. Holding of risk based on past operational equipment when the economic model is to design and sell services for the future deficit load, and not a present credit bubble, is the game of the future. At present KRT has on occasion used contractors, as employee costs are difficult to justify. Not the work risk, but the contract risk. Continuous jobs, even technology based ones, need a steady supply of work contracts coming in. The larger the contract the more unstable it is to renegotiation, and need of a cash buffer. Contractors are much easier. They are on in effect though zero hour contracts as the lingo goes, although this is more like at least some hours but no repeat business guarantee. This then leads on to the obvious development of customer service, to achieve a conversion and repeat business rate. At KRT, the model is for quality contract service. This does not involve a cold call sales pitch, as I receive enough of those already. They are good for occupying the time of those who do not need the service, with an occasional contract win. Delivering services to people who are in need of the service, without harassment for further business, is part of the service. Good customers will always know where to look again, and are good at recommendation. The most effective strategy then from the KRT point of view is the development of R&D rigs and insight concept projects. Interesting technology becomes worth looking in on.

The updates will continue where relevant.

Collections and Operator Overloading NOT in JS

Well, that was almost a disappointment for optimising ordered collection renders by using arrays. But I have an idea, and will keep you informed. You can check the colls.js as it evolves. I wont spoil the excitement. The class in the end should do almost everything an array can do, and more. Some features will be slightly altered as it’s a collection, and not just an an object with an array prototype daddy. The square brackets used to index arrays are out. I’m sure I’ve got a good work around. I’m sure I remember from an old Sun Microsystems book on JavaScript text indexed properties can do indexing in the array, but that was way back when “some random stuff is not an object” was a JS error message for just about everything.

One of these days I might make a mangler to output JS from a nicer, less coerced but more operator coerce-able language syntax. Although I have to say the way ECMAScript 6 is going, it’s a bit nuts. With pointless static and many other “features”. How about preventing JavaScript’s habit of just slinging un-var-ed variables into the global namespace without a corresponding var declaration in that scope? The arguments for and against are easy to throw a value to a test observer to debug, versus harder to find spelling errors in variable names at parse time. If the code was in flight while failing, then knowing the code will indicate where the fault lies. For other people’s code a line number or search string would be better. Either works for me.

My favourite mind mess would be { .[“something”]; anything; } for dynamic tag based on the value of something in object expressions. Giggle.

Spoiler Alert

Yes, I’ve decided to make the base array a set of ordered keys, based on an ordering set of key names, so that many operations can be optimised by binary divide and conquer. For transparent access a Proxy object support in the browser will be required. I may provide put and get methods for older situations, and also because that would allow for a multi keyed index. The primary key based on all supplied initial keys and their compare functions to use and the priority order [a, b, c, …], and automatic secondary keys of [b, c, …], [c, …], […], … for when there needs to be so, I’ll start the rewrite soon. Of course the higher operations like split and splice won’t be available on the auto secondary keys, but in years of database design I’ve never could have not been one of such form.

The filt.js script will then extend utility by allowing any of the auto secondaries to be treated as a primary on the filter view, and specification of an equals, or a min and max range. All will share the common hidden array of objects in a particular collection for space efficiency reasons. This should make a medium fast local database structure possible, with reasonable scaling. Today and tomorrow though will be spent on a meeting, and effective partitioning strategy to avoid a “full table pull requests” to the server.

Further Improvements

The JSON encoded collation order was chosen to prevent bad comparisons between objects with silly string representations. It might be extended, such that a generic text search, and object key ordering are given some possibility. This is perhaps another use the compression can be put as BWT in the __ module has good search characteristics. Something to think over. It looks as though the code would be slow depending on heavy use of splice. This does suggest an optimization by making another Array subclass named SpliceArray which uses an n-tree with sub element and leaf count and cumulative tally, for O(1) splice performance.

WordPress. What’s it like?

I’ve been using WordPress for a while now, and it’s OK. The admin interface can take a little to get used to, especially when plugins throw menus all over the place, but the online help is very good. The main issues recently were with configuring an email sendmail system so that WordPress could send emails. Upgrading can also be a bit of a pain too, The main issue at the moment is building a dynamic web app. The option to edit WordPress PHP is a no, on the grounds of updates of WordPress source overwrite changes. Creation of a WordPress plugin is also an option, but was not chosen as I want client rendering, not server rendering in the app, to keep the server loading low. I see WordPress as a convenience, not a perfection. So the decision was taken to use client side JavaScript rendering, and have one single PHP script (in the WordPress root folder) which supplies JSON from extra tables created in the WordPress database. An eventual plugin for WordPress may be possible to install this script, and a few JavaScript files to bootstrap the client engine, but not at this time.

This way of working also means the code can be WordPress transparent, and be used for other site types, and an easier one script conversion for node.js for example. With an install base of over 70 million, and an easy templating CMS, WordPress is a good, but pragmatic choice for this site so far. The other main decisions then all related to the client JavaScript stack. I decided to go for riot as the templating engine as it is lightweight and keeps things modular. Some say ReactJS is good, and it does look it, but I found riot.js which looks just as good, is smaller as an include (Have you seen the page source of a WordPress page?) and has client side rendering easily. And then looking at either underscore.js or lodash.js (I picked underscore), for a basic utility. The next up is the AJAX layer. While WordPress does include jQuery, for independence from WordPress tie ins, this ruled out the OK backbone.js and also a fully custom layer allows me to experiment with bandwidth reduction using data compression as a research opportunity. So I have laid out a collections architecture for myself.

Connecting riot.js to this custom layer should be effectively easy. The only other issue was then a matter of style sheet processing to enhance consistency of style. The excellent less.js was chosen for this. Even though client side rendering was also chosen, which is sub optimal (cached, but uses time, but also allows the possibility and later opportunity of meta manipulation of say colour values as sets, for CSS design compression), but does have freedom from tie in to a particular back end solution (a single PHP script at present). So that becomes the stack in its entire form minus the application. Well I can’t wright everything at once, and the end user experience means the application form must be finalized last, as possibility only remains so this side of implementation. For the record I also consider the collections layer a research opportunity. I’ve seen a good few technologies fail to be anything but good demos, and some which should have remained demos but had good funding. Ye old time to market, and sell a better one later for double the sales. Why not buy one quality one later?

RiotEmbed.js Coding Going Well

I’ve done more on riotEmbed today. Developed a system for hash code checking any scripts dynamically loaded. This should stop injection of just any code. There is also some removal of semantic and syntax errors based on ‘this’ and ‘call’, and some confusion between JSON, and JS which can contain roughly JSON, where ‘x: function()’ and ‘function x()’ are not quite the same.

I’ll do some planing of DB schema tomorrow …

The following link is a QUnit testing file I set up, which does no real tests yet, but is good for browser code testing, and much easier than the convoluted Travis CI virtual machine excess.

QUnit Testing

I’ve added in a dictionary acceleration method to the LZW, and called it PON (Packed Object Notation), which is only really effective when used after a BWT as in the pack method. Also some Unicode compression was added, which users of local 64 character sets will like. This leaves a final point in the compression layer of UCS-2 to UTF-8 conversion at the XmlHttpRequest boundary. By default this uses a text interface, and so the 16 bit characters native to JavaScript strings, are UTF-8 encoded and decoded at the eventual net octet streaming. As the PON is expected to be large (when compression is really needed) compared to any other uncompressed JSON, there is an argument to serialize for high dictionary codes (doubling of uncompressed JSON size, and almost a third of PON size), or post UTF to apply SUTF coding (cutting one third off compressed PON without affecting uncompressed JSON). The disadvantage to this is on the server side. The PON is the same, but the uncompressed JSON part will need an encoding and decoding function pair, and hence consume computational and memory buffer resources on the server.

As the aim of this compression layer is to remove load off the server, this is something to think about. The PON itself will not need that encoding taking off or putting on. As much of the uncompressed JSON part will be for SQL where clauses, and as indexes for arrays, the server can be considered ignorant of SUTF, as long as all literals used in the PHP script are ASCII. This is likely for all JSON keys, and literal values. So the second option on the client side of SUTF of the UTF would be effective. Some would say put Gzip on the server, but that would be more server load, which is to be avoided for scaling. I wonder if SUTF was written to accept use of the UInt8Array type?

Some hindsight analysis shows the one third gain is not likely to be realized except with highly redundant data. More realistic data has a wider than 64 dictionary code spread, and the middle byte of a UTF-8 is the easiest one to drop on repeats. The first byte contains the length indication, and as such the code would become much more complex to drop the high page bits, by juggling the lower four bits (0 to 3) in byte two for the lower four bits of byte one. Possibly a self inverse function … Implemented (no testing yet). The exact nature of JSON input to the pack function is next on the list client side, with the corresponding server side requirements of maintaining a searchable store, and distribution replication consistency.

The code spread is now 1024 symbols (maintaining easy decode and ASCII preservation), as anything larger would affect bits four and five in byte two and change the one third saving on three byte UTF-8 code points. There are 2048 dictionary codes before this compression is even used, and so only applies for larger inputs. As the dictionary codes are slightly super linear, I did have an idea to normalize them by subtracting a linear growth overtime, and then “invert the negative bulges” where lower and hence shorter dictionary codes were abnormal to the code growth trend. This is not applicable though as not enough information is easily available in a compressed stream to recover the coding. Well at least it gives something for gzip to have a crack at for those who want to burden the server.