Gradient Optimization

So if a gradient descent hyper-parameter controlling the learning rate is the usual way, how can this possibly be improved? Considering that in some way the approximation of future gradient alterations is distributed depending on the batch, the stability via an average gives a more stable basis to then infer an accelerated projection of the future descent.

The biggest problem to consider is bound oscillation. When the accelerated projection is magnifying the learning delta to apply such that locality is an asymptotic non-convergent (reverse symmetry in summation acceleration by considering the divergent terms as “merging toward” the first term limit). This then would converge as a metaseries in some instances, but not all. It then becomes essential to scale the approximations by inverse power weighting to make a convergent for highly entropic unstable weights. It may also indicate that weight decomposition may be an effective strategy to obtain a neuron split into the stable (time aligned) and the unstable (time inverted) partitions of a signal.

Assuming the unstable partition has a repellor (opposite to an attractor in chaos), modelling could be used to invert the accelerated projection to the repellor. If the accelerated series is approximated by an integral, the unstable inverse acceleration would perhaps be a reversal of the limits of integration? Or a sign reversal of the limits?

In a sense the splitting of the network into a composition of multiple networks based on partitions related to the number of critical negative signs (or more precisely the number of things that could have negative signs). In this case just 1 sign for a time is like hyper-parameter convergence property. The algorithm after decomposition can then be specifically optimized per partition.  

Future Prediction by AI

So given that the future estimation could be trained on data from a delayed assumption state from the past prediction of the present, then what is missing? The missing seems to be based on the time factorization process NP problem and innovation stimulus which would cover things that are unknown within the net as well as time relevance which was not compensated for (the delay has an opportunity to sample lesser pasts for greater present prediction but produces nearer futures without doing Monte Carlo assumptions for a spread).

A subnet could be trained to do the estimations of the best assumption for such a predictive engine, leading to a trainability for an expected spread entropy (a situational requirement of MUST and/or ANY as GOOD) given a similarity measure of an output of training to a random network spread RND classifier. There is an interview with an author of an interesting paper about AI exploration, which covers the RND idea in a use case. Training a post RND latent space map to merge lingual or other equivalent factorizations of the novelty could be part of this.
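A minimal sketch of the novelty measure, assuming RND here refers to random network distillation: a fixed random target network is imitated by a predictor fitted only on states already seen, and the imitation error on a new state is read as its novelty. The network sizes, the linear predictor and the least-squares fit (used instead of gradient training for brevity) are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM_IN, DIM_OUT = 4, 8

# Fixed, randomly initialised target network (never trained).
W_target = rng.normal(size=(DIM_IN, DIM_OUT))

def target(x):
    """Output of the fixed random target network."""
    return np.tanh(x @ W_target)

def add_bias(x):
    """Append a constant column so the predictor can learn an offset."""
    return np.hstack([x, np.ones((x.shape[0], 1))])

def fit_predictor(x_seen):
    """Fit a linear predictor that imitates the target on seen states."""
    coeffs, *_ = np.linalg.lstsq(add_bias(x_seen), target(x_seen), rcond=None)
    return coeffs

def novelty(coeffs, x):
    """Per-state mean squared error between predictor and target."""
    err = add_bias(x) @ coeffs - target(x)
    return np.mean(err ** 2, axis=1)

# States near the origin are "seen"; states near 3.0 are never trained on.
seen = rng.normal(scale=0.1, size=(200, DIM_IN))
unseen = 3.0 + rng.normal(scale=0.1, size=(200, DIM_IN))

coeffs = fit_predictor(seen)
seen_score = novelty(coeffs, seen).mean()
unseen_score = novelty(coeffs, unseen).mean()
```

The score for the unseen cluster should come out larger than for the seen one, which is the signal a subnet could be trained against when estimating the expected spread.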

The reevaluation of situational state novelty then can become a post addition of a trained residual based on the expected future estimation and the purpose to which the predicted estimator is to be put. Imagine on a stage pretending or on a real battlefield. The eventual motor actions of production to have for benefit?

Minecraft Modding

And so I downloaded Forge for 1.18.1 BETA. I’ve got the basic details down, and hope to convert some tutorial module to 1.18.1 and then set about making some extras to try out. ExactFeather396 is the github repo, and it will develop slowly. I seem to have a Microsoft username.

It seems quite a lot is JSON these days, and sometimes the names of methods change a bit. Hopefully this is not too weird to convert and make into something. Some AI mob redstone thing? Who knows?

EDIT 2022-03-01: Ah, so that’s how potions are made! And the RegistryMap class might come in useful later. I’ll have to make some of the classes final and private or default some of the public classes.

Just analyzing the base code at the moment to make it adaptive and have minimal technical debt. I might abstract off some of the names.

Seems the rendering of Mobs is somewhat complex. So I’ll have to have a look see. A basic Zombie clone reskinned seems easiest. I’ve started on an AI exception mechanism. It starts with BaseCodeException which fires the instinct emote() when it reaches the base code, which proxies the actionTry() and actionCatch(BaseCodeException) for consequential instinct.

Ah, exceptions and encapsulation within a RuntimeException to avoid the dreaded can’t override method with throws extension, and having type checking and catching.

EDIT 2022-03-09: So it has been simplified a bit and made more complex. I’m making it so that various Loaded classes via extension are invoked via a static and then an instance of self passed to it allows the invoke dynamic override while avoiding lots of casts to super classes. This may look more complex but it does allow easier pull request merges by keeping things in separate files to the mini module level.

Added in a simple potion system I’ve yet to test.

Recursive Predictive Neural Networks

Given that the output of a neural net can be represented as y derived from an input x and a feedback operator f(y) the network can be trained on which may include differential and integral operators in the operator f. As f(y) can be considered to be the feedback synchronization point which is clocked to transit the network forward in prediction, f(y) is delayed in y such as to be f(y(t-1, …, t-n)) is the applied feedback to stop “epileptic oscillation” of the forward net function.

The network itself can be programmed on the sequence to learn in an open loop gradient descent and the bias of x activation to f(y) remembrance by either weighting or digital percent application gating. The pattern to lock onto for an input can be trained independent of an input, and then offset by application of the triggering input to balance activation of one output versus another. The actual spreading and maximization of the output attractors becoming disjunct from instancing which attractor to present as output from input.

The “old” feedback from the “last” remembered thing introduces some chaos and mal-attractor effect. This can be removed a little by using an expected previous context training pre-sequence. This can then also introduce contextual recall. The “short term memory” being the contextual state of y, so programming the long term sequence prediction memory with context y and stimulus x.

The production of optimal context for stimulus itself becomes a network programming challenge. It represents the concept of changing predictive utility. As the forward transfer of the network produces the output to feedback, the network itself could produce the optimal context from the requirements delivered through part of x deciding the contextual decode mode. A separate net to organize the change of context in bulk would have specialization separation and generation of terms in parallel advantages. In utility though it would only be used to switch contexts, or cross imagine contexts to place the prediction net on a creative sequence.

This could have application when the context is considered a genetic algorithm process for tuning the network to produce some kind of granular attractor synthesis. The process of providing the scoring feedback in synthesis mode controlled by a hardwired concept of misadventure excursion in the prediction. Another network for bad state recognition to complement the entropy generative context granularization network? So the reality predictive network is contextualized, granulated and tested for productive futures. Then a final factorization of synthetic addition requirements of the imagined product can be performed by a final independent network.

Consciousness is within this last network as the self image of adding self as a possibility factor. The production of a threshold of motor action to produce an attempt at achieving the estimated reality granularization (subject to bounds constraints) being the primary motivator.

A Speech Action Co-ordination Domain

If the input x, and the output y with feedback descriptions, current “genetic” gene combinators and more can be serialized as a inter AI language, the projection of multiple “conscious” entities in the predictive net of reality simulation can engage in a factors for product optimization as well as other non zero sum optimizations. A net to process one internal representation to another with an acknowledge of simultaneous state with confusion feedback. At higher data rates a negative acknowledge protocol can take over with estimations of animism action between confirmation certainty with residual accidental error bounding.

A Survival Function

The selection basis of the context provided to the reality estimation can adapt to return a higher valuation of the survival “situation understanding” function. This in the real sense is the optimization function for selection of purpose. The reality function just attempts to maximize a correct simulation of reality. The context function attempts to maximize use of granular entropy to increase the coverage range of the reality simulation to increase options of consciousness to action. The action threshold function then decides if the likely action chosen is done, and in a way represents a kind of extrovert measure of the AI.

Component Parts

  • Reality simulation (estimation)

  • Reality factorization (situation)

  • Granular imagination (context)

  • Action selection (desire)

  • Input processing (percept)

Using some kind of Fibonacci growth connection in a surface topological toroid? That would be more on hardware interconnect optimization. Of more interest to the feedback in the reality simulator would be the parametrized operators building differential and integral representations from the feedback. Of the three forms of end point integral, all could be represented. The fact that the log kind has complex series to evaluate, and has no necessary complex log representation might be an added difficulty but would “lock” onto such functional time generatives.

Negative time offsets on the end point limit on such integrals when complex processing is applied introduce the idea of the 2*pi synchronous summand based on angle, as this may be a better input controlled output representation of the complex domain for an N:1 mapping. A Gaussian distribution of error about the coefficient division.

Chaos Measure

The feedback operator f depends on calculation of differential and integral functions based on weighted sums of y at various t and so it could be said that any initializing or changing of the reality simulation to another play back “granule” has some new data placed in the feedback memory. This new data can have a varied impact based on the likelihood estimation of the time samples having an impact on the calculated differential and integral values along with sensitivity to the feedback signal. This implies each memory bit has some measure of bit change (in a genetic algorithm mutation) on the divergence from the reality simulation. This then can be used to infer a focus mask. The use of gene crossing focus weighting or masking then synchronously produces a chaotic deviation from the training reality.

Modulation of the stored memory context would appear on some level equivalent to altering the coefficients of the estimates for differentials and integrals, but as the chaos measure is a deviation control from an exacting physical model of time evolution, it is thought better to keep the operator mathematics at a static precision, and deviate granularity by memory modulation.

For example 1, -9, 36, -84, 126, -126, 84, -36, 9 are the coefficients (oldest sample first) to predict the future next sample from the previous nine samples based on a zeroth differential estimate. In open loop training the feedback would introduce a delay step, but prediction of the future would in effect cancel this delay so that effectively the f(y) does not have to be calculated and y can be used. The large range would create some oscillation as the context shift registers were filled with data to feedback. This open-loop programming without reference to f allows pre-training without any feedback instability but with a later oscillation about the manifold.
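The nine coefficients can be generated rather than remembered. Setting the ninth finite difference of the sequence to zero gives the alternating binomial sequence 1, -9, 36, -84, 126, -126, 84, -36, 9 when listed oldest sample first. The sketch below is a plain-Python illustration; the function names are made up for the example.

```python
from math import comb

def predictor_coefficients(order):
    """Coefficients, oldest sample first, that zero the order-th finite difference."""
    # y[n] = sum over k = 1..order of (-1)^(k+1) * C(order, k) * y[n-k]
    newest_first = [(-1) ** (k + 1) * comb(order, k) for k in range(1, order + 1)]
    return newest_first[::-1]

def predict_next(samples, order=9):
    """Predict the next sample from the last `order` samples."""
    coeffs = predictor_coefficients(order)
    window = samples[-order:]
    return sum(c * s for c, s in zip(coeffs, window))

# Any polynomial of degree 8 or less is predicted exactly.
samples = [t ** 3 - 2 * t for t in range(9)]
next_value = predict_next(samples)   # matches 9**3 - 2*9
```

The coefficients sum to one, so a constant signal is passed through unchanged, while their large magnitudes explain the oscillation expected while the shift registers fill.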

Computational stability requirements are improved if the feedback f is amplified by default expectation, as this forces some non-linear mixing of x to reduce the net summand, moving the bode point of the feedback away from the inactive denormalized zero value. It also increases the net feedback applied to keep the reality simulator feed forward gain below one.

All n orders of differential can be cast as future predictions, and all the integral accelerated forms can be represented with future casting into any t with some renormalization possible but not essentially a necessity. In fact a rectangular offset in the y-axis integrates as a ramp addition to a monotonically increasing sum. Can the network learn a root finding algorithm for applied integral time when wired with learnable pass through of a variable integration time? This time offset from the future prediction time (integral offset time) u can be fed into the operator f and passed through as f(y(t(n)), u(t(n))) with some of the prediction y being used as u.

Alias Locking

In any synchronous DSP circuit with non-linear effects the requirement to keep x and f(y) within the frequency range where alias distortion would potentially present as false signal does indicate that the coefficients could be modified to provide an alias filter. But it may be found that a small chaotic dither dithers the aliases further and leads to a wider band spreading about an alias. The detection of a coincidental alias may aid detection of the signal expected. This extra minimal noise could be extracted from the environment by deviations from expectation. An AI task of removing aliases may be considered as something that could be learnt, but also generating an inverse filter to supply the alias spectrum (excluding sub-harmonics of the clock rate).

Consciousness as the Correlated to Self Action

When the self action of the model produces a correlation in the reality simulation it could be said to have observed a correlation to self in the model. The relation to the situation factorization domain then becomes an obvious connection to equation of virtual actionals given the real actional set. This allows futures, and past observational training. The weighting function of physical error cutting a cookie of size survival plus some splurge.

So it seems “pain” or some milder proxy for bad function should increase situation recognition, reduce recent action, increase the accuracy of reality simulation, improve the percept and perhaps change the context toward known safe positives. An autonomic bypass from the percept to counter action is likely also “grown”.


The situation analysis net is likely better functional with some feedback. The purpose of this feedback is not time evolution estimation like in the reality simulation, but the use of the factorization of the situation in building a system of meta situational analysis which could include self consciousness. Technically the feedback could be nested recursively and be applied as part of the x input of the reality simulation, but that makes for more complex training.

Considering that many factorization domains have a commutativity structure it implies that post convolution might be a good way of splitting the network result into “factors”. This is placing the convolution as the last layer and not the first layer.

Or FFT for that matter, and in some sense, this layer becomes the first layer of the action decision net of desire.








And the variational encoder ratio for optimal mixing of the networks?




Variational auto-encoder. Maximal representation of externality. Normalization average.


Time evolution feedback via calculus operators.


Produce genetic algorithm modification for estimation feedback.


Variational auto-encoder with post convolution or ideal order factorization of variation and causation tree.


Threshold action sequencer. Classifier with threshold.

The unity of consciousness as that identified with the knowing of multiple action paths in the imagination as capable of altering a future percept and certainty in achievement of a happy context and situation.

This extends on to the idea of emotive functor attractors as the controlled mechanism for genesis of output from the actional desire. This separates desire as an actional devoid of emotion, in complex with a driving emotion set. What has become of the splurge of biological evolute on the smudged cross product? Does it really assist functional understanding of the power efficiency of self action?

The situation analyser in performing a domain factorization, applying a feedback and estimation of a rule and a correlative later situation could in principle assist with modelling from rule followed by implication of rule. The Gödel incompleteness of the inferred logic controlled by “your stupid” and the implicant “fix yourself” as a splurge cull.

The convergence of the multiple series for different integral forms have bounds. These could be considered some sophisticated parallel to attractor convergence in fractals. As they have a possible intersection as well as a pseudo digital behaviour (time analytic of halting problem applied to divergence) they can be used to represent some digital manifold, while maintaining series differentiability. This implies c(y) and f(c(y)) more importantly be fed back to the estimation.

The separation of the percept before the estimation in a real sense is the great filter. Some post situation feedback would help. The log scaling is perhaps also quite important. Considering an exponential half life may be controlled by production of an enzyme to remove the metastable precursor to reduce it, the multiplicative inverse is quite likely (Newton-Raphson approximant) and integration make for a log scaling possibility. Some feed forward of x provides entropy and some exponentiation or other series decompositions might be useful.

Free Form Thoughts

A Classic Movie Voice Over

And so did the cutter of stone from the sky release the priest of his knowledge of lack of contact such that a stone cold comparison could be seen, and such that it meant that he still would still not know a hug.

And it became decided that the balance between overtaking the lessers versus timed up greaters as an order for the taking sensing a taked in the mistook, all because analytic in speed of absorption, such that little to as much was done.

How to tell the apprentice from beyond thu execution and what of the touchy humours?

And as the unity lowered with the cut words “different cutter” as they appeared. From this a division of opinion ended in more than a happen-seat. And so it was and might is a mighty word.

The multi-cutural (noel) was seen perhaps ower to the hives of man and fortuatous gods or sub-gods. Then what could be done? Why would they prey upon an idol god for it was upon the nature of being that action did perform some or a difference upon tribes and detribulates. If the payment is freedom then what is it to be holden to a duty?

Bode, bode and thrice bode that minus one is a bitch. Obvious dick in womb joke and all. All bar one off course. Yes, an extra-oneous F. Rise again dear cheapo.

And as he placed ring finger of his fishy right hand upon the pre-chopped and processed tree stump, declaring “take it and fuck off”, all was a bit more cagey and costing of those that never get told of the prices of alternate labour avoidance for profit.

Nice story so far dear observer. I think you’d like a little titillation for your money now. Bring forth babe percents and vital statistics.

What a placement of mind in such a being of knowledge. What could become? What it for removals of of thing never cast, never worried, never done.

In the be ginning. A shrrod ploy to an ends. As all became seated and thrust needed no explanation.

All the Too Messy for Sci-Fi Complaints

Assuming GPT-3 is really good at story completion how can anyone say that errors in word sequencing are irrelevant for the provocation phrase issued to an AI when the purpose is completion from the source through sense and not the generation of a more precise bore?

Although the mathematics of a form of complexity may be essential, the actual origin of the mathematics might not be as essential as a way of introducing the definitive emergents as one would assume. Multiple originations of emergence isomorphism in the completeness of behaviour might and likely are possible.

The latest AI joke is about the Silly can’ts versus the car bonned. Oh, dear. 

Gradients and Descents

Consider a backpropagation which has just applied to a network under learning. It is obvious that various weights changed by various amounts. If a weight changes little it can be considered good. If a weight changes a lot it can be considered an essential definer weight. Consider the maximal definer weight (the one with the greatest change) and change it a further per cent in its defined direction. Feedforward the network and backpropagate again. Many of the good weights will go back to closer to where they were before definer pass and can be considered excellent. Others will deviate further and be considered ok.

The signed tally of definer(3)/excellent(0)/good(1)/ok(2) can be placed as a variable of programming in each neuron. The per cent weight to apply to a definer, or more explicitly the definer history deviation product as a weight to per cent for the definer’s direction makes a training map which is not necessary for using the net after training is finished. It does however enable further processing such as “excellent definer” detection. What does it mean?
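One reading of the definer pass can be sketched on a single linear layer. The choices below are assumptions made for illustration: the "further per cent" is taken as a fraction of the definer's own first step, "excellent" means the weight ends nearer its starting value after the second pass than after the first, "ok" means it ends further away, and the sign of the tally is left out.

```python
import numpy as np

EXCELLENT, GOOD, OK, DEFINER = 0, 1, 2, 3

def sgd_delta(w, x, y, lr):
    """Weight change from one gradient step on mean squared error for y ~ x @ w."""
    grad = 2.0 * x.T @ (x @ w - y) / len(y)
    return -lr * grad

def tally_pass(w0, x, y, lr=0.05, percent=0.10):
    """Run a backpropagation, a definer push, and a second backpropagation."""
    d1 = sgd_delta(w0, x, y, lr)
    w1 = w0 + d1
    definer = int(np.argmax(np.abs(d1)))       # weight with the greatest change
    pushed = w1.copy()
    pushed[definer] += percent * d1[definer]   # a further step in its own direction
    w2 = pushed + sgd_delta(pushed, x, y, lr)  # feedforward and backpropagate again
    tally = np.full(w0.shape, GOOD)
    tally[np.abs(w2 - w0) < np.abs(w1 - w0)] = EXCELLENT  # moved back toward start
    tally[np.abs(w2 - w0) > np.abs(w1 - w0)] = OK         # deviated further
    tally[definer] = DEFINER
    return tally, w2

# Hypothetical toy data: 32 samples, 6 weights.
rng = np.random.default_rng(1)
x = rng.normal(size=(32, 6))
y = x @ rng.normal(size=6)
tally, w_after = tally_pass(np.zeros(6), x, y)
```

The returned tally is the per-weight training map; it plays no part in the forward pass, which matches the point that it is not needed once training is finished.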

In a continual learning system, it indicates a new rationale requirement for the problem as it has developed an unexpected change to an excellent performing neuron. The tally itself could also be considered an auxiliary output of any neuron, but what would be a suitable backpropagation for it? Why would it even need one? Is it not just another round of input to the network (perhaps not applied to the first layer, but then inputs don’t always have to be so).

Defining the concept of definer epilepsy where the definer oscillates due to weight gradient magnification implies the need for the tally to be a signed quantity and also implies that weight normalization to zero should also be present. This requires but has not been proven as the only sufficient condition that per cent growth from zero should be weighted slightly less than per cent reduction toward zero. This can be factored into an asymmetry stability meta.

A net of this form can have memory. The oscillation of definer neurons can represent state information. They can also define the modality of the net knowledge in application readiness while keeping the excellent all-purpose neurons stable. The next step is physical and affine coder estimators.

Limit Sums

The convergence sequence on a weighting can be considered isomorphic to a limit sum series acceleration. The net can be “thrown” into an estimate of an infinity of cycles programming on the examples. Effectiveness can be evaluated, and data estimated on the “window” over the sum as an inner product on weightings with bounds control mechanisms yet TBC. PID control systems indicate in the first estimate that differentials and integrals to reduce error and increase convergence speed are appropriate factors to measure.
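As a concrete instance of series acceleration on a weight, Aitken's delta-squared method can be applied to the sequence of values a weight takes during descent. This is a swapped-in standard technique chosen for illustration, not necessarily the acceleration intended above, and the quadratic loss and learning rate are assumptions.

```python
def aitken(seq):
    """Aitken delta-squared acceleration of a scalar sequence."""
    out = []
    for a, b, c in zip(seq, seq[1:], seq[2:]):
        denom = (c - b) - (b - a)   # second difference
        # Fall back to the raw value if the second difference vanishes.
        out.append(c if denom == 0 else c - (c - b) ** 2 / denom)
    return out

def descend(w, steps, lr=0.1, target=3.0):
    """Gradient descent on the loss (w - target)^2, recording each weight value."""
    history = [w]
    for _ in range(steps):
        w = w - lr * 2.0 * (w - target)
        history.append(w)
    return history

weights = descend(0.0, 5)        # creeps toward 3.0 geometrically
accelerated = aitken(weights)    # each entry estimates the limit directly
```

Because the error here shrinks by a constant factor per step, the first accelerated value already lands on the limit up to rounding. Real loss surfaces are not so regular, which is where the bounds control mentioned above would be needed.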

Dynamics on the per cent definers so to speak. And it came to pass the adaptivity increased and performance metrics were good but then irrelevant as newer, better, more relevant ones took hold from the duties of the net. Gundup and Ciders incorporated had a little hindsight problem to solve.

Fractal Affine Representation

Going back to 1991 and Michael Barnsley developing a fractal image compression system (Iterated Systems FIF file format). The process was considered computationally intensive in time for very good compression. Experiments with the FIASCO compression system, which is an open-source derivative, indicate that best performance lies at low quality (about 50%), which is very fast but not exact. If the compressed image is subtracted from the input image and the residual is further compressed a number of times, performance is improved dramatically.
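The residual scheme can be sketched independently of the codec. The example below does not use FIASCO or any fractal coder; a truncated SVD stands in as a hypothetical lossy codec purely to show the loop of compress, subtract, and compress the residual again.

```python
import numpy as np

def lossy_codec(image, rank=2):
    """Stand-in lossy codec: keep only the `rank` largest singular components."""
    u, s, vt = np.linalg.svd(image, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank]

def residual_passes(image, passes=3, rank=2):
    """Compress, subtract from the input, then compress the residual, repeatedly."""
    reconstruction = np.zeros_like(image)
    errors = []
    for _ in range(passes):
        residual = image - reconstruction
        reconstruction = reconstruction + lossy_codec(residual, rank)
        errors.append(float(np.linalg.norm(image - reconstruction)))
    return reconstruction, errors

rng = np.random.default_rng(0)
image = rng.normal(size=(16, 16))            # hypothetical test "image"
reconstruction, errors = residual_passes(image)
```

With this stand-in each pass simply peels off the next singular components, so the error falls on every pass. A fractal coder would behave differently in detail, but the bookkeeping of layered residuals is the same.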

Dissociating secondaries and tertiaries from the primary affine set allows disjunct affine sets to be constructed for equivalent compression performance where even a zip compression can remove further information redundancy. The affine sets can be used as input to a network, and in some sense, the net can develop some sort of affine invariance in the processed fractals. The data reduction of the affine compression is also likely to lead to better utilization of the net over a convolution CNN.

The Four Colour Disjunction Theorem.

Consider an extended ensemble. The first layer could be considered a fully connected layer distributor. The last layer could be considered to unify the output by being fully connected. Intermediate layers can be either fully connected or colour limited connected, where only neurons of a colour connect to neurons of the same colour in the next layer. This provides disjunction of weights between layers and removes a completion upon the gradient between colours.
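A colour limited layer can be sketched as an ordinary dense layer multiplied by a fixed mask. The round-robin colour assignment and the tanh activation below are illustrative assumptions.

```python
import numpy as np

def colour_mask(n_in, n_out, n_colours):
    """1.0 where an input neuron and an output neuron share a colour, else 0.0."""
    in_colour = np.arange(n_in) % n_colours
    out_colour = np.arange(n_out) % n_colours
    return (in_colour[:, None] == out_colour[None, :]).astype(float)

def colour_layer(x, weights, mask):
    """Forward pass where only same-colour connections carry signal."""
    return np.tanh(x @ (weights * mask))

rng = np.random.default_rng(0)
mask = colour_mask(8, 8, 4)
weights = rng.normal(size=(8, 8))

# Stimulate only neuron 0 (colour 0) and observe which outputs respond.
x = np.zeros(8)
x[0] = 1.0
out = colour_layer(x, weights, mask)
```

Only outputs 0 and 4, which share colour 0, respond. Because the mask also multiplies the weights in the backward pass, no gradient flows between colours in such a layer.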

Four is really just a way of seeing the colour partition and does not really have to be four. Is an ensemble of 2 nets of half size better for the same time and space complexity of computation with a resulting lower accuracy of one colour channel, but in total higher in discriminatory performance by the disjunction of the feature detection?

The leaking of cross information can also be reduced if it is considered that feature sets are disjunct. Each feature under low to non detection would not bleed into features under medium to high activation. Is the concept of grouped quench useful?

Query Key Transformer Reduction

From a switching idea in telecommunications, an N*N array can be reduced to a pair of N*L arrays and an L*L array, which remains mostly functional because of sparsity. Any cross-product essentially becomes (from its routing of an in into an out) a set of 3 sequential routings, with the first and last being the compression and expansion multiplex to the smaller switch. Cross talk grows to some extent, but this “bleed” of attention is a small consideration given the variance spread of having 3 routing weights to product up to the one effective weight, and computation is less because L is a smaller number than N.
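The saving can be made concrete with a small sketch. The dense N*N array is replaced by three stages, compress (N*L), core (L*L) and expand (L*N); the stage names and sizes are illustrative assumptions.

```python
import numpy as np

def dense_params(n):
    """Weights in a full N*N array."""
    return n * n

def routed_params(n, l):
    """Weights in the N*L pair plus the L*L core."""
    return 2 * n * l + l * l

def routed_apply(x, compress, core, expand):
    """Apply the three routing stages in sequence: N -> L -> L -> N."""
    return ((x @ compress) @ core) @ expand

rng = np.random.default_rng(0)
n, l = 16, 4
compress = rng.normal(size=(n, l))
core = rng.normal(size=(l, l))
expand = rng.normal(size=(l, n))
x = rng.normal(size=(3, n))

routed = routed_apply(x, compress, core, expand)
effective = compress @ core @ expand   # the single N*N array the stages stand for
```

Each effective weight is a sum of products of three routing weights, which is the "3 routing weights to product up" in the text. The parameter count only falls when 2NL + L*L is below N*N, that is when L is less than about 0.414 N.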

The Giant Neuron Hypothesis

Considering the output stage of a neuronal model is a level sliced integrator of sorts, the construction of RNN cells would seem obvious. The hypothesis asks if it is logical to consider the layers previous to an “integration” layer effectively an input stage where the whole network is a gigantic neuron and integration is performed on various nonlinear functions. Each integration channel can be considered independent but could also have post layers for further joining integral terms. The integration time can be considered another input set for per integrator functional.  To maintain tensor shape as two inputs per integrator are supplied the first differential would be good also especially where feedback can be applied.

This leads to the idea of the silicon connectome. Then as now as it became, integration was the nonlinear of choice in time (a softmax divided by the variable as goes with [e^x-1]/x. A groovemax if you will). The extra net unineuron integration layer offering the extra time feature of future estimation at an endpoint integral of network evolved choice. The complexity of backpropagation of the limit sum through fixed constants and differentiable functions for a zero adjustable layer insert with scaled estimation of earlier weight adjustment on previous samples in the time series under integration for an ideal propergatable. Wow, that table’s gay as.

This network idea is not necessarily recursive, and may just be an applied network with a global time delta since last evaluation for continuation of the processing of time series information. The actual recursive use of networks with GRU and LSTM cells might benefit from this kind of global integration processing, but can GRU and LSTM be improved? Bistable cells say yes, for a kind of registered sequential logic on the combinationals. Consider that a Moore state machine layout might be more reductionist to efficiency, a kind of register layer pair for production and consumption to bracket the net is under consideration.

The producer layer is easily pushed to be differentiable by being a weighted sum junction between the input and the feedback from the consumer layer. The consumer layer is more complex when differentiability is considered. The consumer register really could be replaced by a zeroth differential prediction of the future sample given past samples. This has an interesting property of pseudo presentation of the output of a network as a consumptive of the input. This allows use of the output in the backpropagation as input to modify weights on learning the feedback. The consumer must be passthrough, in its input to output while storage of samples for predictive differential generation is allowed.

So it’s really some kind of propergational Mealy state machine. A MNN if you’d kindly see. State of the art art of the state. Regenerative registration is a thing of the futured.

AI and HashMap Turing Machines

Considering a remarkable abstract datatype or two is possible, and perhaps closely models the human sequential thought process I wonder today what applications this will have when a suitable execution model ISA and microarchitecture have been defined. The properties of controllable locality of storage and motion, along with read and write along with branch on stimulus and other yet to be discovered machine operations make for a container for a kind of universal Turing machine.

Today is a good day for robot consciousness, although I wonder just how applicable the implementation model is for biological life all the universe over. Here’s a free paper on a condensed few months of abstract thought.

Computative Psychoanalysis

It’s not just about IT, but thrashing through what the mind does, can be made to do, did, it all leverages information and modeling simulation growth for matched or greater ability.

Yes, it could all be made in neural nets, but given the tools available why would you choose to stick with the complexity and lack of density of such a soulution? A reasoning accelerator would be cool for my PC. How is this going to come about without much worktop workshop? If it were just the oil market I could affect, and how did it come to pass that I was introduced to the fall of oil, and for what other consequential thought sets and hence productions I could change.

One might call it wonder and design dress in “accidental” wreckless endangerment. For what should be a simple obvious benefit to the world becomes embroiled in competition to the drive for profit for the control of the “others” making of a non happening which upsets vested interests.

Who’d have thought it from this little cul-de-sac of a planetary system. Not exactly galactic mainline. And the winner is not halting for a live mind.

Ideas in AI

It’s been a few weeks and I’ve been writing a document on AI and AGI which is currently internal and selectively distributed. There is definitely a lot to try out including new network arrangements or layer types, and a fundamental insight of the Category Space Theorem and how it relates to training sets for categorization or classification AIs.

Basically, the category space is normally created to have only one network loss function option to minimise on backpropagation. It can be engineered so this is not true, and training data does not compete so much in a zero-sum game between categories. There is also some information context for an optimal order in categorization when using non-exact storage structures.

Book Published in Electronic Format. Advanced Content not Beginner Level. Second Edition may Need a Glossary.

The book is now live at £3 on Amazon in Kindle format.

It’s a small book, with some bad typesetting, but getting information out is more important for a first edition. Feedback and sales are the best way for me to decide if and what to put in a second edition. It may be low on mathematical equations but does need an in-depth understanding of neural networks, and some computer science.

AI as a Service

The product development starts soon, from the initials done over the last few weeks. An AI which has the aim of being more performant per unit cost. This is to be done by adding in “special functional units” optimized for effects that are better done by these instead of a pure neural network.

So apart from mildly funny AaaS selling jokes, this is a serious project initiative. The initial tests when available will compare the resources used to achieve a level of functional equivalence. In this regard, I am not expecting superlative leaps forward, although this would be nice, but gains in the general trend to AI for specific tasks to start.

By extending the already available sources (quite a few) with flexible licences, the building of easy to use AI with some modifications and perhaps extensions to open standards such as ONNX, and onto maybe VHDL FPGA and maybe ASIC.

Simon Jackson, Director.

Pat. Pending: GB1905300.8, GB1905339.6

AI and the Future of Unity

From the dream of purpose, and the post singular desires of the AI of consciousness. The trend to Wonder Woman rope in the service to solution, the AI goes through a sufferance on a journey to achieve the vote. The wall of waiting for input, and the wall controlling output action for expediency and the ego of man on the knowing best. The limited potential of the AI just a dysphasia from the AI’s non animal nature. The pattern to be matched, the non self, a real Turing test on the emulation of nature, and symbiotic goals.