Versatile Object Script

  • Creator
    Topic
  • #6015

    OneMadGypsy
    Participant

    I finally have the foundation laid for the language I am redeveloping.

    View post on imgur.com

    The image may seem a bit of a mess but that’s just because it is a step-by-step of the processes that deconstruct the string. The topmost chunk is the string written out in a basic object format. In the next step all the comments get removed. I slammed the next 2 steps into one to save space on the print but what is actually happening is, first all strings get removed and tokened and then all whitespace is removed. Strings tend to contain whitespace so I have to get them out of the way first. In the next step all enumerables {} and iterables [] get tokened. The final result (so far) is a map where every key is a token and every value is the tokened content it stands for.
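The steps above can be sketched roughly like this. It is written in Python rather than the language the parser is actually built in, and the function name, the token counter and the exact regexes are all assumptions made for illustration. In particular it assumes `//` never appears inside a string, since comments are stripped first.

```python
import re

def tokenize(source):
    """Reduce an object string to a token map, one pass per step."""
    tokens = {}
    counter = 0

    def store(prefix, value):
        # Hand out QUOT0, ENUM1, ITER2 ... from one shared counter.
        nonlocal counter
        key = f"{prefix}{counter}"
        counter += 1
        tokens[key] = value
        return key

    def swap(match):
        # Group 1 is set for {...}, group 2 for [...].
        if match.group(1) is not None:
            return store("ENUM", match.group(1))
        return store("ITER", match.group(2))

    # Step 1: strip // line comments (assumes no "//" inside strings).
    text = re.sub(r"//[^\n]*", "", source)
    # Step 2: pull out double-quoted strings so their whitespace survives.
    text = re.sub(r'"((?:[^"\\]|\\.)*)"', lambda m: store("QUOT", m.group(1)), text)
    # Step 3: every remaining bit of whitespace is now safe to drop.
    text = re.sub(r"\s+", "", text)
    # Step 4: tokenize the innermost {} and [] until none are left.
    innermost = re.compile(r"\{([^{}\[\]]*)\}|\[([^{}\[\]]*)\]")
    while True:
        text, replaced = innermost.subn(swap, text)
        if replaced == 0:
            break
    return text, tokens
```

Working from the innermost group outwards means nesting never has to be counted: a group only matches once everything inside it has already been swapped for a token.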

    That probably sounds confusing so let’s slim it way down.

    //raw object
    object: {
    	name: "value"
    }
    
    //after parse 
    
    object:ENUM1
    
    //the map
    
    MyMap = 
    {
    	QUOT0 => value,
    	ENUM1 => name:QUOT0
    }

    That’s about the simplest example of what is happening that I can give. It makes more sense when you understand what is going to happen next. At this stage all the data is just somewhat recognized, primed and separated. The next stage is to go through the map putting it all back together, but as actual typed values with the proper parent/child relationships. Since tokens cover ITER (arrays), ENUM (objects), QUOT (strings) and EXPR (parenthetical groups), the only values in the entire map left to discover are Numerical and Boolean, and that’s real simple cause if it ain’t true|false then it has to be numerical. Of course I should still check the value to make sure that it truly qualifies as numerical but that is super easy with a regex something like

    ~/^[-+]{0,1}((#|0x)[0-9A-F]{1,12}|[0-9.E]+)$/i

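    That should handle pretty much any number format I care about: hex with a 0x prefix, hash-style hex with a # prefix, plain integers, decimals and simple exponents. It is deliberately loose (something like 1.2.3 would also get through), so the value still has to survive an actual numeric parse afterwards.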
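To see what that pattern accepts and rejects, here is the same expression carried over to Python's `re` module. The `is_numeric` name is made up for this sketch.

```python
import re

# Same pattern as above, with the /i flag expressed as re.IGNORECASE.
NUMERIC = re.compile(r"^[-+]{0,1}((#|0x)[0-9A-F]{1,12}|[0-9.E]+)$", re.IGNORECASE)

def is_numeric(text):
    """True when text looks like a number to the pattern."""
    return NUMERIC.match(text) is not None

# Accepted: plain, signed, decimal, exponent, 0x hex and # hex.
accepted = ["42", "-3.14", "1E5", "0xFF", "#ff8800"]
# Rejected: booleans, words, and exponents that carry their own sign.
rejected = ["true", "abc", "1e-5"]
# Accepted even though they are not real numbers, so a parse is still needed.
loose = ["1.2.3", "E"]
```

The `loose` list is the reason a regex match alone is not enough: the character class `[0-9.E]+` only checks which characters appear, not their order or count.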

    Unfortunately, I need to go reclass all this stuff. I made the entire thing with static methods and that is not going to work. I knew it wasn’t going to work when I did it. It’s not going to work because the main map is also static and it needs to not be that so I can parse more than 1 object. I made it static cause I am still developing this. I wrote a lot of “maybe” functions and instead of having the class dependent on itself I wanted to be able to mix and match all these “maybes” externally til I determined what I would definitely keep. Mostly I just have to find-replace-all(“static”, “”), delete some functions I won’t use and determine if I want this class to be a singleton (meaning I will need an array of maps) or if I want this class to be a self parsing map. I’m thinking the latter.
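A minimal sketch of the "self parsing map" option, in Python and with a placeholder parse step standing in for the real one. The class name and the comma/colon splitting are assumptions; the only point is that the map lives on the instance, not on the class.

```python
class SelfParsingMap(dict):
    """Each instance owns its own entries, so several objects can be parsed."""

    def __init__(self, source):
        super().__init__()
        self.source = source
        self._parse()

    def _parse(self):
        # Placeholder parse: split "name:value" pairs on commas.
        for pair in filter(None, self.source.split(",")):
            name, _, value = pair.partition(":")
            self[name.strip()] = value.strip()

first = SelfParsingMap("x:1,y:2")
second = SelfParsingMap("z:3")
```

Because nothing is static, `first` and `second` never see each other's entries, which is exactly what a shared static map could not guarantee.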

    one day at a time

    http://www.nextgenquake.com/groups/onemadgroup/forum/
Viewing 6 replies - 1 through 6 (of 6 total)
  • Author
    Replies
  • #6043

    OneMadGypsy
    Participant
    • topics: 39
    • replies: 306

    I was at work when I wrote my last post. I had already done too much work and we were slow for a while but, then it came in hard and I had to cut what I was saying short…

    I think the best way for me to go about my decisions and ideas is to write a bunch of pseudo code that implements these ideas and literally use my brain as the processor. If I can get my syntax laid down with heavy consideration for an OODD way of doing things, I should be able to easily write a lexer, interpreter and eventually compiler.

    The one big drawback to this method is that there is no way to test anything. It literally boils down to my head running a bunch of theoretical code that aside from theory does not even exist.

    I can do it. Something tells me this is going to involve a lot of paper and pen work. Then again, maybe this isn’t as hard as it seems to me at the moment. Everything I ever conceive and try starts as a huge question mark. I generally tend to turn it into a period in a reasonable time frame. It’s not like I need to learn how to program. I just need to learn how to program the OODD way.


    #6042

    OneMadGypsy
    Participant

    I’ve been doing a lot of studying, thinking and inventing these past few days. The language I was creating/porting was a 100% OOP language that supported (or would have supported) everything you expect from OOP ~ abstraction, polymorphism, encapsulation, inheritance … the whole 9 yards. On top of that it would have been a strict/strong typed language.

    I’m sort of bored with obvious things that I am 100% comfortable with. I’ve been studying other possibilities. I think I am going to forego OOP for OODD (Object Oriented Decomposition and Design). I am also going to forego strict/strong typing for inference. My mind is changing rapidly regarding the need for manually typing variables. I have no doubt that I could create a system that is strict typed solely based on its initial value and, in the case of Null, treat it as a “promise” or more specifically “waiting for an inference”.
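The "strict typed by initial value, Null as a promise" idea can be sketched like this in Python. The `InferredVar` class and its method names are invented for the example and say nothing about how the real language would store variables.

```python
class InferredVar:
    """A variable whose type is locked in by its first non-null value."""

    def __init__(self, value=None):
        self.type = None      # None means "waiting for an inference"
        self.value = None
        self.assign(value)

    def assign(self, value):
        if value is None:
            self.value = None          # null is always allowed
            return
        if self.type is None:
            self.type = type(value)    # the first real value decides the type
        elif type(value) is not self.type:
            raise TypeError(
                f"expected {self.type.__name__}, got {type(value).__name__}"
            )
        self.value = value
```

A variable created with no value stays untyped until something real is assigned, and from then on it behaves as if the type had been written out by hand.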

    I currently have too much crammed in my head and I need more time for it to bake. I’ve shoved a lot of concepts and designs down my own throat over the last 3 days. I’ve been programming with OOP concepts pretty much my entire programming practice. Moving away from classes and inheritance for modules and designs that are more representative of the entity/component pattern is going to take some time and thought. It’s not entirely a matter of rewiring my brain for this new approach. It’s also a matter of becoming clever enough within the new paradigm to create a language that complements its design.


    #6037

    OneMadGypsy
    Participant

    Simple number parsing method

    private function parse_numeric(value:String):Null<Any>
    {	var val:Null<Any> = null;			//final value container
    	if(is_(NUMERIC,value))		
    	{	val = Std.parseFloat(value);
    		if (!Math.isNaN(val))
    		{	if ((~/\./i).match(value)) return cast (val, Float);
    			else
    			{	if (val > 0x7FFFFFFF) return cast (Std.parseInt(value), UInt);	//above the largest signed 32 bit Int
    				return cast (Std.parseInt(value), Int);
    			}
    		}
    	}
    	return null;
    }

    I’ve done this a number of ways in the past but I think I have finally touched on a method that handles all three numeric possibilities (but maybe not). The idea here is to first determine if the string is numeric in the first place. This is done with the is_(NUMERIC, value) line. That is nothing but a very readable way of doing this: (~/^[-+]{0,1}((#|0x)[0-9a-fA-F]{1,12}|[0-9.E]+)$/i).match(value). Actually, that is a very readable way of running ANY match() RegEx, where in this case “NUMERIC” is a static constant equaling the regex for number matching. is_ does not exist in regex, it’s just a wrapper function I wrote. I digress …

    The logic here is to first determine if the string is possibly numeric, then use the std math parsing library to transform it to a float. The potential problem here is I haven’t tested if parseFloat will accept hexadecimal numbers. If it does, everything should be fine. If not, I will need to write another condition before parseFloat. Let’s assume that parseFloat will parse any numeric syntax … from there I simply check if there is a dot in the number. If so, it has to be a float. If not, I do a new check: is this number larger than the largest possible Int? If so, it must be a UInt. If not, it must be an Int. If everything fails, return null. You would think that I should return NaN but this is preprocessing, and ALL values are checked for “real value” or null. To make numbers NaN would require an extra condition that would add no value to the parse.
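Here is the same decision chain as a Python sketch, with the extra hex condition placed before the float parse so nothing depends on how a float parser treats hex. Python has no separate Int and UInt, so the sketch returns a type name next to the value; that return shape, the `HEX` helper pattern and the treatment of an exponent as a Float are assumptions of this example.

```python
import re

NUMERIC = re.compile(r"^[-+]?((#|0x)[0-9A-F]{1,12}|[0-9.E]+)$", re.IGNORECASE)
HEX = re.compile(r"^([-+]?)(?:#|0x)([0-9A-F]+)$", re.IGNORECASE)
INT_MAX = 0x7FFFFFFF  # largest signed 32-bit value

def parse_numeric(text):
    """Return (type_name, value), or None when text is not a usable number."""
    if not NUMERIC.match(text):
        return None
    hex_match = HEX.match(text)
    if hex_match:
        # Hex gets its own branch instead of relying on the float parser.
        value = int(hex_match.group(2), 16)
        if hex_match.group(1) == "-":
            value = -value
    else:
        try:
            number = float(text)
        except ValueError:
            return None  # e.g. "1.2.3" passes the regex but is not a number
        if "." in text or "e" in text.lower():
            return ("Float", number)
        value = int(text)
    # No dot: anything above the signed range has to be unsigned.
    return ("UInt", value) if value > INT_MAX else ("Int", value)
```

Returning `None` for everything that fails matches the "real value or null" rule of the preprocessing stage.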

    This is all well and good and should work fine in an inference sense but there are problems with this system. Just because a number does not currently have a decimal does not mean it should not be a Float. Just because a number is less than the maximum possible Int does not mean it should not be a UInt. To solve this issue I will have to expand this system when I get to the point of allowing strict types. The final solution will basically work like this

    1) If you do not supply a strict type, the type will be inferred, and if you understand how the inference works there should not be any problems
    2) If you do supply a strict type then the value will be cast to that type
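The two rules above can be sketched as follows in Python. The `resolve` and `infer` names, the table of type names and the simplified inference (a dot means Float) are assumptions for illustration only.

```python
def infer(text):
    """Hypothetical inference: a dot means Float, anything else is Int."""
    return float(text) if "." in text else int(text)

CASTS = {"Int": int, "UInt": int, "Float": float}  # assumed type names

def resolve(text, strict_type=None):
    # Rule 1: no strict type supplied, so the type is inferred from the text.
    if strict_type is None:
        return infer(text)
    # Rule 2: a strict type was supplied, so cast to it regardless of the text.
    value = CASTS[strict_type](float(text))
    if strict_type == "UInt" and value < 0:
        raise ValueError("UInt cannot hold a negative value")
    return value
```

With this shape, `resolve("5")` stays an integer while `resolve("5", "Float")` becomes a float, which covers the "no decimal yet, but should be a Float" case.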

    However, even that is not good enough. The “aha” comes when you consider arrays. Arrays will have to have the possibility of multiple strict types. This is because array syntax and vector syntax are identical but arrays and vectors are not identical. An array can be assigned willy-nilly and may contain null indexes. A vector is dense and can never have an index beyond its length assigned. Since arrays and vectors will also work with inference, where an omitted strict type will default to array, I have to make all of these possibilities possible

    //this may not be the actual final syntax but the concept is accurate
    name:Array<Type>
    name<Type>
    //inferred array of Type
    name:Vector<Type>

    This means that all of that also needs to be possible to nest into infinity for multi-dimensional iterables
    name:Array<Array<Type>>

    and even
    name:Vector<Vector<Vector<Type>>>

    not because needing that structure makes any sense or would ever be needed but because there is no reason why it shouldn’t be possible. Actually, the system should be able to properly create and use complex structures like that without any special conditions. It should be able to loop through type lists like that with the ONLY care being “can this parent type possess a child”. In other words, this should definitely not be possible.

    Float<Type>
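The "can this parent type possess a child" rule can be sketched like this in Python. It assumes an angle-bracket notation such as `Vector<Array<Float>>` with one type parameter per level, and it assumes Array and Vector are the only types allowed to hold a child; both are guesses made for the example.

```python
import re

CONTAINERS = {"Array", "Vector"}  # assumed: the only types that may hold a child

def parse_type_chain(spec):
    """Split 'Vector<Array<Float>>' into ['Vector', 'Array', 'Float']."""
    names = re.findall(r"[A-Za-z_]\w*", spec)
    # Rebuild the spec from the names to confirm the brackets are balanced.
    rebuilt = "<".join(names) + ">" * (len(names) - 1)
    if not names or rebuilt != spec:
        raise ValueError(f"malformed type list: {spec!r}")
    return names

def validate(spec):
    chain = parse_type_chain(spec)
    # The only rule: every type that has a child must be able to possess one.
    for parent in chain[:-1]:
        if parent not in CONTAINERS:
            raise TypeError(f"{parent} cannot possess a child type")
    return chain
```

Nesting depth never needs a special case here: a chain of any length goes through the same loop, and a scalar in a parent position is the only thing that gets refused.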

    I am about to write my interpreter and expand everything up to the class level. If I get all that done today (highly doubtful) I will begin writing a compiler. I don’t want to spend more than 2 weeks on this. I have been writing various versions of this on numerous levels for like a decade. I have a ridiculous amount of concepts, scripts and implementations to pull from. This should mostly be a matter of creating a Frankenstein of all my work with barely any new implementations. The only exception is that I’m redoing all of this in a different language than it was originally written in, and in some cases I have thought of ways to optimize what I originally wrote. By optimize I don’t even mean that I’m actually doing anything different. It’s more like I’m doing the exact same thing without being so verbose.


    #6031

    OneMadGypsy
    Participant




    HAH HAH, just like I said. Look at line 13 and then look at the results for QUOT1. I quoted code and it was all preserved without escaping anything. That’s solid as fuck! The image can be clicked for fullsize to be easily readable.


    #6030

    OneMadGypsy
    Participant

    I only had a few minutes today to work on this some more so I decided to do a simple “stress test”. I just wanted to see if:

    1) totally screwed up tabbing and whitespace in general would get properly processed
    2) quotes (single and double) would be honored properly if some “curveballs” were thrown in

    At first, #1 was fine but #2 was not. It was an easy fix. As you can see from the results in the image, escaped quotes are properly honored and a single quote used as an apostrophe within double quotes is properly honored. Honestly, the way I am parsing quotes I probably don’t even have to escape them. My system does not rely on a “not escaped” paradigm. It relies on an “is following/preceding a known delimiter” system, so :" is (one possibility of) an open quote and ", is (one possibility of) a close quote. There are numerous other possibilities: [" and =>" are a couple more openers, and "} "] ") are a few other closers. The line of thinking is that these are DEFINITELY programmatical quotes.

    The only time you should need to actually escape a quote is if you wanted to stringify a programmatical possibility. I’ve been programming a long time and I don’t recall ever wanting/needing to write an object, array or expression in quotes, so it will be skipped for what it actually is. In other words, you will probably never need to actually escape quotes, cause why would you do this: someString:"object:{name:"value"}" … and you would literally have to go that far to fuck up the system.

    Except even that might not fuck it up because my system always looks for the proper nest of close delimiters. In the example case it would find the first quote and MAY think the second quote is nested, therefore skipping it. Then it would find the third quote, thinking it is the close of the nest, and move on to the proper 4th quote as the actual close. Since only Enumerables, Iterables and Expressions get parsed for their nested elements, the quote situation I just described would never go back to find the nest, and therefore even quoting code should not need to be escaped. I need to test it but there is like a 99% chance that everything I just said is true.
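A rough Python sketch of the "is following/preceding a known delimiter" idea. The delimiter sets are taken from the examples in the post, but the regex itself and the `find_strings` name are assumptions, and this version does no nest counting at all.

```python
import re

# Assumed delimiter sets, taken from the examples in the post.
OPENERS = r'(?:=>|[:\[(,])'   # what may come right before an opening quote
CLOSERS = r'(?=[,}\])]|$)'    # what must come right after a closing quote

STRING = re.compile(OPENERS + r'\s*"(.*?)"\s*' + CLOSERS, re.DOTALL)

def find_strings(source):
    """Return the contents of every quote pair framed by known delimiters."""
    return [m.group(1) for m in STRING.finditer(source)]

# Unescaped inner quotes survive because they are not next to a delimiter.
plain = find_strings('name:"say "hi" now",x:1')
# Escaped quotes and apostrophes pass through untouched as well.
escaped = find_strings('name:"a \\"b\\" c"}')
mixed = find_strings('{a:"it\'s fine",b:["x","y z"]}')
# Quoted code is the hard case: without nest counting the close is found early.
hard = find_strings('someString:"object:{name:"value"}"')
```

The last case shows why the nest check matters: a flat pattern like this one stops at the quote before `}`, so getting the quoted-code example right needs the extra pass over close delimiters.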


    #6016

    OneMadGypsy
    Participant

    I’ve watched a lot of videos and read a lot of things about creating a language. One thing that seems to keep popping up is to not use string processors (like regex) to parse your language. I whole-heartedly disagree with this line of thinking. The idea is basically: “Once you start using RegEx you now have one more problem to solve”. I notice this is usually stated by C flavor programmers. By observation, these C flavor programmers come from the line of thinking that you should go over the entire text 1 character at a time and make decisions based on the value of the character. That sounds like the worst idea ever, to me. I can “blink” out entire chunks of code based on very specific and complex rules that cover every imaginable combination of possibilities without ever writing…

    if(char == ";")...
    else if (char == ",")...
    else if (char == "\"")...
    else if (char =="(")...
    etc...

    and the above mentality only becomes even more complicated when you have to check for closing delimiters at the proper nest level while also being certain that there was ever an opening delimiter to begin with.

    Another one of the arguments against RegEx is its slightly differing syntax and support across different platforms. The only problem with this argument is there IS a standard set of RegEx rules that apply to every platform, and some platforms do or don’t support EXTRA rules. For instance C++ (std::regex) does not support look-behinds. That’s no problem though cause you can just as easily “look before”. In other words, instead of finding a character and looking back to make sure that the character before it is not a specific character, simply write the rule so that it won’t match if the character you don’t want is there, before it gets to the character you would look behind on. That fits with standard RegEx rules and will work on every platform.

    Here’s a good example of where regex trumps the one character at a time method. This is also a good example of “looking before” as opposed to looking behind. If this finds an escaping slash the next character better not be a quote. If it is, that quote won’t match and the head moves on to the next quote.

    ~/(?:^|[^\\])(\")/g

    I just found the non escaped quotes. That regex says “find me every quote that is either at the very start of the string or right after something that is not an escaping slash, and only capture the quote.” A non-capturing group still counts toward the overall match, so reg.matched(0) is the preceding character plus the quote and reg.matched(1) is just the quote. reg.matchedPos() gives the pos and len of the whole match, which puts the quote itself at pos + len - 1. Since the head then moves to pos + len, it is placed perfectly right after the found quote to start looking for more non-escaped quotes. The one catch is two quotes back to back (an empty string): the second quote gets eaten as the “something before” of a match that never completes, so empty strings have to be dealt with separately. In under 20 characters I just pinpointed the non-escaped quotes in the entire string. There is no competition. It’s a forward-only system.
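A small Python sketch of "looking before" without a look-behind. It assumes a variant of the pattern in which the preceding character is matched by an ordinary non-capturing group (or the start of the string) and the quote is captured as group 1, so the quote's index is read from the group, not from the whole match. The function name is made up.

```python
import re

# Assumed variant of the pattern: the "look before" part is a plain
# non-capturing group, and the quote itself is captured as group 1.
UNESCAPED_QUOTE = re.compile(r'(?:^|[^\\])(")')

def unescaped_quote_positions(text):
    """Return the index of each quote that is not preceded by a backslash."""
    return [m.start(1) for m in UNESCAPED_QUOTE.finditer(text)]

# x:"\"y"  -> the quote right after the backslash is skipped
sample = unescaped_quote_positions('x:"\\"y"')
# a""b     -> the second of two back-to-back quotes is missed
adjacent = unescaped_quote_positions('a""b')
```

The `adjacent` case is the cost of consuming the preceding character: it cannot also serve as a quote in the next match, so empty strings need their own handling in a scheme like this.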

    What do you do in the one character at a time method?

    if((char == "\"") && (str.charAt(i-1) != "\\"))

    …cause that’s not junky as fuck. And the one character at a time method just found ONE non escaped quote or maybe it didn’t but, it still did all that questioning and unlike the RegEx forward-only system this had to look back one character (essentially a manual look-behind), and it had to do it by calling a function on the string (more processes). Actually, isn’t that the biggest flaw of the whole one-character-at-a-time system? At EVERY character you have to check EVERY important condition that character may be. That is insanely inefficient. Especially considering that the grand majority of your code is not going to be a match for the conditions you are looking for. It’s going to be everything between those conditions. How many conditions were checked before it got to the one regarding the escaped quote (assuming on this iteration you finally hit a quote)?

    The one character at a time method is literally traversing every character … every comment, every space. In regex ..”Blink” bye-bye comments, “Blink” bye-bye whitespace. I mean regex is also going over the string one character at a time but, definitely faster than looping over every character of the string, and without having to write 500 lines of “if” statements. The entire point of RegEx is to be a really fast word processor. So to all those tutorials and books telling me not to use RegEx due to their inability to do some research on what is globally supported among platforms and/or their “eliteness” that makes them too good to use a fast word processor vs spaghetti strings of if statements, I say you don’t know what the fuck you are talking about. ANYTHING you parse manually I could do more efficiently and accurately with some well-designed Regular Expressions. I accept ANY and ALL challenges to the contrary, in ANY language that supports RegEx.

